
🔧 HF Spaces Cache Permission Fix

โŒ Problem:

ERROR:app:Failed to load model: [Errno 13] Permission denied: '/.cache'

The container process on HF Spaces has no write access to /.cache at the filesystem root, so any model download that defaults to that cache location fails.
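
The '/.cache' path in the error is what a lookup of the form $HOME/.cache produces when HOME is "/". That is an assumption about this container, not something the log confirms. A minimal Python sketch for checking it follows; resolve_cache_root and is_writable are hypothetical helper names, not part of app.py:

```python
import os
import tempfile


def resolve_cache_root(env):
    """Approximate a common cache lookup: XDG_CACHE_HOME, else $HOME/.cache."""
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return xdg
    # Assumption: a missing HOME is treated like "/", which yields "/.cache"
    home = env.get("HOME", "/")
    return os.path.join(home, ".cache")


def is_writable(path):
    """Return True if a file can actually be created under path.

    Note: this creates the directory as a side effect when it is missing.
    """
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

Calling is_writable(resolve_cache_root(os.environ)) inside the Space would be expected to return False when the permission error above applies, while is_writable("/tmp/torch") should return True once the fix is in place.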

✅ Solution Applied:

1. Fixed Cache Directory in app.py

  • ✅ Set custom cache directory: /tmp/torch_cache
  • ✅ Added proper permissions handling
  • ✅ Fixed OMP_NUM_THREADS environment variable issue
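
The "Invalid value" warning typically appears when OMP_NUM_THREADS holds something other than a positive integer. A minimal sketch of a guard that could run at the top of app.py before torch is imported follows; sanitize_omp_num_threads is a hypothetical helper, and "3500m" in the usage note is an illustrative bad value, not one taken from the logs:

```python
import os


def sanitize_omp_num_threads(env, default="2"):
    """Replace a missing or non-integer OMP_NUM_THREADS with a safe default.

    Simplification: comma-separated lists for nested parallelism are also
    replaced by the default.
    """
    raw = env.get("OMP_NUM_THREADS", "").strip()
    try:
        valid = int(raw) > 0
    except ValueError:
        valid = False
    env["OMP_NUM_THREADS"] = raw if valid else default
    return env["OMP_NUM_THREADS"]
```

With env = {"OMP_NUM_THREADS": "3500m"}, the call sanitize_omp_num_threads(env) replaces the value with "2"; a value such as "4" is left unchanged. In app.py the argument would be os.environ.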

2. Updated Dockerfile

  • ✅ Set environment variables to use /tmp for caches
  • ✅ Pre-create cache directories
  • ✅ Fixed OMP_NUM_THREADS value

3. Key Changes Made:

app.py Changes:

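import os

# Set these before importing torch so the libraries see them at import time
os.environ["OMP_NUM_THREADS"] = "2"   # Must be a positive integer string
os.environ["TORCH_HOME"] = "/tmp/torch"
os.environ["HF_HOME"] = "/tmp/huggingface"

import torch

# Writable cache directory for torch.hub downloads
cache_dir = "/tmp/torch_cache"
os.makedirs(cache_dir, exist_ok=True)

# model_url and device are defined elsewhere in app.py
state_dict = torch.hub.load_state_dict_from_url(
    model_url,
    map_location=device,
    model_dir=cache_dir,          # Custom cache dir
    check_hash=False              # Default value; no filename hash check
)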

Dockerfile Changes:

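ENV OMP_NUM_THREADS=2
ENV TORCH_HOME=/tmp/torch
ENV HF_HOME=/tmp/huggingface
# TRANSFORMERS_CACHE is deprecated in recent transformers releases in favor of HF_HOME; kept for older versions
ENV TRANSFORMERS_CACHE=/tmp/transformers

# Pre-create the cache directories and make them writable in case the runtime user differs from the build user
RUN mkdir -p /tmp/torch /tmp/huggingface /tmp/transformers && \
    chmod -R 777 /tmp/torch /tmp/huggingface /tmp/transformers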
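
As a runtime counterpart to the Dockerfile, the same directories can be created at startup, which helps when the image was built without them. This is a minimal sketch under that assumption; ensure_cache_dirs and CACHE_ENV_DEFAULTS are hypothetical names, and the default paths simply mirror the ENV lines above:

```python
import os

# Defaults mirror the Dockerfile ENV lines; adjust if those change
CACHE_ENV_DEFAULTS = {
    "TORCH_HOME": "/tmp/torch",
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/transformers",
}


def ensure_cache_dirs(env, defaults=None):
    """Fill in missing cache variables and create their directories.

    Values already present in env are kept; only missing ones get a default.
    Returns a dict mapping each variable name to the path in use.
    """
    if defaults is None:
        defaults = CACHE_ENV_DEFAULTS
    resolved = {}
    for name, fallback in defaults.items():
        path = env.setdefault(name, fallback)
        os.makedirs(path, exist_ok=True)
        resolved[name] = path
    return resolved
```

Calling ensure_cache_dirs(os.environ) before any model-loading import would leave the Dockerfile values untouched when they are set and fall back to the /tmp paths otherwise.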

🚀 Expected Results:

  • ✅ No more "Permission denied: /.cache" errors
  • ✅ No more "Invalid value for environment variable OMP_NUM_THREADS" warnings
  • ✅ Model downloads work properly on HF Spaces
  • ✅ App starts correctly and clicking works

📋 To Deploy:

  1. Commit the changes: git add . && git commit -m "Fix HF Spaces cache permissions"
  2. Push to HF Spaces: git push
  3. Monitor logs: Check that the model download succeeds without permission errors
  4. Test: Click the game area - it should respond now!

๐Ÿ” Log Messages to Look For:

โœ… Success:

INFO:app:Loading state dict from https://huggingface.co/Etadingrui/diamond-1B/resolve/main/agent_epoch_00003.pt
INFO:app:State dict loaded, applying to agent...
INFO:app:Model has actor_critic weights: False
INFO:app:Actor-critic model exists but has no trained weights - using dummy mode!
INFO:app:WebPlayEnv set to human control mode (no trained weights)
INFO:app:Models initialized successfully!

โŒ If Still Failing:

ERROR:app:Failed to load model: [Errno 13] Permission denied
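
The two log patterns above can also be checked automatically. The sketch below assumes the Space's log output has been saved as plain text; classify_startup_log and the marker constants are hypothetical names, and the markers are copied from the messages shown above:

```python
SUCCESS_MARKER = "Models initialized successfully!"
FAILURE_MARKER = "Permission denied"


def classify_startup_log(log_text):
    """Return 'failed', 'ok', or 'unknown' for a block of startup log text.

    A failure marker anywhere in the log wins over a success marker.
    """
    lines = log_text.splitlines()
    if any(FAILURE_MARKER in line for line in lines):
        return "failed"
    if any(SUCCESS_MARKER in line for line in lines):
        return "ok"
    return "unknown"
```

A result of "unknown" means neither message has appeared yet, for example while the model is still downloading.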

🎯 What This Fixes:

  1. ✅ Model downloading - now uses the writable /tmp directory
  2. ✅ Environment variables - OMP_NUM_THREADS is now a valid integer
  3. ✅ Game clicking - works after the model loads (even without actor_critic weights)
  4. ✅ HF Spaces compatibility - caches live in a writable location instead of the filesystem root

The app should now work perfectly on HF Spaces! 🎉