# 🔧 HF Spaces Cache Permission Fix

## ❌ **Problem:**
```
ERROR:app:Failed to load model: [Errno 13] Permission denied: '/.cache'
```

HF Spaces containers run the app as a non-root user, which cannot create or write to `/.cache` at the filesystem root, so model downloads fail.
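
One plausible way this path arises (an assumption, since the container's environment is not shown here): many libraries default their cache to `~/.cache`, and when `HOME` resolves to `/`, that expands to `/.cache`. The sketch below reproduces the expansion and probes whether a directory is really writable. `resolve_default_cache` and `is_writable` are hypothetical helper names, not part of app.py.

```python
import os
import tempfile


def resolve_default_cache(home: str) -> str:
    """Expand ~/.cache the way os.path.expanduser would for a given HOME."""
    previous = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return os.path.expanduser("~/.cache")
    finally:
        # Restore the original HOME so the probe has no side effects
        if previous is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = previous


def is_writable(path: str) -> bool:
    """Return True if a temporary file can actually be created in path."""
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers missing directories as well as permission errors
        return False
```

Calling `resolve_default_cache("/")` shows where a `~/.cache` default would land under that assumption, and `is_writable` can be run against any candidate cache directory before a download is attempted.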

## ✅ **Solution Applied:**

### 1. **Fixed Cache Directory in app.py**
- ✅ Set custom cache directory: `/tmp/torch_cache`
- ✅ Added proper permissions handling
- ✅ Set `OMP_NUM_THREADS` to a valid integer (`2`)
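
The cache directory and permissions handling could look roughly like the following. This is a minimal sketch rather than the code in app.py: `ensure_cache_dir` is a hypothetical helper, and the fallback to a fresh temporary directory is an assumption about what "proper permissions handling" might mean.

```python
import os
import tempfile


def ensure_cache_dir(preferred: str = "/tmp/torch_cache") -> str:
    """Create preferred if possible; otherwise fall back to a fresh temp dir."""
    try:
        os.makedirs(preferred, exist_ok=True)
        # Confirm we can really write there, not just that the path exists
        with tempfile.TemporaryFile(dir=preferred):
            pass
        return preferred
    except OSError:
        # Hypothetical fallback: a private directory under the system temp dir
        return tempfile.mkdtemp(prefix="torch_cache_")
```

The returned path can then be passed as `model_dir` to `torch.hub.load_state_dict_from_url`, so a non-writable preferred location degrades to a working one instead of raising `Errno 13`.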

### 2. **Updated Dockerfile**
- ✅ Set environment variables to use `/tmp` for caches
- ✅ Pre-create cache directories
- ✅ Set `OMP_NUM_THREADS=2`, a valid integer value

### 3. **Key Changes Made:**

#### **app.py Changes:**
```python
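import os

# Set these before importing torch so they take effect
os.environ["OMP_NUM_THREADS"] = "2"   # Must be a valid positive integer
os.environ["TORCH_HOME"] = "/tmp/torch"
os.environ["HF_HOME"] = "/tmp/huggingface"

import torch

# Writable cache directory for torch.hub downloads
cache_dir = "/tmp/torch_cache"
os.makedirs(cache_dir, exist_ok=True)

# model_url and device are defined elsewhere in app.py
state_dict = torch.hub.load_state_dict_from_url(
    model_url,
    map_location=device,
    model_dir=cache_dir,          # Custom cache dir
    check_hash=False,             # Default; the URL filename carries no hash suffix
)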
```
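
If the platform injects its own `OMP_NUM_THREADS` value, a small guard can normalize it instead of overwriting it unconditionally. The following is a sketch under that assumption; `sanitize_omp_threads` is a hypothetical helper, and it would need to run before `torch` is imported.

```python
import os


def sanitize_omp_threads(default: int = 2) -> int:
    """Ensure OMP_NUM_THREADS holds a positive integer; return the value used."""
    raw = os.environ.get("OMP_NUM_THREADS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric values are treated as invalid
        value = 0
    if value < 1:
        value = default
    os.environ["OMP_NUM_THREADS"] = str(value)
    return value
```

This keeps a valid externally provided thread count and only substitutes the default when the value cannot be parsed as a positive integer.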

#### **Dockerfile Changes:**
```dockerfile
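ENV OMP_NUM_THREADS=2
ENV TORCH_HOME=/tmp/torch
ENV HF_HOME=/tmp/huggingface
# Deprecated in recent transformers releases in favor of HF_HOME; kept for older versions
ENV TRANSFORMERS_CACHE=/tmp/transformers

# Pre-create the cache directories and make them writable for a non-root runtime user
RUN mkdir -p /tmp/torch /tmp/huggingface /tmp/transformers \
    && chmod -R 777 /tmp/torch /tmp/huggingface /tmp/transformers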
```
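
To confirm at startup that the variables set in the Dockerfile point at usable locations, a check along these lines could be logged before any download starts. It is a sketch only: `check_cache_env` is a hypothetical helper, and the variable list simply mirrors the `ENV` lines above.

```python
import os

CACHE_ENV_VARS = ("TORCH_HOME", "HF_HOME", "TRANSFORMERS_CACHE")


def check_cache_env(env=None) -> dict:
    """Map each cache variable to True/False (writable or not) or None (unset)."""
    env = os.environ if env is None else env
    report = {}
    for name in CACHE_ENV_VARS:
        path = env.get(name)
        if path is None:
            report[name] = None
            continue
        try:
            os.makedirs(path, exist_ok=True)
            # Need write and traverse permission to create files inside
            report[name] = os.access(path, os.W_OK | os.X_OK)
        except OSError:
            report[name] = False
    return report
```

A `False` entry in the result points directly at the variable whose directory still needs fixing.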

## 🚀 **Expected Results:**
- ✅ No more "Permission denied: /.cache" errors
- ✅ No more "Invalid value for environment variable OMP_NUM_THREADS" warnings
- ✅ Model downloads work properly on HF Spaces
- ✅ App starts correctly and clicks in the game area register

## 📋 **To Deploy:**
1. **Commit the changes**: `git add . && git commit -m "Fix HF Spaces cache permissions"`
2. **Push to HF Spaces**: `git push`
3. **Monitor logs**: Check that the model download succeeds without permission errors
4. **Test**: Click the game area; it should now respond

## 🔍 **Log Messages to Look For:**
### ✅ **Success:**
```
INFO:app:Loading state dict from https://huggingface.co/Etadingrui/diamond-1B/resolve/main/agent_epoch_00003.pt
INFO:app:State dict loaded, applying to agent...
INFO:app:Model has actor_critic weights: False
INFO:app:Actor-critic model exists but has no trained weights - using dummy mode!
INFO:app:WebPlayEnv set to human control mode (no trained weights)
INFO:app:Models initialized successfully!
```

### ❌ **If Still Failing:**
```
ERROR:app:Failed to load model: [Errno 13] Permission denied
```
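
When checking logs repeatedly, the two outcomes above can be told apart with a small scan. This is an illustrative sketch: `classify_startup_log` is a hypothetical helper, and it matches only the message fragments quoted in this section.

```python
def classify_startup_log(lines):
    """Return 'permission_error', 'ok', or 'unknown' for an iterable of log lines."""
    status = "unknown"
    for line in lines:
        if "Permission denied" in line:
            # Any permission failure overrides an earlier success message
            return "permission_error"
        if "Models initialized successfully" in line:
            status = "ok"
    return status
```

It can be fed the lines of a saved log file, for example `classify_startup_log(open("space.log"))`, where `space.log` is a placeholder filename.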

## 🎯 **What This Fixes:**
1. ✅ **Model downloading** - now uses writable `/tmp` directory
2. ✅ **Environment variables** - OMP_NUM_THREADS is valid
3. ✅ **Game clicking** - works once the model loads, even without trained actor_critic weights (human control mode)
4. ✅ **HF Spaces compatibility** - caches live in writable `/tmp` paths instead of the root filesystem

With these changes, the app should start and load the model on HF Spaces without permission errors. 🎉