# 🚀 HF Spaces GPU Acceleration Fix
## ❌ **Problem Identified:**
Your T4 GPU wasn't being used because:
1. **Dockerfile disabled CUDA**: `ENV CUDA_VISIBLE_DEVICES=""`
2. **Environment variable issues**: `OMP_NUM_THREADS` was causing warnings
3. **App running on CPU**: despite the T4 GPU hardware being available
## ✅ **Complete Fix Applied:**
### **1. Dockerfile Changes**
```dockerfile
# REMOVED this line that was disabling GPU:
# ENV CUDA_VISIBLE_DEVICES=""
# Fixed environment variables:
ENV OMP_NUM_THREADS=2
ENV MKL_NUM_THREADS=2
```
### **2. App.py Improvements**
- ✅ **Fixed OMP_NUM_THREADS early**: set before any imports
- ✅ **Improved GPU detection**: better logging of CUDA availability and device info
- ✅ **Cache directories**: setup moved to the very beginning (see the sketch below)
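The snippet below is a minimal sketch of what the top of `app.py` looks like after these changes; the cache path, logger setup, and variable names are illustrative assumptions, not the project's literal code:

```python
# Illustrative app.py preamble (a sketch, not the exact project code).
import os

# Thread and cache environment variables, set BEFORE heavy imports so that
# libraries such as PyTorch/OpenMP pick them up at import time.
os.environ.setdefault("OMP_NUM_THREADS", "2")
os.environ.setdefault("MKL_NUM_THREADS", "2")
os.environ.setdefault("HF_HOME", "/tmp/hf_cache")  # assumed cache location
os.makedirs(os.environ["HF_HOME"], exist_ok=True)

import logging
import torch

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

# GPU detection with explicit logging, similar to the "After (GPU mode)" output shown later.
if torch.cuda.is_available():
    device = torch.device("cuda")
    logger.info("CUDA available: True")
    logger.info(f"GPU device count: {torch.cuda.device_count()}")
    logger.info(f"Current GPU: {torch.cuda.get_device_name(0)}")
    logger.info(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    logger.info("🚀 GPU acceleration enabled!")
else:
    device = torch.device("cpu")
    logger.info("CUDA not available, using CPU")
logger.info(f"Using device: {device}")
```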
### **3. Environment Variable Priority**
Environment variables are now set in this order:
1. **Dockerfile** - Base container settings
2. **app.py top** - Python-level fixes (before imports)
3. **HF Spaces** - Runtime overrides
## 🎯 **Expected Results After Fix:**
### **Before (CPU mode):**
```
INFO:app:Using device: cpu
INFO:app:CUDA not available, using CPU - this is normal for HF Spaces free tier
CPU 56%
GPU 0%
GPU VRAM 0/16 GB
```
### **After (GPU mode):**
```
INFO:app:Using device: cuda
INFO:app:CUDA available: True
INFO:app:GPU device count: 1
INFO:app:Current GPU: Tesla T4
INFO:app:GPU memory: 15.1 GB
INFO:app:🚀 GPU acceleration enabled!
```
### **Performance Improvement:**
- **CPU usage**: Should drop to ~20-30%
- **GPU usage**: Should show 10-50% during AI inference
- **GPU VRAM**: Should show 2-4GB usage
- **AI FPS**: Should increase from ~2 FPS to 10+ FPS
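### **Verify the Speedup (optional):**
If you want to verify the speedup rather than trust the estimates above, a rough micro-benchmark like the following can be run inside the container. The dummy model and input shape are placeholders, not the app's actual model:

```python
# Hypothetical micro-benchmark to sanity-check the CPU vs. GPU speedup.
import time
import torch

def measure_fps(model, x, n_iters=50):
    """Time repeated forward passes and return approximate inferences per second."""
    with torch.no_grad():
        for _ in range(5):                # warm-up iterations
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()      # wait for queued GPU work before timing
        start = time.time()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return n_iters / (time.time() - start)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3), torch.nn.ReLU()).to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)
print(f"~{measure_fps(model, x):.1f} inferences/sec on {device}")
```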
## 📋 **Deployment Steps:**
1. **Commit and push changes:**
```bash
git add .
git commit -m "Enable GPU acceleration for HF Spaces T4"
git push
```
2. **Wait for rebuild** (HF Spaces will restart automatically)
3. **Check new logs** for GPU detection:
```
INFO:app:🚀 GPU acceleration enabled!
```
4. **Monitor system stats:**
- GPU usage should now show activity
- GPU VRAM should show memory allocation
- Overall performance should be much faster
## 🔍 **Debugging Commands:**
### **Check CUDA in container:**
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```
### **Check environment variables:**
```python
import os
print(f"CUDA_VISIBLE_DEVICES: {os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set')}")
print(f"OMP_NUM_THREADS: {os.environ.get('OMP_NUM_THREADS')}")
```
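### **Check the GPU from a shell (optional):**
If you have shell access to the running container (for example via the Space's Dev Mode, if enabled), `nvidia-smi` gives a driver-level view of the same information:

```bash
# Should list a Tesla T4 with ~16 GB of memory if the GPU is visible to the container
nvidia-smi
# Compact summary of name, total memory, and current utilization
nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv
```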
## 🚨 **If GPU Still Not Working:**
### **1. Verify HF Spaces Hardware:**
- Check your Space settings
- Ensure "T4 small" or "T4 medium" is selected
- Free tier doesn't have GPU access
### **2. Check Container Logs:**
Look for these messages:
- ✅ `"🚀 GPU acceleration enabled!"`
- ❌ `"CUDA not available"`
### **3. Alternative: Force GPU Detection**
If needed, add this debug code to app.py:
```python
# Debug GPU detection
logger.info(f"Environment CUDA_VISIBLE_DEVICES: {os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set')}")
logger.info(f"PyTorch CUDA compiled: {torch.version.cuda}")
logger.info(f"PyTorch version: {torch.__version__}")
```
## ⚡ **Performance Optimization Tips:**
### **For T4 GPU:**
1. **Enable model compilation** (optional; see the sketch after this list):
```bash
# Set environment variable in HF Spaces settings:
ENABLE_TORCH_COMPILE=1
```
2. **Increase AI FPS** (if needed):
```python
# In app.py, line ~86:
self.ai_fps = 15 # Increase from 10 to 15
```
3. **Monitor GPU memory**:
- T4 has 16GB VRAM
- App should use 2-4GB
- Leave headroom for other processes
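Below is a minimal sketch of how an app can honor the compilation flag from tip 1. `ENABLE_TORCH_COMPILE` is the variable named above, but the helper function and how the real `app.py` wires it in are assumptions:

```python
# Sketch: opt-in torch.compile, gated by the ENABLE_TORCH_COMPILE environment variable.
# The function name and fallback behavior are illustrative assumptions.
import os
import torch

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    """Compile the model only when the opt-in flag is set and a GPU is present."""
    if os.environ.get("ENABLE_TORCH_COMPILE") == "1" and torch.cuda.is_available():
        if hasattr(torch, "compile"):   # torch.compile requires PyTorch 2.x
            return torch.compile(model)
    return model
```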
## 🎮 **Expected User Experience:**
1. **Faster loading**: Models load to GPU memory
2. **Responsive gameplay**: AI inference runs at 10+ FPS
3. **Smoother visuals**: Display updates without lag
4. **Better AI performance**: GPU acceleration improves model inference
Your HF Spaces deployment should now fully utilize the T4 GPU! 🚀
## 📊 **Monitor These Metrics:**
- **GPU Utilization**: 10-50% during gameplay
- **GPU Memory**: 2-4GB allocated
- **AI FPS**: 10-15 FPS (displayed in web interface)
- **CPU Usage**: Should decrease to 20-30%
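One way to surface the VRAM numbers from inside the app is a small helper built on `torch.cuda`'s memory counters. This is a sketch, not the app's actual monitoring code; GPU utilization percentage itself requires `nvidia-smi` or NVML and is not covered here:

```python
# Sketch: log GPU memory usage from inside the app using standard torch.cuda counters.
import torch

def log_gpu_metrics(logger):
    """Log allocated/reserved/total GPU memory; logs a note on CPU-only machines."""
    if not torch.cuda.is_available():
        logger.info("Running on CPU; no GPU metrics to report")
        return
    allocated = torch.cuda.memory_allocated() / 1e9   # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / 1e9     # memory held by the caching allocator
    total = torch.cuda.get_device_properties(0).total_memory / 1e9
    logger.info(
        f"GPU memory: {allocated:.1f} GB allocated / "
        f"{reserved:.1f} GB reserved / {total:.1f} GB total"
    )
```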
The game should feel much more responsive now! 🎉