Some options…
Use one of three patterns: 1) a shared Hugging Face cache on NFS/SMB, 2) a real proxy cache (Artifactory or Nexus), or 3) a lightweight self-hosted mirror. Add resumable downloaders (`hf_transfer`, `hf_xet`, or `aria2c`). Keep the Hub cache layout to stay compatible with all clients.
1) Shared HF cache over LAN (fast, simple)
- Put the cache on your NAS and point every host to it. The Hub cache is versioned and supports symlinks; Windows falls back without symlinks and uses more disk. Use offline mode after prefetching. (Hugging Face)
- Preload models by commit with `snapshot_download` and expose "pretty" trees via symlinks to avoid duplicates. (Hugging Face)
```shell
# docs: https://huggingface.co/docs/huggingface_hub/en/guides/cli
pip install -U "huggingface_hub[cli,hf_transfer]"  # https://github.com/huggingface/hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=/srv/hf  # shared mount (NFS/SMB)

# optional: use local_dir symlinks to a clean folder tree
python - <<'PY'
# docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/file_download
from huggingface_hub import snapshot_download
snapshot_download(
    "meta-llama/Llama-3.1-70B",
    revision="<commit-sha>",            # pin for reproducibility
    local_dir="/models/llama-3.1-70b",  # human-friendly path
    local_dir_use_symlinks=True,
)
PY

# after warming the cache:
export HF_HUB_OFFLINE=1  # serve from LAN only
```
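For orientation, the shared cache the snippet above populates follows the Hub's documented `models--{org}--{name}/{blobs,refs,snapshots}` layout. A minimal sketch (the helper name and the example commit SHA are hypothetical) of where a pinned snapshot lands inside `HF_HOME`:

```python
from pathlib import Path

def snapshot_path(hf_home: str, repo_id: str, commit_sha: str) -> Path:
    """Where huggingface_hub stores a pinned snapshot under HF_HOME.

    Layout (per the Hub cache docs): <HF_HOME>/hub/models--<org>--<name>/snapshots/<sha>
    Files inside snapshots/ are symlinks into ../../blobs, so multiple
    revisions of a repo share storage on the NAS.
    """
    folder = "models--" + repo_id.replace("/", "--")
    return Path(hf_home) / "hub" / folder / "snapshots" / commit_sha

# example: the path every LAN host resolves when HF_HOME=/srv/hf
print(snapshot_path("/srv/hf", "meta-llama/Llama-3.1-70B", "0123abc"))  # hypothetical sha
```

Knowing this shape helps when auditing the mount: if `snapshots/<sha>` exists and its symlink targets resolve, offline mode will serve that revision.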
2) Proxy cache (best UX for teams)
- JFrog Artifactory: native "Hugging Face" repository type. Create a remote HF repo and point clients at it with `HF_ENDPOINT`. WAN downloads happen once; the LAN serves everything else. Docs updated 2025. (JFrog)
- Sonatype Nexus: "huggingface (proxy)" supported since Feb 2025; same `HF_ENDPOINT` client knob. Known open issue with some Xet-backed large files as of May 22, 2025. Test your models. (help.sonatype.com)
```shell
# Point HF clients at your proxy
# Artifactory docs: https://jfrog.com/help/r/jfrog-artifactory-documentation/hugging-face-repositories
# Nexus docs: https://help.sonatype.com/en/hugging-face-repositories.html
export HF_ENDPOINT="https://repo.example.com/api/huggingface/hub"
hf download mistralai/Mixtral-8x7B-Instruct --revision <sha>
3) Lightweight self-host mirrors (DIY)
- Olah: on-demand HF mirror with block-level caching. Point clients at it via `HF_ENDPOINT`. Good for labs without a full artifact manager. (GitHub)
- Other community mirrors exist (`hf-mirror`, `light-hf-proxy`); evaluate their support and maintenance risk before relying on them. (GitHub)
Make big downloads reliable
- Prefer sharded weights. HF enforces a hard 50 GB per-file limit; large repos are sharded by design so a failed transfer does not restart from zero. (Hugging Face)
- Use `hf_transfer` for high-throughput, resumable transfers; enable it via an env var. (GitHub)
- Xet-backed repos: install `hf_xet` (bundled since `huggingface_hub` 0.32.0). You get chunk-level dedupe and resilient partial reuse; tune or disable the local Xet chunk cache with env vars. (Hugging Face)
- For one huge file, use `aria2c` segmented downloads with resume. (Gist)
```shell
# Xet knobs (docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)
export HF_XET_CHUNK_CACHE_SIZE_BYTES=20000000000  # 20 GB, or 0 to disable
# aria2 manual: https://aria2.github.io/manual/en/html/aria2c.html
aria2c -c -x12 -s12 --min-split-size=10M "https://huggingface.co/.../model-00001-of-00012.safetensors"
```
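The resumability that `aria2c -c` and `hf_transfer` provide boils down to HTTP Range requests computed from the partial file already on disk. A stdlib sketch of just that logic (header construction only, no network call; the function name is hypothetical):

```python
import os

def resume_range_header(partial_path: str) -> dict:
    """Header for resuming a download: request only the bytes we don't have.

    If a partial file exists, ask the server for its current size onward;
    the server replies 206 Partial Content and the client appends. This is
    the per-segment behavior behind `aria2c -c` and similar tools.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}
```

A client would open the partial file in append mode and pass this header to its HTTP library; servers that ignore Range (returning 200) force a restart, which is why sharded weights matter.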
Storage layout and serving
- Keep the Hub cache shape. Set `HF_HOME` or `HUGGINGFACE_HUB_CACHE`; use `local_dir_use_symlinks=True` when you need a clean model folder without duplicating blobs. (Hugging Face)
- Share that cache read-only over NFS/SMB. This is a common HPC pattern; many users symlink `~/.cache/huggingface/hub` to a shared mount. (Hugging Face Forums)
- Model servers: vLLM defaults to the HF cache; you can set a download directory or env var to point it at your shared path. (docs.vllm.ai)
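As a concrete illustration of the last bullet, a launch sketch (the `--download-dir` flag name is per current vLLM docs; paths reuse this guide's examples, so verify against your vLLM version):

```shell
# vLLM reads the standard HF cache; point it at the shared mount
export HF_HOME=/srv/hf    # shared Hub cache from pattern 1
export HF_HUB_OFFLINE=1   # never touch the WAN once the cache is warm
vllm serve meta-llama/Llama-3.1-70B

# alternative: keep HF_HOME local and override only the weight directory
# vllm serve meta-llama/Llama-3.1-70B --download-dir /srv/hf/hub
```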
When Git LFS or plain copies are a bad fit
- Cloning giant repos via Git LFS is slower and adds Git metadata; HF recommends programmatic downloads (`hf_hub_download` / `snapshot_download`) and cache reuse instead. (Hugging Face)
Known caveats (2025-10-13)
- Nexus's HF proxy may fail on some Xet-backed large files. Validate with your exact models; Artifactory works today. (GitHub)
- Symlink behavior varies on Windows; the cache still works but uses more space. (Hugging Face)
Minimal rollouts
A) Team, low ops: shared cache
- Export `/srv/hf` over NFS.
- Set `HF_HOME=/srv/hf` on all hosts.
- Warm with a revision-pinned `snapshot_download`. (Hugging Face)
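The steps above can be sketched end to end (hostname, subnet, and mount options are placeholders for your environment):

```shell
# on the NAS: export the warmed cache read-only (/etc/exports)
echo '/srv/hf 10.0.0.0/24(ro,no_subtree_check)' >> /etc/exports
exportfs -ra

# on each client: mount the share and point HF at it
mount -t nfs nas.example.lan:/srv/hf /srv/hf
export HF_HOME=/srv/hf
export HF_HUB_OFFLINE=1   # serve from the LAN only
```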
B) Team, governance: proxy cache
- Create an Artifactory HF remote repo.
- Set `HF_ENDPOINT=https://artifactory/...`.
- Pull once; serve over LAN. (JFrog)
C) DIY: Olah mirror
- Deploy Olah.
- Set `HF_ENDPOINT=http://olah:8090`. (GitHub)
Similar cases
- A shared NFS cache across users is discussed, and used in practice, on the HF forums and Stack Overflow. (Hugging Face Forums)
- HPC guidance advises relocating HF caches to shared storage. (docs.alliancecan.ca)
- Enterprise teams proxy HF via Artifactory/Nexus to centralize model pulls. (JFrog)
Curated references
Official HF docs
- Cache internals, env vars, downloading, symlink notes, offline mode. (Hugging Face)
- Storage limits (50 GB/file). (Hugging Face)
Proxy repos
- Artifactory HF repositories and setup. (JFrog)
- Nexus HF proxy and 2025 release notes. (help.sonatype.com)
Mirrors and tools
Xet backend
- Using `hf_xet`, env tuning, and a xet-core overview. (Hugging Face)