Some options…
Use one of three patterns: 1) a shared Hugging Face cache on NFS/SMB, 2) a real proxy cache (Artifactory or Nexus), or 3) a lightweight self-hosted mirror. Add resumable downloaders (`hf_transfer`, `hf_xet`, or `aria2c`). Keep the Hub cache layout to stay compatible with all clients.
1) Shared HF cache over LAN (fast, simple)
- Put the cache on your NAS and point every host to it. The Hub cache is versioned and supports symlinks; Windows falls back without symlinks and uses more disk. Use offline mode after prefetching. (Hugging Face)
- Preload models by commit with `snapshot_download` and expose "pretty" trees via symlinks to avoid duplicates. (Hugging Face)
```shell
# docs: https://huggingface.co/docs/huggingface_hub/en/guides/cli
pip install -U "huggingface_hub[cli,hf_transfer]"  # https://github.com/huggingface/hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=/srv/hf  # shared mount (NFS/SMB)

# optional: use local_dir symlinks to a clean folder tree
python - <<'PY'
# docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/file_download
from huggingface_hub import snapshot_download
snapshot_download(
    "meta-llama/Llama-3.1-70B",
    revision="<commit-sha>",            # pin for reproducibility
    local_dir="/models/llama-3.1-70b",  # human-friendly path
    local_dir_use_symlinks=True,
)
PY

# after warming the cache:
export HF_HUB_OFFLINE=1  # serve from LAN only
```
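For orientation, the shared cache the snippet above populates follows the Hub's documented `models--{org}--{name}/{blobs,refs,snapshots}` layout. A minimal sketch (the helper name and the example commit SHA are hypothetical) of where a pinned snapshot lands inside `HF_HOME`:

```python
from pathlib import Path

def snapshot_path(hf_home: str, repo_id: str, commit_sha: str) -> Path:
    """Where huggingface_hub stores a pinned snapshot under HF_HOME.

    Layout (per the Hub cache docs): <HF_HOME>/hub/models--<org>--<name>/snapshots/<sha>
    Files inside snapshots/ are symlinks into ../../blobs, so multiple
    revisions of a repo share storage on the NAS.
    """
    folder = "models--" + repo_id.replace("/", "--")
    return Path(hf_home) / "hub" / folder / "snapshots" / commit_sha

# example: the path every LAN host resolves when HF_HOME=/srv/hf
print(snapshot_path("/srv/hf", "meta-llama/Llama-3.1-70B", "0123abc"))  # hypothetical sha
```

Knowing this shape helps when auditing the mount: if `snapshots/<sha>` exists and its symlink targets resolve, offline mode will serve that revision.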
2) Proxy cache (best UX for teams)
- JFrog Artifactory: native "Hugging Face" repository type. Create a remote HF repo and point clients at it with `HF_ENDPOINT`. WAN downloads happen once; the LAN serves everything else. Docs updated 2025. (JFrog)
- Sonatype Nexus: "huggingface (proxy)" supported since Feb 2025; same `HF_ENDPOINT` client knob. Known open issue with some Xet-backed large files as of May 22, 2025. Test your models. (help.sonatype.com)
```shell
# Point HF clients at your proxy
# Artifactory docs: https://jfrog.com/help/r/jfrog-artifactory-documentation/hugging-face-repositories
# Nexus docs: https://help.sonatype.com/en/hugging-face-repositories.html
export HF_ENDPOINT="https://repo.example.com/api/huggingface/hub"
hf download mistralai/Mixtral-8x7B-Instruct --revision <sha>
3) Lightweight self-host mirrors (DIY)
- Olah: on-demand HF mirror with block-level caching. Point clients at it via `HF_ENDPOINT`. Good for labs without a full artifact manager. (GitHub)
- Other community mirrors exist (`hf-mirror`, `light-hf-proxy`); evaluate their support and maintenance risk before relying on them. (GitHub)
Make big downloads reliable
- Prefer sharded weights. HF enforces a hard 50 GB per-file limit; large repos are sharded by design so a failed transfer does not restart from zero. (Hugging Face)
- Use `hf_transfer` for high-throughput, resumable transfers; enable it via an env var. (GitHub)
- Xet-backed repos: install `hf_xet` (bundled since `huggingface_hub` 0.32.0). You get chunk-level dedupe and resilient partial reuse; tune or disable the local Xet chunk cache with env vars. (Hugging Face)
- For one huge file, use `aria2c` segmented downloads with resume. (Gist)
```shell
# Xet knobs (docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)
export HF_XET_CHUNK_CACHE_SIZE_BYTES=20000000000  # 20 GB, or 0 to disable
# aria2 manual: https://aria2.github.io/manual/en/html/aria2c.html
aria2c -c -x12 -s12 --min-split-size=10M "https://huggingface.co/.../model-00001-of-00012.safetensors"
```
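The resumability that `aria2c -c` and `hf_transfer` provide boils down to HTTP Range requests computed from the partial file already on disk. A stdlib sketch of just that logic (header construction only, no network call; the function name is hypothetical):

```python
import os

def resume_range_header(partial_path: str) -> dict:
    """Header for resuming a download: request only the bytes we don't have.

    If a partial file exists, ask the server for its current size onward;
    the server replies 206 Partial Content and the client appends. This is
    the per-segment behavior behind `aria2c -c` and similar tools.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}
```

A client would open the partial file in append mode and pass this header to its HTTP library; servers that ignore Range (returning 200) force a restart, which is why sharded weights matter.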
Storage layout and serving
- Keep the Hub cache shape. Set `HF_HOME` or `HUGGINGFACE_HUB_CACHE`; use `local_dir_use_symlinks=True` when you need a clean model folder without duplicating blobs. (Hugging Face)
- Share that cache read-only over NFS/SMB. This is a common HPC pattern; many users symlink `~/.cache/huggingface/hub` to a shared mount. (Hugging Face Forums)
- Model servers: vLLM defaults to the HF cache; you can set a download directory or env var to point it at your shared path. (docs.vllm.ai)
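As a concrete illustration of the last bullet, a launch sketch (the `--download-dir` flag name is per current vLLM docs; paths reuse this guide's examples, so verify against your vLLM version):

```shell
# vLLM reads the standard HF cache; point it at the shared mount
export HF_HOME=/srv/hf    # shared Hub cache from pattern 1
export HF_HUB_OFFLINE=1   # never touch the WAN once the cache is warm
vllm serve meta-llama/Llama-3.1-70B

# alternative: keep HF_HOME local and override only the weight directory
# vllm serve meta-llama/Llama-3.1-70B --download-dir /srv/hf/hub
```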
When Git LFS or plain copies are a bad fit
- Cloning giant repos via Git LFS is slower and adds Git metadata; HF recommends programmatic downloads (`hf_hub_download` / `snapshot_download`) and cache reuse instead. (Hugging Face)
Known caveats (2025-10-13)
- Nexus's HF proxy may fail on some Xet-backed large files. Validate with your exact models; Artifactory works today. (GitHub)
- Symlink behavior varies on Windows; the cache still works but uses more space. (Hugging Face)
Minimal rollouts
A) Team, low ops: shared cache
- Export `/srv/hf` over NFS.
- Set `HF_HOME=/srv/hf` on all hosts.
- Warm with a revision-pinned `snapshot_download`. (Hugging Face)
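The steps above can be sketched end to end (hostname, subnet, and mount options are placeholders for your environment):

```shell
# on the NAS: export the warmed cache read-only (/etc/exports)
echo '/srv/hf 10.0.0.0/24(ro,no_subtree_check)' >> /etc/exports
exportfs -ra

# on each client: mount the share and point HF at it
mount -t nfs nas.example.lan:/srv/hf /srv/hf
export HF_HOME=/srv/hf
export HF_HUB_OFFLINE=1   # serve from the LAN only
```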
B) Team, governance: proxy cache
- Create an Artifactory HF remote repo.
- Set `HF_ENDPOINT=https://artifactory/...`.
- Pull once; serve over LAN. (JFrog)
C) DIY: Olah mirror
- Deploy Olah.
- Set `HF_ENDPOINT=http://olah:8090`. (GitHub)
Similar cases
- A shared NFS cache across users is discussed, and used in practice, on the HF forums and Stack Overflow. (Hugging Face Forums)
- HPC guidance advises relocating HF caches to shared storage. (docs.alliancecan.ca)
- Enterprise teams proxy HF via Artifactory/Nexus to centralize model pulls. (JFrog)
Curated references
Official HF docs
- Cache internals, env vars, downloading, symlink notes, offline mode. (Hugging Face)
- Storage limits (50 GB/file). (Hugging Face)
Proxy repos
- Artifactory HF repositories and setup. (JFrog)
- Nexus HF proxy and 2025 release notes. (help.sonatype.com)
Mirrors and tools
Xet backend
- Using `hf_xet`, env tuning, and a xet-core overview. (Hugging Face)