Recommended tools for managing, downloading, and storing models on a LAN

Some options…


Use one of three patterns: 1) a shared Hugging Face cache on NFS/SMB, 2) a real proxy cache (Artifactory or Nexus), or 3) a lightweight self-hosted mirror. Add a resumable downloader (hf_transfer, hf_xet, or aria2c). Keep the Hub cache layout so every client stays compatible.

1) Shared HF cache over LAN (fast, simple)

  • Put the cache on your NAS and point every host at it. The Hub cache is versioned and symlink-based; on Windows it falls back to duplicated files when symlinks are unavailable, so it uses more disk. Switch to offline mode after prefetching. (Hugging Face)

  • Preload models by commit with snapshot_download and expose “pretty” trees via symlinks to avoid duplicates. (Hugging Face)


# docs: https://huggingface.co/docs/huggingface_hub/en/guides/cli
pip install -U "huggingface_hub[cli,hf_transfer]"  # https://github.com/huggingface/hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=/srv/hf  # shared mount (NFS/SMB)

# optional: use local_dir symlinks to a clean folder tree
python - <<'PY'
# docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/file_download
from huggingface_hub import snapshot_download

snapshot_download(
    "meta-llama/Llama-3.1-70B",
    revision="<commit-sha>",            # pin for reproducibility
    local_dir="/models/llama-3.1-70b",  # human-friendly path
    local_dir_use_symlinks=True,
)
PY

# after warming the cache:
export HF_HUB_OFFLINE=1  # serve from LAN only
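
To see what the shared cache holds and how big it is, huggingface_hub ships a cache scanner. A minimal sketch, assuming HF_HOME=/srv/hf as above (the hub cache then lives under /srv/hf/hub):

# docs: https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache
from huggingface_hub import scan_cache_dir

info = scan_cache_dir(cache_dir="/srv/hf/hub")  # hub cache under $HF_HOME/hub
for repo in sorted(info.repos, key=lambda r: r.size_on_disk, reverse=True):
    print(f"{repo.repo_id}: {repo.size_on_disk_str} in {repo.nb_files} files")
print(f"total: {info.size_on_disk / 1e9:.1f} GB")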

2) Proxy cache (best UX for teams)

  • JFrog Artifactory: native “Hugging Face” repo type. Create a remote HF repo and point clients at it with HF_ENDPOINT. WAN downloads happen once; LAN serves everything else. Docs updated 2025. (JFrog)

  • Sonatype Nexus: “huggingface (proxy)” supported since Feb 2025; same HF_ENDPOINT client knob. Known open issue with some Xet-backed large files as of May 22 2025. Test your models. (help.sonatype.com)


# Point HF clients at your proxy
# Artifactory docs: https://jfrog.com/help/r/jfrog-artifactory-documentation/hugging-face-repositories
# Nexus docs: https://help.sonatype.com/en/hugging-face-repositories.html
export HF_ENDPOINT="https://repo.example.com/api/huggingface/hub"
hf download mistralai/Mixtral-8x7B-Instruct-v0.1 --revision <sha>
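
If exporting HF_ENDPOINT fleet-wide is awkward, recent huggingface_hub releases also accept an endpoint argument per call. A minimal sketch; the proxy URL is the same placeholder as above:

from huggingface_hub import snapshot_download

snapshot_download(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    revision="<sha>",  # pin, as above
    endpoint="https://repo.example.com/api/huggingface/hub",  # your proxy
)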

3) Lightweight self-host mirrors (DIY)

  • Olah: on-demand HF mirror with block-level caching. Point clients via HF_ENDPOINT. Good for labs without a full artifact manager. (GitHub)

  • Other community mirrors exist (hf-mirror, light-hf-proxy); evaluate support and maintenance risk, and smoke-test any of them before repointing clients (sketch below). (GitHub)
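
Whichever mirror or proxy you pick, verify it answers Hub API calls before repointing a fleet. A minimal smoke test; the host/port are the Olah placeholders from rollout C below, and the assumption is that a Hub-compatible mirror serves the Hub's /api/models/<repo_id> model-info route:

import requests

base = "http://olah:8090"  # placeholder mirror/proxy endpoint
r = requests.get(f"{base}/api/models/gpt2", timeout=10)
r.raise_for_status()
print(r.json().get("sha"))  # commit the mirror resolves for the default branch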

Make big downloads reliable

  • Prefer sharded weights. HF has a hard 50 GB per file limit; large repos shard by design to avoid restart-from-zero failures. (Hugging Face)

  • Use hf_transfer for high-throughput, resumable transfers; enabled via env var. (GitHub)

  • Xet-backed repos: install hf_xet (bundled since huggingface_hub 0.32.0). You get chunk-level dedupe and resilient partial reuse; tune or disable the local Xet chunk cache with env vars. (Hugging Face)

  • For one huge file, use aria2c segmented downloads with resume. (Gist)


# Xet knobs (docs: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables)
export HF_XET_CHUNK_CACHE_SIZE_BYTES=20000000000  # 20 GB, or 0 to disable
# aria2 manual: https://aria2.github.io/manual/en/html/aria2c.html
aria2c -c -x12 -s12 --min-split-size=10M "https://huggingface.co/.../model-00001-of-00012.safetensors"
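
To extend that aria2c pattern to every shard of a pinned revision, generate the input file from the repo listing. A sketch using list_repo_files and the Hub's resolve/<revision>/<path> URL scheme; repo and revision are placeholders:

from huggingface_hub import HfApi

repo, rev = "meta-llama/Llama-3.1-70B", "<commit-sha>"
with open("urls.txt", "w") as f:
    for path in HfApi().list_repo_files(repo, revision=rev):
        if path.endswith(".safetensors"):
            f.write(f"https://huggingface.co/{repo}/resolve/{rev}/{path}\n")
# then: aria2c -c -x12 -s12 --min-split-size=10M -i urls.txt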

Storage layout and serving

  • Keep the Hub cache shape. Set HF_HOME or HUGGINGFACE_HUB_CACHE; use local_dir_use_symlinks=True when you need a clean model folder without duplicating blobs. (Hugging Face)

  • Share that cache read-only over NFS/SMB. This is a common HPC pattern; many users symlink their ~/.cache/huggingface/hub to a shared mount. (Hugging Face Forums)

  • Model servers: vLLM defaults to the HF cache; set its download directory or the usual env vars to point it at your shared path, as in the sketch below. (docs.vllm.ai)
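
A minimal vLLM sketch: download_dir controls where weights land (vLLM otherwise honors the standard HF cache env vars); the model name is a placeholder:

# vLLM docs: https://docs.vllm.ai
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-70B",  # placeholder
    download_dir="/srv/hf/hub",        # shared LAN mount from the examples above
)
# CLI equivalent: vllm serve meta-llama/Llama-3.1-70B --download-dir /srv/hf/hub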

When Git LFS or plain copies are a bad fit

  • Cloning giant repos via Git LFS is slower and adds Git metadata; HF recommends programmatic downloads (hf_hub_download / snapshot_download) and cache reuse instead. (Hugging Face)

Known caveats (2025-10-13)

  • Nexus’s HF proxy may fail on some Xet-backed large files. Validate with your exact models; Artifactory works today. (GitHub)

  • Symlink behavior varies on Windows; cache still works but uses more space. (Hugging Face)


Minimal rollouts

A) Team, low ops: shared cache

  1. Export /srv/hf over NFS.

  2. HF_HOME=/srv/hf on all hosts.

  3. Warm once with a pinned revision (snapshot_download or hf download --revision <sha>); see the sketch below. (Hugging Face)
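
A minimal sketch of those three steps; host names, network range, and model are placeholders:

# on the NAS, /etc/exports (ro for readers; rw only for the warming host):
#   /srv/hf 10.0.0.0/24(ro,no_subtree_check) 10.0.0.5(rw,no_subtree_check)
# on every client:
sudo mount -t nfs nas:/srv/hf /srv/hf
export HF_HOME=/srv/hf
# warm once from the rw host (older huggingface_hub: huggingface-cli download),
# then flip readers to offline:
hf download meta-llama/Llama-3.1-70B --revision <commit-sha>
export HF_HUB_OFFLINE=1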

B) Team, governance: proxy cache

  1. Create Artifactory HF remote repo.

  2. HF_ENDPOINT=https://artifactory/....

  3. Pull once, serve over LAN. (JFrog)

C) DIY: Olah mirror

  1. Deploy Olah.

  2. HF_ENDPOINT=http://olah:8090. (GitHub)


Similar cases

  • Shared NFS cache across users discussed and used in practice on HF forums and Stack Overflow. (Hugging Face Forums)

  • HPC guidance advises relocating HF caches to shared storage. (docs.alliancecan.ca)

  • Enterprise teams proxy HF via Artifactory/Nexus to centralize model pulls. (JFrog)


Curated references

Official HF docs

  • Cache internals, env vars, downloading, symlink notes, offline mode. (Hugging Face)

  • Storage limits (50 GB/file). (Hugging Face)

Proxy repos

  • Artifactory Hugging Face repositories and Nexus huggingface (proxy) docs. (JFrog, help.sonatype.com)

Mirrors and tools

  • Olah mirror, hf-mirror, light-hf-proxy. (GitHub)

  • hf_transfer and aria2. (GitHub)

Xet backend

  • Using hf_xet, env tuning, and xet-core overview. (Hugging Face)