Error: Git LFS Pull Downloads Files Much Larger Than Model Weights and Does Not Stop

Hello,

I am experiencing an issue with Git LFS when trying to download model weights from my repository. After cloning the repository with the following commands:

GIT_LFS_SKIP_SMUDGE=1 git clone https://<your-repository-url>
cd <repository-directory>
git lfs pull

The process begins, but the total download is much larger than the model weights themselves. Additionally, the download does not seem to stop, even after a long time.

I am unsure why Git LFS is downloading so much more data than expected. Could someone please help explain why this is happening or suggest any possible solutions to resolve this issue?

Thanks in advance!


I found a similar case.


This happens because Git LFS is retrieving more objects than the single model snapshot you expected, and it may also be retrying failed transfers. You can constrain what it fetches and make the retries visible.

Background in one minute

  • Git stores tiny pointer files for LFS-tracked paths. The real file bytes live on an LFS server. git lfs pull = git lfs fetch + git lfs checkout: it downloads the LFS objects and then fills your working tree (see the two-step sketch after this list). (Stack Overflow)
  • LFS can fetch extra objects beyond the current files if you 1) don’t filter paths, 2) have “recent” fetch knobs enabled, or 3) have fetched many refs or deep history. Those extra objects sit in .git/lfs, so the network transfer can exceed “the model size.” (Debian Manpages)
  • Apparent “never stops” often means repeated retries or transport errors. LFS retries batch requests and you only see vague progress unless you enable tracing. (GitHub)
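
To watch the two halves separately, you can run them yourself. A minimal sketch, using an illustrative glob (adjust the path to your repository):

# Network step only: download matching LFS objects into .git/lfs; the working tree is untouched
git lfs fetch --include="models/**/*.safetensors"
# Local step only: replace pointer files with already-downloaded content; no network traffic
git lfs checkout

If the byte count balloons during the fetch step alone, the problem is object selection, not the checkout.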

Why your download is larger than the weights

  1. No path filters.
    By default, git lfs pull downloads all LFS objects referenced by the checkout. If the repo tracks multiple checkpoints or shards, you pull them all. Use --include/--exclude or their config equivalents (sketched after this list) to limit which paths get real bytes. (Debian Manpages)

  2. “Recent” fetch settings.
    If any lfs.fetchrecent* options are enabled, LFS also fetches objects from recent commits/refs, not only the current tree. That inflates total bytes. Check and disable those knobs. (manpages.ubuntu.com)

  3. Too many refs/history.
    Cloning all branches or deep history increases the set of objects LFS considers. Use a shallow, single-branch clone, and optionally sparse-checkout to shrink the working tree. (git-scm.com)

  4. Retries/transport quirks.
    Network or auth issues trigger multiple batch retries, which looks like an endless download. On Hugging Face, users sometimes avoid SSH JSON-parsing errors by switching to HTTPS. Reduce concurrency for flaky links and trace to confirm. (GitHub)

  5. Local cache accumulation.
    Even after a successful pull, objects remain in .git/lfs until pruned. That’s normal, but it can be surprising when you check disk usage. (manpages.ubuntu.com)
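
As noted in item 1, the command-line filters have persistent config equivalents, lfs.fetchinclude and lfs.fetchexclude. A minimal sketch with an illustrative glob:

# Persist a path filter so every later fetch/pull downloads only matching LFS objects
git config lfs.fetchinclude "models/**/*.safetensors"
git lfs pull
# Clear the filter when you want everything again
git config --unset lfs.fetchinclude

Unlike the one-shot flags, these settings live in .git/config, which suits repositories you re-pull often.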

Fix now: smallest possible download

# 0) Fresh minimal clone: one branch, shallow, no auto LFS download
GIT_LFS_SKIP_SMUDGE=1 git clone --depth 1 --single-branch <URL>
cd <repo>

# 1) Pull ONLY the weights you need (adjust globs)
# Docs: https://manpages.debian.org/testing/git-lfs/git-lfs-pull.1.en.html
# (use --include alone: adding --exclude="*" would skip everything, because exclude
#  takes precedence for paths that match both filters)
git lfs pull --include="models/**/*.safetensors"

# 2) If you still see excess bytes, hard-disable the “recent” fetch knobs for this pull
git -c lfs.fetchrecentalways=false -c lfs.fetchrecentrefsdays=0 -c lfs.fetchrecentcommitsdays=0 lfs pull

# 3) If the transfer seems stuck, trace and reduce concurrency
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git -c lfs.concurrenttransfers=1 lfs pull
# Docs: https://manpages.ubuntu.com/manpages/focal//man5/git-lfs-config.5.html

# 4) After success, reclaim disk space
git lfs prune
# Docs: https://manpages.ubuntu.com/manpages/focal/man1/git-lfs-prune.1.html

--include/--exclude restrict which LFS paths get downloaded; exclude takes precedence for paths that match both, which is why step 1 uses --include alone. Disabling the “recent” knobs avoids extra historical objects. Tracing exposes retries and HTTP/SSH issues. Prune drops cached objects that are no longer “recent”. (Debian Manpages)

Optional: reduce what Git itself brings down

If the repo is huge outside of LFS, combine partial clone and sparse-checkout so only a tiny tree is present before you run git lfs pull:

GIT_LFS_SKIP_SMUDGE=1 git clone --depth 1 --filter=blob:none --sparse --single-branch <URL>
cd <repo>
git sparse-checkout set models/ checkpoints/
# now pull LFS only for those paths
git lfs pull --include="models/**,checkpoints/**"

Partial clone avoids downloading non-LFS blobs up front; sparse-checkout limits the working tree so LFS won’t need objects for paths you didn’t check out; skipping smudge keeps the clone itself from pulling LFS bytes. (git-scm.com)

Verify the root cause

  • See exactly which LFS objects your checkout wants:
    git lfs ls-files -s (lists each LFS file with its size; compare the total to what was transferred; see the sketch after this list). (GitHub)
  • Inspect effective LFS knobs:
    git lfs env and git config --get-regexp '^lfs\.' to spot lfs.fetchrecent*, lfs.concurrenttransfers, include/exclude. (manpages.ubuntu.com)
  • Inspect transport problems:
    GIT_TRACE=1 GIT_CURL_VERBOSE=1 and look for repeated “batch” attempts or SSH JSON errors; switch to HTTPS if you see the Hugging Face SSH failure pattern. (GitHub)

Clear explanations of the moving parts

  • git lfs pull vs fetch vs checkout: Pull = “fetch LFS objects for the current ref” + “write them into the working copy.” Fetch alone just populates .git/lfs; checkout populates files from what you already have. This matters because fetching too broadly increases bytes even if your working tree stays the same. (Stack Overflow)
  • “Recent” settings: lfs.fetchrecentcommitsdays, lfs.fetchrecentrefsdays, lfs.fetchrecentremoterefs, and lfs.fetchrecentalways widen what gets fetched. Defaults do not prefetch recent history, but if any are set, expect extra downloads. (manpages.ubuntu.com)
  • Concurrency and retries: lfs.concurrenttransfers controls parallel transfers. Errors trigger retries and can look like hangs without tracing; see the sketch below. (manpages.ubuntu.com)
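
A quick sketch for inspecting both knobs; no output from the first command means the defaults (no prefetching) are in effect:

# List any “recent” fetch settings in your local or global config
git config --get-regexp '^lfs\.fetchrecent'
# Lower parallelism on a flaky link (the default is 8 in recent Git LFS releases)
git config lfs.concurrenttransfers 2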

Curated references

Core behavior and knobs

  • git-lfs-pull(1) include/exclude and path filtering. Good for exact globs. (Debian Manpages)
  • git-lfs-config(5) “recent” and transfer settings, including lfs.fetchrecent*, lfs.concurrenttransfers. (manpages.ubuntu.com)
  • git-lfs-prune(1) what gets deleted and “recent” definition. (manpages.ubuntu.com)

Selective download patterns

  • Atlassian tutorial explaining include/exclude patterns. (Atlassian)
  • Stack Overflow: pulling only some LFS paths with lfs.fetchinclude/fetchexclude. (Stack Overflow)

Large repo hygiene

  • Git docs for partial clone and sparse-checkout to reduce Git data before touching LFS. (git-scm.com)

Stalls and transport

  • GitHub issue: hangs and retry behavior; use tracing to see timeouts and retries. (GitHub)
  • Hugging Face forum: SSH JSON errors during git lfs fetch; use HTTPS as a workaround. (Hugging Face Forums)