Local Model Backups

Hey guys, I’m having trouble with local backups and it’s getting slightly (a LOT) annoying to have to redownload the models every time.

They seem to become corrupted once I move them or back them up locally. The only fix is to redownload them; copying them from a cold backup doesn’t work.

Does anyone know how to safely store models LOCALLY, and what filesystem type is best? And why they are getting corrupted for NO reason.

Thanks :smiley:


First, have you downloaded them correctly? The download method is quite confusing, so this part causes a lot of trouble…:sweat_smile:


The safest way to store Hugging Face models locally is to separate two things in your mind:

  1. The shared cache (managed automatically by Hugging Face libraries).
  2. Your own “model store” directories (folders you control, back up, and load from).

If you treat the cache as internal scratch space and only back up clean model folders, you avoid most corruption and confusion.

Below is a step-by-step, beginner-safe approach.


1. Background: how Hugging Face stores things on your machine

When you do something like:

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

internally the libraries:

  1. Download the files from the Hub (weights, config, tokenizer).
  2. Store them in a versioned cache directory on disk (by default something like ~/.cache/huggingface/hub). (Hugging Face)
  3. Next time, load directly from that cache instead of re-downloading. (Hugging Face)

The low-level download function hf_hub_download() explicitly says:

  • It downloads a file,
  • Caches it,
  • Returns a path into the cache that must not be modified, otherwise the cache can be corrupted. (Hugging Face)
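To make that concrete, here is a minimal sketch (the repo and file name are just examples):

from huggingface_hub import hf_hub_download

# Downloads the file on first call (or reuses the cached copy) and
# returns a path *inside* the shared cache.
path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(path)  # e.g. ~/.cache/huggingface/hub/models--bert-base-uncased/snapshots/<revision>/config.json
# Read from this path, but never edit, rename, or delete the file by hand.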

So:

  • Cache = shared internal storage, ephemeral, auto-managed.
  • Your model folders = self-contained directories you create for long-term storage and backups.

Storing models “properly” means:

  • Let the cache do its job.
  • Create explicit local model directories for anything important.
  • Store/backup those directories, not random cache internals.

2. Decide where your local models live

Pick one root folder on a reliable disk, for example:

  • Linux/macOS: /home/you/models or /mnt/bigdisk/hf-models
  • Windows: D:\hf-models

Inside this root you will have one subfolder per model, e.g.:

  • /home/you/models/bert-base-uncased
  • /home/you/models/meta-llama-3-8b
  • D:\hf-models\sdxl-base

You will then:

  • Download or export models into those subfolders.
  • Load models from those folders.
  • Backup those folders (zip/tar/copy).

3. Method A (Python): load once, then save with save_pretrained

This is very simple and beginner-friendly.

3.1 Download and save

from transformers import AutoModel, AutoTokenizer

model_id = "bert-base-uncased"
local_dir = "/home/you/models/bert-base-uncased"  # adjust path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

tokenizer.save_pretrained(local_dir)
model.save_pretrained(local_dir)

What this does:

  • from_pretrained(model_id) downloads into the cache (if not present yet). (Hugging Face)
  • save_pretrained(local_dir) writes a complete copy into your chosen folder (config + weights + tokenizer files). (Hugging Face)

Now /home/you/models/bert-base-uncased is your stored local model.

3.2 Load from that folder later (offline-friendly)

from transformers import AutoModel, AutoTokenizer

local_dir = "/home/you/models/bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModel.from_pretrained(local_dir)

from_pretrained accepts either a model name (Hub) or a local directory path as long as that directory contains a valid config.json and the weight files. (Hugging Face)

This pattern is the official way to store models on disk and reload them.


4. Method B (Python): use snapshot_download as a “repo snapshot”

If you want a directory that mirrors the entire Hub repo (including extra files like the README, merges, LoRA adapters, etc.), the Hub library has snapshot_download().

4.1 Download a full repo into a folder

from huggingface_hub import snapshot_download

repo_id = "bert-base-uncased"
local_dir = "/home/you/models/bert-base-uncased"

snapshot_download(
    repo_id=repo_id,
    local_dir=local_dir,
    local_dir_use_symlinks=False,  # important for portable storage
)
  • snapshot_download pulls all files for that repo and revision, and caches them. (Hugging Face)
  • local_dir tells it to also create a full working copy in your chosen folder. (Hugging Face)
  • local_dir_use_symlinks=False forces real files instead of symlinks pointing into the cache; this is better for backups and moving to other machines. (Hugging Face)

4.2 Load from that directory

from transformers import AutoModel, AutoTokenizer

local_dir = "/home/you/models/bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModel.from_pretrained(local_dir)

Same idea as in Method A: you always load from the directory, not from arbitrary cache paths.


5. Method C (CLI): hf download --local-dir

If you prefer the command line, the hf CLI (from huggingface_hub) can download a model directly into your model store folder.

5.1 Install the CLI

pip install -U huggingface_hub
hf --help

The docs describe hf download as the recommended way to download files from the Hub via CLI, with an optional --local-dir for a git-like “working directory” workflow. (Hugging Face)

5.2 Download an entire model repo to a local dir

hf download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir "/home/you/models/llama3-8b"

This:

  • Downloads the whole repo.
  • Uses the cache system under the hood.
  • Creates /home/you/models/llama3-8b with the files plus a small internal .cache/huggingface subfolder for metadata. (Hugging Face)

Later in Python:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/home/you/models/llama3-8b"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

Again: local directory → from_pretrained(local_dir).


6. How this relates to the cache (and when to move it)

6.1 Where the cache is by default

The Hub cache lives by default under ~/.cache/huggingface/hub (or the OS-specific equivalent). (Hugging Face)

You can change its location by:

  • Passing cache_dir= arguments to download/functions, or
  • Setting environment variables like HF_HOME, HUGGINGFACE_HUB_CACHE, or TRANSFORMERS_CACHE so the libraries use a different root. (Hugging Face)

Changing HF_HOME, for example, moves where the Hub libraries keep both the token and the cache. (Hugging Face)
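As a minimal sketch (the paths are just examples), you can either set the variable before importing the libraries, or pass cache_dir for a single call:

import os

# Option 1: set HF_HOME before importing any Hugging Face library.
os.environ["HF_HOME"] = "/mnt/bigdisk/hf-home"

from huggingface_hub import hf_hub_download

# Option 2: override the cache location for one download only.
path = hf_hub_download(
    repo_id="bert-base-uncased",
    filename="config.json",
    cache_dir="/mnt/bigdisk/hf-cache",
)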

6.2 Important rule: do not edit cache files

The low-level docs are explicit that hf_hub_download returns a path that is a pointer into this cache and you should not modify those files because you risk corrupting the cache. (Hugging Face)

So:

  • Use the cache as built-in storage for “whatever you happen to download”.
  • Use your model store folders for anything you care about long term.

If the cache becomes messy or too large, the CLI provides commands like hf cache info and hf cache delete/clear to clean it up (and hf cache verify to check files against checksum). (Hugging Face)


7. Backing up and verifying local model folders

Once you have a folder per model under your chosen root, you can treat them like any other assets:

  1. Backup strategy

    • Zip or tar.gz each model directory.
    • Store it on a reliable filesystem (ext4, NTFS, APFS, exFAT, etc.; avoid FAT32 for multi-GB files because of the 4 GB limit).
    • Copy to external drives, NAS, etc., as you would with large project/data folders.
  2. Verification

    • Use hf cache verify if you want to check a cache snapshot or local directory against the Hub checksums:

      hf cache verify deepseek-ai/DeepSeek-OCR
      

      or point it at a local directory. (Hugging Face)

    • Or generate your own hashes (sha256sum or similar) for archives (see the Python sketch after this list).

  3. Restoration

    • Unzip/untar the folder into your model root.
    • Load with from_pretrained("/path/to/restored/folder").

No interaction with the cache is strictly required for restore, as long as the folder is complete.
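For a quick hash without extra tools, here is a minimal Python sketch (the file path is an example; hashlib.file_digest needs Python 3.11+):

import hashlib

# Stream-hash a large weight file; compare the value before and after
# the backup round trip. Identical hashes mean a bit-perfect copy.
with open("/home/you/models/bert-base-uncased/model.safetensors", "rb") as f:
    digest = hashlib.file_digest(f, "sha256").hexdigest()

print(digest)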


8. Quick, concrete recipes

8.1 Minimal Python-only recipe (beginner-friendly)

  1. Choose root:

    /home/you/models
    
  2. Download + save:

    from transformers import AutoModel, AutoTokenizer
    
    model_id = "bert-base-uncased"
    local_dir = "/home/you/models/bert-base-uncased"
    
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    
    tokenizer.save_pretrained(local_dir)
    model.save_pretrained(local_dir)
    
  3. Later, load from disk:

    from transformers import AutoModel, AutoTokenizer
    
    model_dir = "/home/you/models/bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    

8.2 Minimal CLI + Python recipe

  1. Download via CLI:

    hf download sentence-transformers/all-MiniLM-L6-v2 \
      --local-dir "/home/you/models/all-MiniLM-L6-v2"
    
  2. Use in Python:

    from sentence_transformers import SentenceTransformer
    
    model_dir = "/home/you/models/all-MiniLM-L6-v2"
    model = SentenceTransformer(model_dir)
    

(For plain Transformers models you would use AutoModel / AutoTokenizer, same as above.)


Short bullet summary

  • Hugging Face models are stored in a cache by default (~/.cache/huggingface/hub), managed by Hub libraries and configurable via environment variables like HF_HOME and TRANSFORMERS_CACHE. (Hugging Face)

  • Cache paths returned by hf_hub_download are pointers to that cache and must not be edited directly. (Hugging Face)

  • To “store models locally properly”, create your own model store root directory, and for each model:

    • Use save_pretrained() after from_pretrained, or (Hugging Face)
    • Use snapshot_download(repo_id, local_dir=..., local_dir_use_symlinks=False), or (Hugging Face)
    • Use hf download <repo> --local-dir <folder> from the CLI. (Hugging Face)
  • Always load offline with from_pretrained("/path/to/folder"), not with random cache paths. (Hugging Face)

  • Back up these model folders (zip/tar, copy) just like any other large project directory, and optionally verify them using hashes or hf cache verify. (Hugging Face)

Hi,

thanks for your reply, I’m quite new and a noob at this. I just click download and save them to my drive.

They work fine until I back them up, then they start producing black outputs.

I tried the NTFS file system and exFAT, with the same results.

I have never used anything you mentioned, as I have no idea what it is.

I just want a local backup and to not use up your precious storage space.

Thank you :smiley:


Just want to add,

I’m talking about LoRAs.

Nothing too crazy. Most of them are available on Civitai, but when something goes wrong and I have to reinstall them it becomes a pain. That’s why I simply want to store them locally.

I think you must be talking about something else.

thanks


Ah, so you mean that “file system”…

FAT32 might have capacity limits, but NTFS or exFAT are more lenient. I actually use NTFS myself and haven’t encountered any issues.
If something seems like it might go wrong, I’d guess the cause lies elsewhere besides the file system.

I don’t know. I can download the LoRAs fine, upload them to Hugging Face fine, and use them fine.

But as soon as I have to reinstall ComfyUI (because their updates always break everything), or as soon as I make a backup of the models to my external drive, they are broken and will create NaN errors.

And it’s been very consistent: I’ve tried multiple file systems, and the backups always break. That’s why I’m just trying to be able to store them safely somewhere, and not online.

I thought I was missing something, and I probably am, since nothing else is being corrupted on my PC.


Hmm… I considered the possibility of hardware damage, but if reinstalling ComfyUI would definitely break it, doesn’t that mean the ComfyUI environment is corrupted while the files themselves might still be intact?


Short answer:
Your LoRAs are almost certainly not “randomly rotting”. What is happening is usually one of:

  1. The files are getting slightly corrupted when you copy them to / from the external drive.
  2. ComfyUI changes (after reinstall/update) make the same LoRA + model combo start producing NaNs, even though the file itself is fine.
  3. Paths and folders get mixed up after reinstall so ComfyUI is not actually using the file you think it is.

You can fix this with:

  • One safe, permanent “models” folder on your PC.
  • Backups as ZIP/7z archives that you test.
  • A quick hash check (file fingerprint) to prove if a backup is really identical.
  • Treating ComfyUI as disposable, not your model files.

Below is a detailed, beginner-safe breakdown.


1. Background: what is actually stored where

1.1 Hugging Face side

Hugging Face libraries use two layers: (Hugging Face Forums)

  1. Cache

    • Hidden “scratch” area (like ~/.cache/huggingface/hub or equivalent on Windows).
    • Managed automatically.
    • hf_hub_download() explicitly says the paths it returns point into this cache and must not be edited, or the cache may break. (Hugging Face Forums)
  2. Your own model folders

    • Normal folders like D:\hf-models\sdxl-base or D:\AI\models\lora-name.
    • You choose these.
    • These are what you should back up and restore.

The Hugging Face “Local Model Backups” thread says exactly this: treat the cache as internal scratch space and only back up clean model folders you control. (Hugging Face Forums)

1.2 ComfyUI side

ComfyUI also uses a folder tree of model files: checkpoints, LoRAs, VAEs, etc. By default they sit under ComfyUI\models\.... The official docs say you can move your real models outside of ComfyUI and point ComfyUI at them via an extra_model_paths.yaml file. (ComfyUI)

Key idea:

Your LoRAs and checkpoints should live in one central folder that you own.
ComfyUI should just reference that folder.

Once you do that, reinstalling ComfyUI does not touch your model files at all.


2. Possible causes of your “corruption”

2.1 Partial or bad copies to / from the external drive

What happens

LoRA files (.safetensors) are big binary files. If:

  • The USB cable is flaky.
  • The external drive has errors.
  • You unplug the drive too early.
  • You copy while something is still downloading.

then the copy can be truncated (shorter than it should be) or slightly damaged.

The safetensors library then throws errors like:

“The safetensors file is incomplete. Check the file size and make sure you have copied/downloaded it correctly.”

This exact message appears in a ComfyUI GitHub issue where a model copied into ComfyUI/models/controlnet/... was incomplete; safetensors raised MetadataIncompleteBuffer. (GitHub)

Sometimes the file is “valid enough” to load, but the weights are garbage, and later the UNet math blows up and you see NaNs.

Why this matches your case

  • Fresh download works.
  • Copy to external and back → NaNs / black outputs.
  • Re-download fixes it.

That is exactly the pattern of a copy that is not bit-identical to the original.

How to recognise

  • Sometimes you get an explicit ComfyUI error like above instead of just NaNs. (GitHub)
  • File size of a “bad” copy is smaller than the original.
  • A checksum (hash) of the file changes after copy (more on this in solutions).
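If you want to test a suspect file yourself, here is a minimal sketch using the safetensors Python package (the path is an example, and framework="pt" assumes PyTorch is installed); a truncated copy typically fails while parsing the header:

from safetensors import safe_open

def check_safetensors(path: str) -> bool:
    """Try to parse the header and list tensors; truncated files raise here."""
    try:
        with safe_open(path, framework="pt") as f:
            n = len(list(f.keys()))
        print(f"OK: {n} tensors in {path}")
        return True
    except Exception as e:  # e.g. the MetadataIncompleteBuffer error mentioned above
        print(f"BROKEN: {path}: {e}")
        return False

check_safetensors("D:/AI/models/loras/my_lora.safetensors")  # example path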

2.2 Backing up the wrong thing (cache vs real model folders)

If you ever copy:

  • Random subfolders from ~/.cache/huggingface/hub (or Windows equivalent).
  • Symlinked files instead of the real file.
  • Temporary download folders.

you can easily end up with:

  • Half a model.
  • Only one shard of a sharded model.
  • Symlinks that point to nothing on the new machine.

The Hugging Face docs and the local backups answer say:

  • Cache paths returned by hf_hub_download are pointers into the cache and must not be edited. (Hugging Face Forums)
  • For long-term storage you should create your own per-model folder using save_pretrained, snapshot_download(local_dir=..., local_dir_use_symlinks=False), or hf download --local-dir. (Hugging Face Forums)

If you back up random cache internals instead of those clean folders, you will see exactly the kind of weird behaviour you describe.

Your note “I just click download and save them to my drive” suggests you may be fine here, but it is worth knowing: do not copy HF cache directories around as “backups”.


2.3 ComfyUI reinstall changes the environment → NaNs

NaNs in the UNet are very common when:

  • The GPU is near its limits.
  • Precision is too low (lots of fp16 under tight VRAM).
  • xFormers or other optimisations change.
  • PyTorch / CUDA versions change.

The classic error text (from AUTOMATIC1111, Hugging Face and others) is: (GitHub)

“NansException: A tensor with all NaNs was produced in Unet.
This could be either because there’s not enough precision to represent the picture, or because your video card does not support half type. Try setting ‘Upcast cross attention layer to float32’ or using the --no-half command line argument…”

Many people see this after only changing:

  • WebUI / ComfyUI version.
  • Driver version.
  • Optimisation settings.

They do not change the model file at all, but NaNs appear.

Why this fits your story

You said:

“As soon as I have to reinstall ComfyUI… they are broken and will create NaN errors.”

If after reinstall:

  • The same LoRA file from before now triggers NaNs,
  • And your backup copy has the same size and hash as the original,

then nothing is “corrupted”. The math environment changed and became less stable.


2.4 Wrong model + LoRA combination after restore

If you mix:

  • LoRAs for SD1.5 with SDXL or Flux base models.
  • ControlNet models built for different base models.
  • Wrong VAEs.

you can get:

  • Shape mismatch errors, or
  • Very unstable activations that lead to NaNs.

After a reinstall, it is easy for ComfyUI to:

  • Point to a different default checkpoint.
  • Forget some custom paths.
  • Load a different VAE or text encoder.

Then a workflow that used to be “LoRA X + base model Y” becomes “LoRA X + base model Z”, which might blow up numerically even if both files are individually fine.


2.5 External drive or cable issues

If the external drive or USB path is flaky, the damage will only show up on large files. Small text files are usually fine.

This is why it can feel like “only LoRAs get corrupted”:

  • They are large.
  • They are dense binary formats where a single flipped or missing byte matters.
  • A tiny error in a large .safetensors often causes the loader to fail or the network to produce NaNs.

Tools like 7-Zip have a t (test) command specifically to check archive integrity. (7-zip.opensource.jp)

If testing a big .7z archive on your external drive ever shows errors, then the drive/cable/port is not safe for these backups.


3. Safe setup and solutions (PC-beginner friendly)

3.1 Step 1: Pick one permanent “models” folder

On Windows:

  1. Open File Explorer.

  2. Choose a drive with free space (for example D:).

  3. Create:

    D:\AI\
        models\
            checkpoints\
            loras\
            vae\
            controlnet\
    

This is your main model store. Do not put ComfyUI itself in here. This folder is what you back up and restore.

3.2 Step 2: Tell ComfyUI to use that folder

You want ComfyUI to look into D:\AI\models instead of storing everything inside its own directory.

ComfyUI docs and guides say to use extra_model_paths.yaml for this on Windows: (GitHub)

  1. Go to your ComfyUI install folder, something like:

    C:\ComfyUI_windows_portable\ComfyUI\
    
  2. Find extra_model_paths.yaml.example.

  3. Copy it and rename the copy to:

    extra_model_paths.yaml
    
  4. Open extra_model_paths.yaml in Notepad.

  5. Add something like:

    my_models:
      base_path: D:/AI/models
      checkpoints: checkpoints
      loras: loras
      vae: vae
      controlnet: controlnet
    

    Notes:

    • Use / in paths, ComfyUI accepts that even on Windows. (Reddit)
  6. Move your existing models into these folders:

    • LoRAs → D:\AI\models\loras
    • Checkpoints → D:\AI\models\checkpoints
    • etc.
  7. Start ComfyUI and press r in the UI to refresh models. (Blender Neko)

Now:

  • You can delete and reinstall ComfyUI as many times as you want.
  • Your models stay in D:\AI\models and do not move.

3.3 Step 3: Download and store LoRAs correctly

For LoRAs from Civitai or similar:

  • When you click download in the browser, save directly into D:\AI\models\loras.
  • Wait for the download to fully finish before closing the browser or moving the file.
  • Do not move the file again unless you really have to.

If you ever use Hugging Face Hub via Python or CLI, the “Local Model Backups” answer recommends: (Hugging Face Forums)

  • Use snapshot_download(..., local_dir=..., local_dir_use_symlinks=False) or
  • Use hf download <repo> --local-dir D:\AI\models\some-model

so that your real model copy is in your own folder, not just the cache.
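For example, here is a minimal sketch for pulling one LoRA file straight into your own folder (the repo and file names are placeholders, not real models):

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="some-user/some-lora",       # placeholder repo id
    filename="some_lora.safetensors",    # placeholder file name
    local_dir=r"D:\AI\models\loras",     # your model store, not the cache
)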

3.4 Step 4: Back up models as archives and test them

Instead of copying hundreds of .safetensors individually:

  1. Install 7-Zip if you do not have it.

  2. Right-click D:\AI\models → 7-Zip → “Add to models_backup.7z”.

  3. After it finishes, test the archive:

    • Open 7-Zip, select models_backup.7z, click Test.

    • Or from Command Prompt:

      "C:\Program Files\7-Zip\7z.exe" t "D:\AI\models_backup.7z"
      

      The t command tests archive integrity. (7-zip.opensource.jp)

  4. If test passes, copy models_backup.7z to your external drive.

  5. On the external drive, run the test again.

If the test fails only on the external copy, the drive or cable is causing corruption.

When you need to restore:

  1. Copy models_backup.7z from external back to your PC.
  2. Test it again with 7-Zip.
  3. If it passes, extract it into D:\AI\models.

Now all the LoRAs and checkpoints are back in the exact same paths ComfyUI expects.
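If you prefer scripting over the 7-Zip GUI, a rough Python equivalent (paths are examples) creates the archive and verifies it using zipfile’s built-in test:

import shutil
import zipfile

# Build D:\AI\models_backup.zip from the whole models folder.
archive = shutil.make_archive(r"D:\AI\models_backup", "zip", root_dir=r"D:\AI\models")

# testzip() re-reads every member and returns the first corrupt name, or None.
with zipfile.ZipFile(archive) as zf:
    bad = zf.testzip()

print("Archive OK" if bad is None else f"Corrupt member: {bad}")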

3.5 Step 5: Simple “file fingerprint” check (hash) on Windows

To prove a single LoRA file is identical before and after backup:

  1. Open Command Prompt.

  2. Run:

    certutil -hashfile "D:\AI\models\loras\my_lora.safetensors" SHA256
    

    This prints a SHA-256 hash (a long hex number). Windows tips and vendor docs show this exact pattern. (Qiita)

  3. Copy the same file to your external drive, e.g. E:\backups\my_lora.safetensors.

  4. Run:

    certutil -hashfile "E:\backups\my_lora.safetensors" SHA256
    
  5. Compare the two hashes:

    • If they are identical, the file is bit-perfect.
    • If they are different, the file was changed or corrupted during copy.

Do the same after copying back from external to see if the round trip is safe.
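To check many files at once, here is a small Python sketch (paths are examples) that compares every .safetensors file with its copy on the backup drive:

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MB chunks
            h.update(chunk)
    return h.hexdigest()

def compare_trees(original: Path, backup: Path) -> None:
    """Flag any .safetensors file whose backup copy is missing or not bit-identical."""
    for src in original.rglob("*.safetensors"):
        dst = backup / src.relative_to(original)
        if not dst.exists():
            print(f"MISSING in backup: {dst}")
        elif sha256_of(src) != sha256_of(dst):
            print(f"MISMATCH (bad copy?): {dst}")

compare_trees(Path(r"D:\AI\models\loras"), Path(r"E:\backups\loras"))  # example paths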


3.6 Step 6: When you see NaNs, separate “corruption” from “math problems”

When ComfyUI says:

“NansException: A tensor with all NaNs was produced in Unet…” (GitHub)

use this simple checklist:

  1. Test with a base model only

    • In ComfyUI, disable all LoRAs and ControlNets.
    • Use a well-known base checkpoint.
    • If this already gives NaNs, your environment (precision, drivers, etc.) is the problem, not the LoRA file.
  2. Add one LoRA at a time

    • Turn on one LoRA, try again.

    • When NaNs start with a specific LoRA, check:

      • Does its SHA-256 hash match the original file?
      • Does it live in the correct folder?
      • Is it for the right model family (SD1.5 vs SDXL vs Flux)?
  3. If hashes match but NaNs only appear on new ComfyUI builds

    • Then the LoRA file is not corrupt.
    • You are hitting the same numeric/precision issues other users report after updates. Solutions there involve changing precision or GPU settings, not re-downloading. (GitHub)

4. Very short checklist

  • Cause 1 (common): copies to external drive are incomplete or damaged → safetensors error or NaNs.

    • Fix: back up as .7z archives, use 7z t to test them, and use certutil -hashfile ... SHA256 to confirm files are identical. (GitHub)
  • Cause 2: backing up or moving HF cache internals instead of clean model folders.

    • Fix: follow the HF “Local Model Backups” pattern: keep one per-model folder and back that up, do not touch cache internals. (Hugging Face Forums)
  • Cause 3: ComfyUI reinstall / update changes GPU math and precision → NaNs with same model.

    • Fix: treat ComfyUI as disposable, keep models in a central folder, and treat NaNs as environment issues when hashes match. (GitHub)
  • Cause 4: wrong mixes (wrong base model + LoRA + ControlNet) after reinstall.

    • Fix: test base model alone, then add components one by one and confirm they are for the same model family.

If you set up the single “models” folder, use archives for backup, and verify with hashes, you should reach a point where you never need to re-download the same LoRA again unless the original download itself was bad.

Thanks. I don’t understand what you mean by /cache or whatever; I only have .safetensors files.

Well, it has to be the models that are corrupted, since re-downloading the SAME models from Hugging Face fixes the black output issue.

So you are saying downloading all the models and putting them in a zip file will fix the issue? The only annoying thing about this is that I often need to back up new models. I really thought simply leaving them in a folder would be good enough.

I was also using robocopy to copy the files, with the same outcome.

It also can’t be a wrong model/LoRA combo; I’ve successfully generated over 2,000 hours of videos with my workflows.

It’s just so annoying to not be able to properly store them.

I appreciate your complete answers, and I will try to put them in a zip before backing them up and see what happens next time I need a fresh install.

Thank you very much


Hmm… Zipping it up is more of a management issue, so I don’t think it really matters whether you do it or not. It doesn’t compress much anyway…

I think every time you reinstall ComfyUI, it loses track of the safetensors file locations and throws NaN errors. Basically, I think ComfyUI’s settings are messed up. Otherwise, it would break even without reinstalling ComfyUI.
But since I’m not a ComfyUI user myself, I can’t tell what’s wrong without asking the AI…