Auth suddenly required for model access .... sometimes

John6666 · October 14, 2025, 12:00am

It appears the Hub's behavior changed between last year and this year. If you don't want to modify existing code, I think the simplest approach is to log in with a valid token.

Direct answers:

Your one-liner does not require auth because it downloads public model files through code paths that allow anonymous access and use the local cache if present. No token gets attached, so the request succeeds. (Hugging Face)
You suddenly “need auth” because a different code path is calling the Hub metadata API (e.g., HfApi().model_info(...).sha). huggingface_hub automatically attaches any Hugging Face token it finds to requests, whether or not the token is valid. If the token is invalid, the server returns 401 “Invalid credentials in Authorization header,” even for a public repo. This started showing up more for users in 2025, and it also appears whenever a library adds a metadata lookup before load. You can avoid it by removing the bad token, forcing offline/local-only loading, or passing token=False so the request is anonymous. Note that token=None is the default and still falls back to any saved or environment token. (Hugging Face)

Background and context

How FastEmbed and the Hub interact

File downloads: hf_hub_download / snapshot_download fetch files and cache them. For public repos they work without any token. Cached files are then reused. Your one-liner uses this behavior. (Hugging Face)
Metadata calls: HfApi().model_info(repo_id) queries the Hub REST API for repo metadata and returns a commit SHA used for snapshotting. If your environment or keyring has a token, the library auto-attaches it. A stale or malformed token triggers 401 even for public repos. (Hugging Face)
FastEmbed sparse models like prithivida/Splade_PP_en_v1 are public and shown in Qdrant’s docs and examples, so anonymous file fetches and cached use work. (qdrant.tech)

Why it worked before and fails now

Environment drift: a token landed in HF_TOKEN / HUGGING_FACE_HUB_TOKEN or was saved by huggingface-cli login, turning formerly anonymous requests into authenticated ones. Env-var tokens override stored tokens. Date: docs updated through 2024–2025. (Hugging Face)
Library changes: newer releases or upstream code paths started hitting model_info(...) to resolve revisions or SHAs before load. Users began reporting 401s in Mar–Jul 2025 when an invalid token was present. (GitHub)
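
To check for the environment drift described above, a small diagnostic can list where a token might be coming from. This is a minimal sketch, not part of huggingface_hub: the helper name find_token_sources is hypothetical, and the default token file location (HF_HOME/token under ~/.cache/huggingface, overridable via HF_TOKEN_PATH) is an assumption about the library's defaults that you should confirm for your installed version.

```python
import os
from pathlib import Path


def find_token_sources(env=None, home=None):
    """Return a list of places where a Hugging Face token appears to be set.

    Hypothetical helper for diagnosis only; it never prints the token itself.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []

    # Environment variables named in this thread.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        if env.get(name):
            sources.append(f"env:{name}")

    # Assumed default location of the token saved by `huggingface-cli login`.
    hf_home = Path(env.get("HF_HOME") or home / ".cache" / "huggingface")
    token_file = Path(env.get("HF_TOKEN_PATH") or hf_home / "token")
    if token_file.is_file() and token_file.read_text().strip():
        sources.append(f"file:{token_file}")

    return sources
```

Calling find_token_sources() with no arguments inspects the current process. An empty list suggests requests will go out anonymously; any entry is a candidate for the stale token.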

Make cached models work without auth

Pick one. All are valid.

A) Remove or neutralize the token

Unset env vars and logout so calls are anonymous:
```
# remove token influence
unset HF_TOKEN HUGGING_FACE_HUB_TOKEN
huggingface-cli logout   # optional: clears saved token
```
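After that, no token is left to attach. To be explicit in code, HfApi().model_info(repo, token=False) keeps the request unauthenticated; token=None is the default and would still pick up any token that remains. (Hugging Face)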

B) Force full offline

Set offline mode before imports:
```
export HF_HUB_OFFLINE=1
```
This makes file loaders use the cache and makes any HfApi call raise OfflineModeIsEnabled; avoid or guard metadata calls in that mode. (Hugging Face)
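
One way to guard metadata calls in offline mode is to check the environment variable before making the request. The sketch below is an illustration under assumptions: hub_is_offline and resolve_sha are hypothetical names, the set of accepted "true" values is my understanding of how huggingface_hub parses the variable, and the real lookup is injected as a callable so the guard itself needs no network.

```python
import os

# Assumption: values huggingface_hub treats as "true" for HF_HUB_OFFLINE.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def hub_is_offline(env=None):
    """Report whether HF_HUB_OFFLINE is switched on in the given environment."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").upper() in _TRUE_VALUES


def resolve_sha(repo_id, fetch_sha, env=None):
    """Return the commit SHA, or None when offline mode forbids the lookup.

    fetch_sha is any callable that performs the real metadata request, e.g.
    lambda r: HfApi().model_info(r, token=False).sha
    """
    if hub_is_offline(env):
        return None  # caller should fall back to whatever is in the local cache
    return fetch_sha(repo_id)
```

Returning None instead of raising lets the caller decide how to proceed with the cached copy, rather than failing on OfflineModeIsEnabled.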

C) Local-only loading via library flags

Where supported, pass local_files_only=True in the FastEmbed wrapper so underlying Hub downloads never run. Sparse support lagged in 2024, then matured; by 2025 maintainers confirm the flag is propagated. (GitHub)

D) Bypass the Hub API entirely

If your stack queries model_info(...).sha, skip it by pinning a known commit for the model in your config or load by absolute local path when your FastEmbed version supports it. Snapshotting by commit avoids revision resolution. (Hugging Face)
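
If the cache is already warm, the commit can be read from disk instead of from model_info. The following is a rough sketch that assumes the standard Hub cache layout (models--org--name/refs/main holding the commit, and snapshots/<commit> holding the files). cached_snapshot is a hypothetical helper, and the repo id FastEmbed actually downloads from may differ from the model name you pass it, so check the folder names under your cache directory first.

```python
from pathlib import Path


def cached_snapshot(cache_dir, repo_id, ref="main"):
    """Return (commit_sha, snapshot_path) from the local cache, or None.

    Reads only local files; no request is made to the Hub.
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / ref
    if not ref_file.is_file():
        return None  # repo was never cached under this ref

    sha = ref_file.read_text().strip()
    snapshot = repo_dir / "snapshots" / sha
    return (sha, snapshot) if snapshot.is_dir() else None
```

The returned SHA can be stored in your config as the pinned revision, and the snapshot path is a candidate for loading by absolute local path where your FastEmbed version supports that.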

Concrete patterns

Minimal environment for local-only load

```
# cache is prewarmed already
export FASTEMBED_CACHE_PATH=/models/fastembed_cache
export DEFAULT_SPARSE_EMBEDDING_MODEL_NAME=prithivida/Splade_PP_en_v1
export HF_HUB_OFFLINE=1
unset HF_TOKEN HUGGING_FACE_HUB_TOKEN
```

```
# docs: HF offline + caching; SPLADE usage
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
# https://qdrant.tech/documentation/fastembed/fastembed-splade/
import os

from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(
    model_name=os.environ["DEFAULT_SPARSE_EMBEDDING_MODEL_NAME"],
    cache_dir=os.environ["FASTEMBED_CACHE_PATH"],
    # local_files_only=True  # if your fastembed version supports it
)
```

(Hugging Face)

Keep online, but anonymous

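```
from huggingface_hub import HfApi

# token=False disables authentication; token=None (the default) would still
# fall back to any saved or environment token.
sha = HfApi().model_info("prithivida/Splade_PP_en_v1", token=False).sha
```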

This prevents attaching a bad token. (Hugging Face)

LlamaIndex + Qdrant hybrid with local SPLADE
LlamaIndex’s Qdrant hybrid example runs SPLADE locally via FastEmbed; combine it with the env fixes above. (LlamaIndex)

Operational checklist

Verify cache is populated under your configured cache dir; the Hub stores refs, snapshots (by commit), and blobs. Date: docs current. (Hugging Face)
If behind a firewall or air-gapped: expect hangs unless you set HF_HUB_OFFLINE=1 or local_files_only=True; this is a known issue pattern. Date: Apr 30, 2024 report; later fixes propagate the flag. (GitHub)
If you must authenticate: ensure a valid token and scopes; many 401 reports in 2025 were token mistakes. (Hugging Face Forums)

Short, high-signal resources

Issues to watch

FastEmbed 401s to Hub, reports beginning Mar 14, 2025. (GitHub)
Firewall/offline hangs without local_files_only (Apr 30, 2024). (GitHub)
Sparse local_files_only gap (Sep 26, 2024), later propagation (Aug 6, 2025). (GitHub)

Docs

HF download + cache layout and snapshot behavior. Updated 2024–2025. (Hugging Face)
HF env vars and offline mode. Updated 2024–2025. (Hugging Face)
LlamaIndex Qdrant hybrid SPLADE example. Updated 2025. (LlamaIndex)
Qdrant FastEmbed SPLADE guide. Updated 2024–2025. (qdrant.tech)
