Hub behavior appears to have changed between last year and this year. If you would rather not modify existing code, I think the simplest approach is to log in with a valid token.
Direct answers:
- Your one-liner does not require auth because it downloads public model files through code paths that allow anonymous access and use the local cache if present. No token gets attached, so the request succeeds. (Hugging Face)
- You suddenly “need auth” because a different code path is calling the Hub metadata API (e.g., HfApi().model_info(...).sha). If any Hugging Face token is present but invalid, huggingface_hub automatically adds it to requests. The server then returns 401 “Invalid credentials in Authorization header.” This started showing up more for users in 2025, and also whenever libraries added metadata lookups before load. You can avoid this by removing the bad token, forcing offline/local-only, or passing token=False so the request is anonymous. Note that token=None is the default and still falls back to any saved token. (Hugging Face)
Background and context
How FastEmbed and the Hub interact
- File downloads: hf_hub_download/snapshot_download fetch files and cache them. For public repos they work without any token. Cached files are then reused. Your one-liner uses this behavior. (Hugging Face)
- Metadata calls: HfApi().model_info(repo_id) queries the Hub REST API for repo metadata and returns a commit SHA used for snapshotting. If your environment or keyring has a token, the library auto-attaches it. A stale or malformed token triggers 401 even for public repos. (Hugging Face)
- FastEmbed sparse models like prithivida/Splade_PP_en_v1 are public and shown in Qdrant’s docs and examples, so anonymous file fetches and cached use work. (qdrant.tech)
Why it worked before and fails now
- Environment drift: a token landed in HF_TOKEN/HUGGING_FACE_HUB_TOKEN or was saved by huggingface-cli login, turning formerly anonymous requests into authenticated ones. Env-var tokens override stored tokens. Date: docs updated through 2024–2025. (Hugging Face)
- Library changes: newer releases or upstream code paths started hitting model_info(...) to resolve revisions or SHAs before load. Users began reporting 401s in Mar–Jul 2025 when an invalid token was present. (GitHub)
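To see which token, if any, would be picked up, a small stdlib-only check can help. This is a minimal sketch, not the library's own resolution logic: find_token_sources and mask are hypothetical helper names, and the saved-token location (HF_TOKEN_PATH, else HF_HOME/token, else ~/.cache/huggingface/token) is an assumption that may differ across huggingface_hub versions or when XDG_CACHE_HOME is set.

```python
import os
from pathlib import Path


def mask(token):
    """Show only the first three characters so the token is never printed in full."""
    return token[:3] + "..."


def find_token_sources(env=None, home=None):
    """List (source, masked_token) pairs: env vars first, then the saved token file.

    Hypothetical helper; the file location below is an assumed default.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            found.append((name, mask(value)))
    # Assumed default: $HF_HOME/token, with HF_HOME defaulting to ~/.cache/huggingface
    hf_home = Path(env.get("HF_HOME") or home / ".cache" / "huggingface")
    token_file = Path(env.get("HF_TOKEN_PATH") or hf_home / "token")
    if token_file.is_file():
        value = token_file.read_text().strip()
        if value:
            found.append((str(token_file), mask(value)))
    return found


if __name__ == "__main__":
    for source, masked in find_token_sources():
        print(f"{source}: {masked}")
```

An empty result suggests requests would go out anonymously; any entry is a candidate for the stale token behind the 401.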
Make cached models work without auth
Pick one. All are valid.
A) Remove or neutralize the token
- Unset env vars and log out so calls are anonymous:
# remove token influence
unset HF_TOKEN HUGGING_FACE_HUB_TOKEN
huggingface-cli logout  # optional: clears saved token
Then HfApi().model_info(repo, token=False) keeps requests unauthenticated. Prefer False over None: token=None is the default and falls back to any saved token. (Hugging Face)
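If you cannot change the shell environment (for example inside a notebook or a managed service), the same neutralization can be sketched in-process. drop_token_env is a hypothetical helper, not part of any library. It only removes the two environment variables named above and does not touch a token saved by huggingface-cli login. Calling it before importing huggingface_hub or fastembed is the cautious choice, since I am not certain when each version reads the environment.

```python
import os

# Environment variables that can carry a Hub token, as named in this answer.
TOKEN_ENV_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def drop_token_env(env=None):
    """Remove token env vars from the given mapping (default: os.environ).

    Returns the names that were actually removed, so callers can log them.
    """
    env = os.environ if env is None else env
    return [name for name in TOKEN_ENV_VARS if env.pop(name, None) is not None]


if __name__ == "__main__":
    removed = drop_token_env()
    print("removed:", removed or "nothing")
```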
B) Force full offline
- Set offline mode before imports:
export HF_HUB_OFFLINE=1
This makes file loaders use the cache and makes any HfApi call raise OfflineModeIsEnabled; avoid or guard metadata calls in that mode. (Hugging Face)
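One way to guard metadata calls in offline mode is to check the flag yourself and skip the lookup. The sketch below is stdlib-only: hub_offline and resolve_sha are hypothetical names, fetch_sha stands in for whatever your stack uses to query the Hub, and the set of truthy values is my assumption about what huggingface_hub accepts.

```python
import os

# Assumed truthy spellings for HF_HUB_OFFLINE (case-insensitive).
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def hub_offline(env=None):
    """Return True when HF_HUB_OFFLINE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("HF_HUB_OFFLINE", "").strip().upper() in _TRUE_VALUES


def resolve_sha(repo_id, fetch_sha, env=None):
    """Call fetch_sha(repo_id) only when online; return None in offline mode."""
    if hub_offline(env):
        return None  # skip the metadata call instead of letting it raise
    return fetch_sha(repo_id)


# Example wiring (requires huggingface_hub and network access, so left as a comment):
# from huggingface_hub import HfApi
# sha = resolve_sha("prithivida/Splade_PP_en_v1",
#                   lambda r: HfApi().model_info(r, token=False).sha)
```

Returning None keeps the caller in control: it can fall back to whatever snapshot is already cached instead of failing on the metadata lookup.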
C) Local-only loading via library flags
- Where supported, pass local_files_only=True in the FastEmbed wrapper so underlying Hub downloads never run. Sparse support lagged in 2024, then matured; by 2025 maintainers confirm the flag is propagated. (GitHub)
D) Bypass the Hub API entirely
- If your stack queries model_info(...).sha, skip it by pinning a known commit for the model in your config or load by absolute local path when your FastEmbed version supports it. Snapshotting by commit avoids revision resolution. (Hugging Face)
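If the model is already cached, the commit can usually be read from disk instead of from the Hub API. The following is a minimal sketch that assumes the standard Hub cache layout (models--ORG--NAME/refs/REVISION holding a commit hash, and snapshots/HASH/ holding the files). local_snapshot and repo_cache_dir are hypothetical helpers. Also note that FastEmbed may cache a model under a different repo id than the model name you pass, so check the directory names in your cache first.

```python
from pathlib import Path


def repo_cache_dir(cache_dir, repo_id):
    """Map 'org/name' to the assumed cache folder 'models--org--name'."""
    return Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))


def local_snapshot(cache_dir, repo_id, revision="main"):
    """Return (sha, snapshot_path) from the local cache, or None if not cached.

    `revision` may be a ref name such as 'main' or a full commit hash.
    """
    repo_dir = repo_cache_dir(cache_dir, repo_id)
    ref_file = repo_dir / "refs" / revision
    # A ref file stores the commit hash; otherwise treat `revision` as the hash itself.
    sha = ref_file.read_text().strip() if ref_file.is_file() else revision
    snapshot = repo_dir / "snapshots" / sha
    return (sha, snapshot) if snapshot.is_dir() else None
```

The returned hash is what you would pin in config, and the returned path is what you would pass where a local model path is accepted.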
Concrete patterns
Minimal environment for local-only load
# cache is prewarmed already
export FASTEMBED_CACHE_PATH=/models/fastembed_cache
export DEFAULT_SPARSE_EMBEDDING_MODEL_NAME=prithivida/Splade_PP_en_v1
export HF_HUB_OFFLINE=1
unset HF_TOKEN HUGGING_FACE_HUB_TOKEN
# docs: HF offline + caching; SPLADE usage
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables
# https://qdrant.tech/documentation/fastembed/fastembed-splade/

# Python side: reads the variables exported in the shell above
import os

from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(
    model_name=os.environ["DEFAULT_SPARSE_EMBEDDING_MODEL_NAME"],
    cache_dir=os.environ["FASTEMBED_CACHE_PATH"],
    # local_files_only=True  # if your fastembed version supports it
)
Keep online, but anonymous
from huggingface_hub import HfApi
# token=False disables auth; token=None (the default) would fall back to a saved token
sha = HfApi().model_info("prithivida/Splade_PP_en_v1", token=False).sha
This prevents attaching a bad token. (Hugging Face)
LlamaIndex + Qdrant hybrid with local SPLADE
LlamaIndex’s Qdrant hybrid example runs SPLADE locally via FastEmbed; combine it with the env fixes above. (LlamaIndex)
Operational checklist
- Verify cache is populated under your configured cache dir; the Hub stores refs, snapshots (by commit), and blobs. Date: docs current. (Hugging Face)
- If behind a firewall or air-gapped: expect hangs unless you set HF_HUB_OFFLINE=1 or local_files_only=True; this is a known issue pattern. Date: Apr 30, 2024 report; later fixes propagate the flag. (GitHub)
- If you must authenticate: ensure a valid token and scopes; many 401 reports in 2025 were token mistakes. (Hugging Face Forums)
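The first checklist item can be sketched as a quick report over the cache directory. cache_report is a hypothetical helper. It assumes cached repos appear as models--* folders with refs, snapshots, and blobs subdirectories, which may not hold on every platform or cache configuration (for example when symlinks are unavailable).

```python
from pathlib import Path

# Subdirectories the Hub cache is expected to contain per repo (assumed layout).
CACHE_PARTS = ("refs", "snapshots", "blobs")


def cache_report(cache_dir):
    """Map each cached repo folder to which expected parts are present and non-empty."""
    report = {}
    for repo_dir in sorted(Path(cache_dir).glob("models--*")):
        status = {}
        for part in CACHE_PARTS:
            part_dir = repo_dir / part
            status[part] = part_dir.is_dir() and any(part_dir.iterdir())
        report[repo_dir.name] = status
    return report


if __name__ == "__main__":
    import os

    # FASTEMBED_CACHE_PATH is the variable used earlier in this answer.
    for repo, status in cache_report(os.environ.get("FASTEMBED_CACHE_PATH", ".")).items():
        print(repo, status)
```

A repo with any part reported as False is a likely reason an offline load still tries to reach the Hub.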
Short, high-signal resources
Issues to watch
- FastEmbed 401s to Hub, reports beginning Mar 14, 2025. (GitHub)
- Firewall/offline hangs without local_files_only (Apr 30, 2024). (GitHub)
- Sparse local_files_only gap (Sep 26, 2024), later propagation (Aug 6, 2025). (GitHub)
Docs
- HF download + cache layout and snapshot behavior. Updated 2024–2025. (Hugging Face)
- HF env vars and offline mode. Updated 2024–2025. (Hugging Face)
- LlamaIndex Qdrant hybrid SPLADE example. Updated 2025. (LlamaIndex)
- Qdrant FastEmbed SPLADE guide. Updated 2024–2025. (qdrant.tech)