Assuming the token itself is valid: on Vertex AI you need to explicitly pass the token as an environment variable to the training container.
Short direct answer:
- From the Hugging Face side, yes: a valid user access token with `write` scope (which includes `read`) is enough to access a gated model like `google/txgemma-2b-predict`, as long as the HF account behind that token has accepted the TxGemma terms. (Hugging Face)
- From the Vertex AI side, that token only works if you inject it into the training container as an environment variable (for example `HF_TOKEN`) and your code (or `transformers`/TRL) uses it when calling `from_pretrained`. The official Vertex + TRL examples for gated models do exactly this. (Hugging Face)
So: just having a valid token in your account is not enough; you must pass it to the Vertex job in the right way.
Below is the “correct way” with context and a concrete pattern.
1. Hugging Face side: gating + token requirements
1.1 TxGemma is a gated model
TxGemma models on Hugging Face (e.g. `google/txgemma-9b-chat`, `google/txgemma-9b-predict`, `google/txgemma-2b-predict`) are “gated”:
- The repo page says: “This repository is publicly accessible, but you have to accept the conditions to access its files and content.” (Hugging Face)
- To get access, you must:
  - Log in to Hugging Face in a browser.
  - Open the model page.
  - Click through and accept the Health AI Developer Foundations terms.
  - Access is granted immediately to that user account. (Hugging Face)
So step 0 is verifying that the HF user who owns your token actually sees “Access granted” on the TxGemma page.
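If you would rather verify this programmatically than in the browser, recent `huggingface_hub` releases (v0.25+) ship an `auth_check` helper that raises if the token’s account cannot reach the repo. A minimal sketch, assuming you run it with the same token you will later hand to Vertex:

```python
from huggingface_hub import auth_check
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

# Raises GatedRepoError if this account has not accepted the TxGemma
# terms, RepositoryNotFoundError if the repo id is wrong.
try:
    auth_check("google/txgemma-2b-predict", token="hf_xxx")
    print("Access granted: token and gating are fine")
except GatedRepoError:
    print("Token is valid, but this account has not accepted the terms")
except RepositoryNotFoundError:
    print("Repo not found: typo in the id, or no read access at all")
```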
1.2 Token scopes: read vs write
Hugging Face “User Access Tokens” are the standard way to authenticate apps and notebooks:
- Docs: “User Access Tokens are the preferred way to authenticate… You can set the role (read, write, admin).” (Hugging Face)
- For private or gated models, the requirement is that the token has `read` or broader scope. For example, BigQuery’s Vertex integration docs explicitly say: “The token must have the `read` role scope or broader” for gated models. (Google Cloud Documentation)
Your `write` token satisfies this; `write` ⊇ `read`, so scope is not the problem.
A typical error when gating is not correctly satisfied looks like what you may have seen on Kaggle for TxGemma:
“Access to model google/txgemma-2b-predict is restricted. You must have access to it and be authenticated to access it.” (Kaggle)
That can happen if:
- Token is missing in the environment, or
- Token belongs to a different user than the one who accepted the terms.
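To rule out the second case, you can ask the Hub which account a token actually belongs to. A small sketch using `whoami` from `huggingface_hub` (the exact shape of the returned dict varies by token type, so treat the role lookup as best-effort):

```python
from huggingface_hub import whoami

# Inspect the token you plan to hand to Vertex.
info = whoami(token="hf_xxx")

# "name" must match the account that accepted the TxGemma terms.
print("token belongs to:", info["name"])

# Classic tokens usually report their role here; fine-grained tokens differ.
role = info.get("auth", {}).get("accessToken", {}).get("role")
print("token role:", role)  # e.g. "read" or "write"
```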
2. Vertex AI side: how to specify the token correctly
The canonical pattern is shown in the Hugging Face + Google Cloud examples for fine-tuning Mistral or Gemma with TRL on Vertex AI. They all do the same thing:
- Use a Hugging Face PyTorch Training DLC image.
- Create a `CustomContainerTrainingJob` whose command runs `trl sft`.
- Pass the HF token via `environment_variables={"HF_TOKEN": ...}`. (Hugging Face)
2.1 Official example pattern (Mistral / Gemma on Vertex)
From the “Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI” guide: (Hugging Face)
- They define a `CustomContainerTrainingJob`:

```python
job = aiplatform.CustomContainerTrainingJob(
    display_name="trl-full-sft",
    container_uri=os.getenv("CONTAINER_URI"),
    command=[
        "sh",
        "-c",
        'exec trl sft "$@"',
        "--",
    ],
)
```
- They build `args` for `trl sft` (model, dataset, hyperparameters). (Hugging Face)
- Crucially, when they call `job.submit(...)`, they pass the token via `environment_variables`:

```python
from huggingface_hub import get_token

job.submit(
    args=args,
    replica_count=1,
    machine_type="a2-highgpu-4g",
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=4,
    base_output_dir=f"{os.getenv('BUCKET_URI')}/Mistral-7B-v0.3-SFT-Guanaco",
    environment_variables={
        "HF_HOME": "/root/.cache/huggingface",
        "HF_TOKEN": get_token(),
        "TRL_USE_RICH": "0",
        "ACCELERATE_LOG_LEVEL": "INFO",
        "TRANSFORMERS_LOG_LEVEL": "INFO",
        "TQDM_POSITION": "-1",
    },
)
```
The Gemma/LoRA TRL example is identical in structure and explicitly notes: “As you are fine-tuning a gated model … you need to set the `HF_TOKEN` environment variable.” (Hugging Face)
The Vertex AI community samples for Gemma and Llama follow the same pattern: they read `HF_TOKEN` from the notebook/UI and add it to `env_vars` before constructing the job. (GitHub)
2.2 How transformers / TRL picks up the token
Recent `huggingface_hub` / `transformers` versions automatically look for environment variables like:

- `HF_TOKEN`
- `HUGGING_FACE_HUB_TOKEN`

If one is set, `AutoModel.from_pretrained("google/txgemma-2b-predict")` will use it to authenticate. (Hugging Face)
So as long as:

- Your TRL CLI (inside the DLC) uses standard `from_pretrained`, and
- You pass `HF_TOKEN` correctly,

you do not need to manually add `token=...` in your training script.
If you want to be explicit in a custom script, you can still do:

```python
import os

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/txgemma-2b-predict"
token = os.environ["HF_TOKEN"]  # injected by Vertex via environment_variables

tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
```
But with the DLC + TRL CLI examples, the environment variable alone is normally enough.
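A cheap pre-flight check before paying for a Vertex run: export the token the same way the container will see it and fetch one small file from the gated repo. A sketch using the standard `hf_hub_download` helper:

```python
import os

from huggingface_hub import hf_hub_download

# Simulate the Vertex container: the token arrives only via env var.
os.environ["HF_TOKEN"] = "hf_xxx"  # on Vertex this comes from environment_variables

# config.json is tiny, so this fails fast if the token or gating is wrong.
path = hf_hub_download(
    repo_id="google/txgemma-2b-predict",
    filename="config.json",
)
print("auth OK, config cached at:", path)
```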
3. Concrete “correct way” for your TxGemma job
Assuming:
- You’re using the Hugging Face PyTorch Training DLC (as in the TRL examples).
- You want to fine-tune `google/txgemma-2b-predict` with `trl sft` on Vertex AI.
The steps are:
1. Verify HF-side gating with the right account
   - Log in to Hugging Face as the user who owns the token.
   - Open https://huggingface.co/google/txgemma-2b-predict.
   - Confirm that it shows you have access (no “request access” banner; terms accepted). (Hugging Face)
2. Create or copy your HF token (`write` is fine)
   - Go to “Settings → Access Tokens” and create a token with `read` or `write` scope. (Hugging Face)
   - This token must belong to the same user as step 1.
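If you would rather not paste the raw token into the notebook (the `HF_TOKEN = "hf_xxx"` line in the next step), one option is Google Secret Manager. A hedged sketch, assuming you have already stored the token in a secret named `hf-token` (a hypothetical name) and installed `google-cloud-secret-manager`:

```python
import os

from google.cloud import secretmanager

# Read the HF token from Secret Manager instead of hardcoding it.
# "hf-token" is a hypothetical secret name; create it beforehand.
client = secretmanager.SecretManagerServiceClient()
secret_name = f"projects/{os.getenv('PROJECT_ID')}/secrets/hf-token/versions/latest"
response = client.access_secret_version(request={"name": secret_name})
HF_TOKEN = response.payload.data.decode("utf-8")
```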
3. In your notebook/script, pass the token into Vertex via env vars

In Python:

```python
import os

from google.cloud import aiplatform

aiplatform.init(
    project=os.getenv("PROJECT_ID"),
    location=os.getenv("LOCATION"),
    staging_bucket=os.getenv("BUCKET_URI"),
)

HF_TOKEN = "hf_xxx"  # or read from a secret / env var

job = aiplatform.CustomContainerTrainingJob(
    display_name="txgemma-2b-sft",
    container_uri=os.getenv("CONTAINER_URI"),  # HF PyTorch DLC
    command=[
        "sh",
        "-c",
        'exec trl sft "$@"',
        "--",
    ],
)

args = [
    "--model_name_or_path=google/txgemma-2b-predict",
    "--torch_dtype=bfloat16",
    # + your dataset + training args...
]

job.run(
    args=args,
    replica_count=1,
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    base_output_dir=f"{os.getenv('BUCKET_URI')}/txgemma-2b-sft",
    environment_variables={
        "HF_HOME": "/root/.cache/huggingface",
        "HF_TOKEN": HF_TOKEN,  # <- key line
        "TRL_USE_RICH": "0",
        "ACCELERATE_LOG_LEVEL": "INFO",
        "TRANSFORMERS_LOG_LEVEL": "INFO",
    },
)
```

This mirrors the official HF/Vertex examples almost exactly, just swapping `mistralai/Mistral-7B-v0.3` for `google/txgemma-2b-predict`. (Hugging Face)
4. Use the correct model ID

In `--model_name_or_path`, use the full HF repo id, `--model_name_or_path=google/txgemma-2b-predict`, not a Vertex Model Garden ID or a local alias. (Featherless)
5. Check Cloud Logging if it still fails
   - 401 or “restricted” → token or gating issue (wrong account, typo in token, env var missing).
   - 404 → wrong model id.
   - Timeout / DNS → networking (VPC Service Controls or firewall rules) blocking outbound access to huggingface.co. (Hugging Face Forums)
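To pull the relevant error lines without clicking through the console, you can query Cloud Logging from Python. A rough sketch, assuming the `google-cloud-logging` package; Vertex custom-training logs usually appear under the `ml_job` resource type, but adjust the filter if yours differ:

```python
from google.cloud import logging as cloud_logging

# Fetch the most recent error-level lines from custom training jobs.
client = cloud_logging.Client(project="your-project-id")  # hypothetical project id
log_filter = 'resource.type="ml_job" AND severity>=ERROR'

for entry in client.list_entries(
    filter_=log_filter,
    order_by=cloud_logging.DESCENDING,
    max_results=20,
):
    print(entry.timestamp, entry.payload)
```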
4. Answering your exact questions explicitly
is using a valid token (WRITE) enough to setup a custom training job on VertexAI using a gated model (aka TxGemma)?
From Hugging Face’s perspective: yes. Any token with `read` or broader scope is sufficient to download a gated model, provided the associated user has accepted the model’s terms. (Hugging Face)
From Vertex AI’s perspective: the token must be correctly passed into the container (as `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`) and actually used by `transformers` / TRL. If your job “fails to retrieve the model,” the usual reasons are:

- The token is not set in `environment_variables`,
- It belongs to a user that hasn’t accepted TxGemma’s terms,
- The model id is wrong, or
- There’s a network restriction.
Correct way to specify Hugging Face token on VertexAI?
The standard, documented way is:
- In your `CustomContainerTrainingJob.run()` (or `.submit()`), pass:

```python
environment_variables={
    "HF_TOKEN": "<your_hf_token>",
    "HF_HOME": "/root/.cache/huggingface",
    # ...
}
```

  exactly as the Mistral/Gemma TRL examples do when fine-tuning gated models. (Hugging Face)
- Optionally, use `HUGGING_FACE_HUB_TOKEN` instead; recent `huggingface_hub` respects both names.
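If you are unsure which name the library versions inside your container read, setting both is harmless. A trivial sketch of the defensive variant:

```python
environment_variables = {
    "HF_TOKEN": HF_TOKEN,
    "HUGGING_FACE_HUB_TOKEN": HF_TOKEN,  # older huggingface_hub releases only read this name
    "HF_HOME": "/root/.cache/huggingface",
}
```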
This is the “correct” and common approach used both in Hugging Face’s official docs and Google’s Vertex AI sample notebooks.