Vertex AI Training and Gated Model Access

Assuming the token itself is correct: in Vertex AI you need to explicitly pass the token as an environment variable to the training container…


Short direct answer:

  • From the Hugging Face side, yes: a valid user access token with write scope (which includes read) is enough to access a gated model like google/txgemma-2b-predict, as long as the same HF account behind that token has accepted the TxGemma terms. (Hugging Face)
  • From the Vertex AI side, that token only works if you inject it into the training container as an environment variable (for example HF_TOKEN) and your code (or transformers/TRL) uses it when calling from_pretrained. The official Vertex + TRL examples for gated models do exactly this. (Hugging Face)

So: just having a valid token in your account is not enough; you must pass it to the Vertex job in the right way.

Below is the “correct way” with context and a concrete pattern.


1. Hugging Face side: gating + token requirements

1.1 TxGemma is a gated model

TxGemma models on Hugging Face (e.g. google/txgemma-9b-chat, google/txgemma-9b-predict, google/txgemma-2b-predict) are “gated”:

  • The repo page says:

    “This repository is publicly accessible, but you have to accept the conditions to access its files and content.” (Hugging Face)

  • To get access, you must:

    • Log in to Hugging Face in a browser.
    • Open the model page.
    • Click through and accept the Health AI Developer Foundations terms.
    • Access is granted immediately to that user account. (Hugging Face)

So step 0 is verifying that the HF user who owns your token actually sees “Access granted” on the TxGemma page.
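
If you want to sanity-check this programmatically before launching any Vertex job, here is a minimal sketch (assuming huggingface_hub is installed locally; hf_xxx is a placeholder for your real token):

from huggingface_hub import model_info, whoami

token = "hf_xxx"  # placeholder: the token you plan to hand to Vertex AI

# 1. Which HF account does this token belong to?
print("Token belongs to:", whoami(token=token)["name"])

# 2. Can that account see the gated repo? This call raises an error
#    (gated / unauthorized) if the TxGemma terms were not accepted.
print("Access OK:", model_info("google/txgemma-2b-predict", token=token).id)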

1.2 Token scopes: read vs write

Hugging Face “User Access Tokens” are the standard way to authenticate apps and notebooks:

  • Docs: “User Access Tokens are the preferred way to authenticate… You can set the role (read, write, admin).” (Hugging Face)

  • For private or gated models, the requirement is that the token has read scope or broader. For example, BigQuery’s Vertex AI integration docs explicitly say:

    “The token must have the read role scope or broader” for gated models. (Google Cloud Documentation)

Your write token satisfies this; write ⊇ read, so scope is not the problem.

A typical error when gating is not satisfied looks like the one you may have seen on Kaggle for TxGemma:

“Access to model google/txgemma-2b-predict is restricted. You must have access to it and be authenticated to access it.” (Kaggle)

That can happen if:

  • Token is missing in the environment, or
  • Token belongs to a different user than the one who accepted the terms.
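
If you run a custom training script (rather than the TRL CLI), a fail-fast check at the top of the script makes the first failure mode obvious in the job logs; a minimal sketch:

import os
import sys

# Fail fast if the token never reached the container (e.g. the
# environment_variables argument was forgotten on the Vertex job).
token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
if not token:
    sys.exit("HF_TOKEN / HUGGING_FACE_HUB_TOKEN is not set inside the container.")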

2. Vertex AI side: how to specify the token correctly

The canonical pattern is shown in the Hugging Face + Google Cloud examples for fine-tuning Mistral or Gemma with TRL on Vertex AI. They all do the same thing:

  1. Use a Hugging Face PyTorch Training DLC image.
  2. Create a CustomContainerTrainingJob whose command runs trl sft.
  3. Pass the HF token via environment_variables={"HF_TOKEN": ...}. (Hugging Face)

2.1 Official example pattern (Mistral / Gemma on Vertex)

From the “Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI” guide: (Hugging Face)

  • They define a CustomContainerTrainingJob:

    job = aiplatform.CustomContainerTrainingJob(
        display_name="trl-full-sft",
        container_uri=os.getenv("CONTAINER_URI"),
        command=[
            "sh",
            "-c",
            'exec trl sft "$@"',
            "--",
        ],
    )
    
  • They build args for trl sft (model, dataset, hyperparameters). (Hugging Face)

  • Crucially, when they call job.submit(...), they pass the token via environment_variables:

    from huggingface_hub import get_token
    
    job.submit(
        args=args,
        replica_count=1,
        machine_type="a2-highgpu-4g",
        accelerator_type="NVIDIA_TESLA_A100",
        accelerator_count=4,
        base_output_dir=f"{os.getenv('BUCKET_URI')}/Mistral-7B-v0.3-SFT-Guanaco",
        environment_variables={
            "HF_HOME": "/root/.cache/huggingface",
            "HF_TOKEN": get_token(),
            "TRL_USE_RICH": "0",
            "ACCELERATE_LOG_LEVEL": "INFO",
            "TRANSFORMERS_LOG_LEVEL": "INFO",
            "TQDM_POSITION": "-1",
        },
    )
    

The Gemma/LoRA TRL example is identical in structure and explicitly notes:

“As you are fine-tuning a gated model … you need to set the HF_TOKEN environment variable.” (Hugging Face)

The Vertex AI community samples for Gemma and Llama follow the same pattern: they read HF_TOKEN from the notebook/UI and add it to env_vars before constructing the job. (GitHub)

2.2 How transformers / TRL picks up the token

Recent versions of huggingface_hub and transformers automatically look for environment variables such as:

  • HF_TOKEN
  • HUGGING_FACE_HUB_TOKEN

If one is set, AutoModel.from_pretrained("google/txgemma-2b-predict") will use it to authenticate. (Hugging Face)

So as long as:

  • Your TRL CLI (inside the DLC) uses standard from_pretrained, and
  • You pass HF_TOKEN correctly,

you do not need to manually add token=... in your training script.
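
For example, a minimal sketch where the token comes purely from the environment (exactly how Vertex injects it; the hardcoded placeholder is only for a local test):

import os
from transformers import AutoModelForCausalLM

# Placeholder: on Vertex AI this variable is injected for you via
# environment_variables, so this line is only needed when testing locally.
os.environ.setdefault("HF_TOKEN", "hf_xxx")

# No token= argument: transformers/huggingface_hub reads HF_TOKEN itself.
model = AutoModelForCausalLM.from_pretrained("google/txgemma-2b-predict")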

If you want to be explicit in a custom script, you can still do:

import os
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/txgemma-2b-predict"
token = os.environ["HF_TOKEN"]  # injected by Vertex AI via environment_variables

tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
model = AutoModelForCausalLM.from_pretrained(model_id, token=token)

But with the DLC + TRL CLI examples, the environment variable alone is normally enough.


3. Concrete “correct way” for your TxGemma job

Assuming:

  • You’re using the Hugging Face PyTorch Training DLC (as in the TRL examples).
  • You want to fine-tune google/txgemma-2b-predict with trl sft on Vertex AI.

The steps are:

  1. Verify HF-side gating with the right account

    • Log in to Hugging Face as the user who owns the token.
    • Open https://huggingface.co/google/txgemma-2b-predict.
    • Confirm that it shows you have access (no “request access” banner; terms accepted). (Hugging Face)
  2. Create or copy your HF token (write is fine)

    • Go to “Settings → Access Tokens” and create a token with read or write scope. (Hugging Face)
    • This token must belong to the same user as step 1.
  3. In your notebook/script, pass the token into Vertex via env vars

    In Python:

    from google.cloud import aiplatform
    import os
    
    aiplatform.init(
        project=os.getenv("PROJECT_ID"),
        location=os.getenv("LOCATION"),
        staging_bucket=os.getenv("BUCKET_URI"),
    )
    
    HF_TOKEN = "hf_xxx"  # or read from a secret / env var (see the Secret Manager sketch after step 5)
    
    job = aiplatform.CustomContainerTrainingJob(
        display_name="txgemma-2b-sft",
        container_uri=os.getenv("CONTAINER_URI"),  # HF PyTorch DLC
        command=[
            "sh",
            "-c",
            'exec trl sft "$@"',
            "--",
        ],
    )
    
    args = [
        "--model_name_or_path=google/txgemma-2b-predict",
        "--torch_dtype=bfloat16",
        # + your dataset + training args...
    ]
    
    job.run(
        args=args,
        replica_count=1,
        machine_type="g2-standard-12",
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
        base_output_dir=f"{os.getenv('BUCKET_URI')}/txgemma-2b-sft",
        environment_variables={
            "HF_HOME": "/root/.cache/huggingface",
            "HF_TOKEN": HF_TOKEN,  # <- key line
            "TRL_USE_RICH": "0",
            "ACCELERATE_LOG_LEVEL": "INFO",
            "TRANSFORMERS_LOG_LEVEL": "INFO",
        },
    )
    

    This mirrors the official HF/Vertex examples almost exactly, just swapping mistralai/Mistral-7B-v0.3 for google/txgemma-2b-predict. (Hugging Face)

  4. Use the correct model ID

    In --model_name_or_path, use the full HF repo id:

    --model_name_or_path=google/txgemma-2b-predict
    

    not a Vertex Model Garden ID or a local alias. (Featherless)

  5. Check Cloud Logs if it still fails

    • 401 or “restricted” → token or gating issue (wrong account, typo in token, env var missing).
    • 404 → wrong model id.
    • Timeout / DNS → networking (VPC Service Controls or firewall rules) blocking outbound access to huggingface.co. (Hugging Face Forums)
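
As a side note on step 3: instead of hardcoding HF_TOKEN in the notebook, you can store it in Google Secret Manager and read it just before submitting the job. A minimal sketch (assuming google-cloud-secret-manager is installed and the secret is named hf-token; both are assumptions, not part of the official examples):

import os

from google.cloud import secretmanager

def read_hf_token(project_id: str, secret_id: str = "hf-token") -> str:
    """Read the Hugging Face token from Google Secret Manager."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

HF_TOKEN = read_hf_token(os.getenv("PROJECT_ID"))  # then pass it via environment_variables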

4. Answering your exact questions explicitly

is using a valid token (WRITE) enough to setup a custom training job on VertexAI using a gated model (aka TxGemma)?

From Hugging Face’s perspective: yes. Any token with read or broader scope is sufficient to download a gated model, provided the associated user has accepted the model’s terms. (Hugging Face)

From Vertex AI’s perspective: the token must be correctly passed into the container (as HF_TOKEN or HUGGING_FACE_HUB_TOKEN) and actually used by transformers / TRL. If your job “fails to retrieve the model,” the usual reasons are:

  • The token is not set in environment_variables,
  • It belongs to a user that hasn’t accepted TxGemma’s terms,
  • The model id is wrong, or
  • There’s a network restriction.

Correct way to specify Hugging Face token on VertexAI?

The standard, documented way is:

  • In your CustomContainerTrainingJob.run() (or .submit()), pass:

    environment_variables={
        "HF_TOKEN": "<your_hf_token>",
        "HF_HOME": "/root/.cache/huggingface",
        ...
    }
    

    exactly as the Mistral/Gemma TRL examples do when fine-tuning gated models. (Hugging Face)

  • Optionally, use HUGGING_FACE_HUB_TOKEN instead; recent huggingface_hub respects both names.

This is the “correct” and common approach used both in Hugging Face’s official docs and Google’s Vertex AI sample notebooks.