ValueError: Unsupported key type for array slice. Received: `(slice(None, None, None), [-1, 0])`

Why am I getting this error? Thanks in advance!

P.S.: I’m following a Kaggle notebook. I tried to google the error, but I’m not even sure what I should be asking Google. As far as I understand, everything works fine up to this line:

sequence_output = transformer(input_word_ids)[0]

I’m creating inputs of shape (512,), and when this input is passed to the transformer (DistilBERT in this case), it somehow fails. I want to understand where the problem is and what it is. Is it an issue with the input shape, or something else?

Code:

# Loading model into the TPU 


%%time 
with strategy.scope():
  transformer_layer = (
      transformers.DistilBertModel 
      .from_pretrained('distilbert-base-multilingual-cased')
  )
  model = build_model(transformer_layer, max_len=MAX_LEN)


model.summary()

# importing torch
import torch

# function to build the model
def build_model(transformer, max_len=512):
  input_word_ids = Input(shape=(max_len, ), dtype=torch.int32, name="input_word_ids")
  sequence_output = transformer(input_word_ids)[0]
  cls_token = sequence_output[:, 0, :]
  out = Dense(1, activation='sigmoid')(cls_token)

  model = Model(inputs=input_word_ids, outputs=out)
  model.compile(Adam(lr=1e-5),
                loss='binary_crossentropy',
                metrics=['accuracy'])

  return model

Error:

ValueError: Unsupported key type for array slice. Received: `(slice(None, None, None), [-1, 0])`

Hmm…?


You are getting this error because your model graph is TensorFlow Keras, but the Transformer you loaded is the PyTorch DistilBERT class, and your Input() dtype is also a PyTorch dtype.

That mix forces Keras to handle indexing on a symbolic tensor using an indexing key it does not support. The specific key in your message includes a Python list [-1, 0], which is “advanced indexing” and is a common way to trigger slice-key errors when the wrong tensor type is flowing through the wrong backend.

What your code is doing, in plain terms

1) Your Input(shape=(512,)) is fine

In Keras, shape=(512,) means “each example is length 512”. Keras adds the batch dimension automatically so the runtime shape is (batch, 512). Keras explicitly documents that shape does not include batch size. (Keras)

So this is not “because it’s (512,)”.
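
You can verify this in one line (a minimal sketch):

from tensorflow.keras.layers import Input

ids = Input(shape=(512,), dtype="int32", name="input_word_ids")
print(ids.shape)  # (None, 512): Keras prepends the batch dimension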

2) Your dtype=torch.int32 is not valid for Keras Input

Keras expects dtype as a string like "int32" or a TF dtype like tf.int32. The docs describe dtype as “a string (e.g. "int32")”. (Keras)

Passing torch.int32 is a framework mismatch.

3) You loaded the wrong DistilBERT class for a TF/Keras TPU notebook

You used:

transformers.DistilBertModel.from_pretrained(...)

In Hugging Face docs, DistilBertModel is the PyTorch model and is a torch.nn.Module subclass. (Hugging Face)

For TensorFlow Keras, you must use:

  • TFDistilBertModel (or TFAutoModel)

Hugging Face docs explicitly say TFDistilBertModel is a keras.Model subclass, and they document the input formats Keras expects. (Hugging Face)

4) Why this slice error text shows up

Your exception says Keras received an indexing key like:

(slice(None, None, None), [-1, 0])

That means some code tried to do “slice all rows, then take columns [-1, 0]”. Python-list indexing like [-1, 0] is not universally supported for symbolic tensors, and it is especially likely to break when you’re accidentally routing through a backend that expects a different tensor type.

You did not write [-1, 0] yourself, so it is almost certainly happening inside the incompatible call path created by mixing Keras symbolic tensors with a PyTorch model.
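
For reference, if code genuinely needs that kind of column selection, tf.gather is the graph-safe equivalent (a minimal sketch):

import tensorflow as tf

x = tf.reshape(tf.range(12), (3, 4))

# "All rows, columns [last, first]" without Python-list indexing,
# which symbolic tensors reject.
cols = tf.gather(x, indices=[3, 0], axis=1)
print(cols.numpy())  # [[ 3  0] [ 7  4] [11  8]]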

The correct fix for your case (TF/Keras + TPU)

Use TF DistilBERT + TF dtype + (usually) attention mask

Hugging Face documents that TF models accept inputs either as keyword args or as a dict/list/tuple in the first positional argument because that’s what Keras likes. (Hugging Face)

Use this pattern:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from transformers import TFDistilBertModel

def build_model(transformer, max_len=512):
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    outputs = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
    sequence_output = outputs.last_hidden_state  # (batch, seq, hidden)

    cls_token = sequence_output[:, 0, :]
    out = Dense(1, activation="sigmoid")(cls_token)

    model = Model(inputs={"input_ids": input_ids, "attention_mask": attention_mask}, outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

with strategy.scope():
    transformer_layer = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
    model = build_model(transformer_layer, max_len=512)

Why this version matches the docs:

  • TFDistilBertModel is a keras.Model subclass. (Hugging Face)
  • TF Transformers accept inputs as kwargs or dict/list/tuple for Keras compatibility. (Hugging Face)
  • Keras Input dtype should be "int32" or tf.int32, not a torch dtype. (Keras)
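
To feed this model, the tokenizer already emits the matching dict keys (a hedged usage sketch; the sample sentences are made up):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
enc = tokenizer(
    ["bonjour le monde", "hello world"],   # made-up sample texts
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="tf",
)

# Keys "input_ids" / "attention_mask" match the Input names in build_model.
preds = model.predict(dict(enc))
print(preds.shape)  # (2, 1)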

TPU context (why strategy.scope() matters)

Keras TPU usage expects you to create the model inside with strategy.scope(): so variables are placed correctly and training is distributed. This is the standard pattern shown in Keras TPU-related examples. (Keras)
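
If the notebook does not already define strategy, the standard bootstrap looks like this (a sketch of the usual Kaggle/Colab pattern):

import tensorflow as tf

try:
    # Detect and initialize the TPU; raises if none is attached.
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    strategy = tf.distribute.get_strategy()  # CPU/GPU fallback

# Then create the transformer and model inside strategy.scope(), as above.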

If the Kaggle notebook was actually PyTorch (different route)

Then do not use tf.distribute.TPUStrategy or Keras Input at all. You would use a pure PyTorch model and a PyTorch data pipeline, or PyTorch/XLA for TPU. Mixing the TF TPU path with the PyTorch model class is exactly what breaks.

Quick “prove it” checks in your notebook

Run these to confirm what you loaded:

print(transformer_layer.__class__)
  • If it contains DistilBertModel, you loaded PyTorch. (Hugging Face)
  • If it contains TFDistilBertModel, you loaded the TF Keras model. (Hugging Face)

Also confirm dtype expectation:

  • Keras Input(... dtype=...) expects a string like "int32". (Keras)
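
Both checks together (a minimal sketch, assuming model was already built with the fix above):

print(transformer_layer.__class__)  # want ...TFDistilBertModel
print(model.inputs[0].dtype)        # want int32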

What to google next (high-yield queries)

If you want more threads like yours, these searches reliably find them:

  • DistilBertModel KerasTensor Unsupported key type for array slice
  • TFDistilBertModel Functional API inputs dict
  • Keras Input dtype "int32" torch.int32
  • Kaggle TPU DistilBERT TFDistilBertModel

Summary

  • Your (512,) input shape is normal in Keras. Batch is implicit. (Keras)
  • Your dtype=torch.int32 is wrong for Keras Input. Use "int32" or tf.int32. (Keras)
  • You loaded the PyTorch DistilBERT class (DistilBertModel), but your notebook is TF/Keras TPU. Use TFDistilBertModel. (Hugging Face)

First of all, thank you for your reply; you explained everything so well. I’ve made the changes you suggested, but now I’m getting a different kind of error. The model downloads, but then I get this strange error:

ERROR:

model.safetensors: 100% | 542M/542M [00:07<00:00, 154MB/s]

TensorFlow and JAX classes are deprecated and will be removed in Transformers v5. We recommend migrating to PyTorch classes or pinning your version of Transformers.

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<timed exec> in <module>


/usr/local/lib/python3.12/dist-packages/transformers/modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   2962                 # We load in TF format here because PT weights often need to be transposed, and this is much
   2963                 # faster on GPU. Loading as numpy and transposing on CPU adds several seconds to load times.
-> 2964                 return load_pytorch_state_dict_in_tf2_model(
   2965                     model,
   2966                     safetensors_archive,


/usr/local/lib/python3.12/dist-packages/transformers/modeling_tf_pytorch_utils.py in load_pytorch_state_dict_in_tf2_model(tf_model, pt_state_dict, tf_inputs, allow_missing_keys, output_loading_info, _prefix, tf_to_pt_weight_rename, ignore_mismatched_sizes, skip_logger_warnings)
    331     # Convert old format to new format if needed from a PyTorch state_dict
    332     tf_keys_to_pt_keys = {}
--> 333     for key in pt_state_dict:
    334         new_key = None
    335         if "gamma" in key:


TypeError: 'builtins.safe_open' object is not iterable

Seems like a library bug…?


You are hitting a Transformers TensorFlow weight-loading bug/regression that shows up when a TF from_pretrained() call tries to convert PyTorch .safetensors weights into a TF model.

That is why you see:

  • model.safetensors: 100% ... (normal download)
  • then TypeError: 'builtins.safe_open' object is not iterable (crash while reading the safetensors archive)

This is not caused by your (512,) input shape.


What is actually failing (step-by-step)

1) You are loading a TensorFlow model class

Your stack trace is in:

  • transformers/modeling_tf_utils.py → TF loader
  • transformers/modeling_tf_pytorch_utils.py → PT→TF conversion utilities

That only happens when you’re using a TF class like TFDistilBertModel, TFAutoModel..., TFDistilBertFor..., etc.

A very similar reproduction is reported here (same line, same error): TF model .from_pretrained() crashes because safe_open is “not iterable”. (GitHub)

2) The checkpoint being downloaded is a PyTorch safetensors file

You see model.safetensors being downloaded. That is a PyTorch checkpoint format.

Transformers then tries to do:

  • open the safetensors archive (lazy reader) using safetensors.safe_open
  • iterate keys in it to map PT tensor names → TF tensor names

3) The bug: Transformers treats safe_open(...) like a dict

safetensors.safe_open(...) returns a reader object, not a Python dict.
Per safetensors docs, you use it like:

  • f.keys() to list tensors
  • f.get_tensor(name) to load one tensor

But the failing Transformers code path is doing something effectively like:

  • for key in pt_state_dict: where pt_state_dict is that safe_open object

And that raises:

  • TypeError: 'builtins.safe_open' object is not iterable

This is reported as a Transformers bug/regression in issue #40318. (GitHub)
It is also reported specifically for DistilBERT TF classes in TensorFlow’s tracker (#105815). (GitHub)

So the root cause is: Transformers TF loader + safetensors lazy reader mismatch.
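
In safetensors terms, here is the supported reading pattern next to what the loader effectively does (a minimal sketch; the file path is hypothetical):

from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():            # correct: iterate the keys() view
        tensor = f.get_tensor(name)  # correct: load one tensor by name

# By contrast, `for key in f:` (what the buggy loader effectively does)
# raises: TypeError: 'builtins.safe_open' object is not iterable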


The warning you saw is separate (but important)

“TensorFlow and JAX classes are deprecated and will be removed in Transformers v5 …”

That message is a deprecation warning, not the crash. It means HF is moving away from TF/JAX support, so TF breakages are more likely and may be fixed slowly or not at all. The Transformers v5 release notes explicitly mention deprecating TF and JAX. (GitHub)
Google AI Developers Forum replies also summarize that TF support is deprecated and recommend PyTorch. (Google AI Developers Forum)

Practical implication: you often need to pin versions or use workarounds if you stay on TF.


Fix options (use the one that matches your constraints)

Option A (fastest): Force Transformers to NOT use safetensors

This avoids the buggy safe_open path by downloading a .bin checkpoint instead.

from transformers import TFDistilBertModel

transformer_layer = TFDistilBertModel.from_pretrained(
    "distilbert-base-multilingual-cased",
    use_safetensors=False,
)

This exact workaround is mentioned in the bug report: setting use_safetensors=False allows the download/load to proceed. (GitHub)

Notes:

  • This does not fix TF deprecation. It just bypasses this specific crash.
  • It is still doing PT→TF conversion if TF weights are not available.
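
Once it loads, a quick eager smoke test confirms the weights are usable (a minimal sketch; 768 is DistilBERT’s hidden size):

import tensorflow as tf

dummy_ids = tf.ones((1, 8), dtype=tf.int32)  # batch of 1, sequence length 8
out = transformer_layer(input_ids=dummy_ids)
print(out.last_hidden_state.shape)           # expect (1, 8, 768)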

Option B (most stable): Pin to a known working dependency set

Many people have gotten past the same error by pinning Transformers + TF and adding tf-keras.

A commonly reported working combo:

  • transformers==4.49.0
  • tensorflow==2.19.1 or 2.20.0
  • tf-keras installed (Reddit)

Example installs:

pip install "transformers==4.49.0" "tensorflow==2.20.0" tf-keras

Why tf-keras matters:

  • Newer environments often default to Keras 3 behavior.
  • Transformers TF support historically expects Keras 2 style objects.
  • There are multiple “KerasTensor / legacy Keras” incompatibility reports and fixes that involve tf-keras / legacy Keras configuration; see the sketch below. (GitHub)
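
One commonly suggested switch, assuming your TensorFlow build honors it (TF 2.16+ reads this environment variable), is to route tf.keras to the legacy Keras 2 implementation that tf-keras provides:

import os

# Must be set BEFORE tensorflow / transformers are imported, and requires
# `pip install tf-keras`.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import tensorflow as tf
from transformers import TFDistilBertModel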

Option C (future-proof): Switch to PyTorch models

This avoids the entire TF loader + conversion layer, and aligns with where Transformers development is going. (Google AI Developers Forum)

If you are on TPU, this may or may not be acceptable depending on whether you’re set up for PyTorch/XLA.


Your model-building code: what it should look like in TF (and common pitfalls)

1) Use TF dtypes, not torch dtypes

If you are building a Keras model, use tf.int32, not torch.int32.

2) Provide attention_mask (strongly recommended)

DistilBERT can run without it, but performance is usually worse because padding tokens get attended to.

Hugging Face’s DistilBERT docs note TF models accept inputs as keyword args or as tuple/list/dict. (Hugging Face)

Example functional model:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from transformers import TFDistilBertModel

def build_model(transformer, max_len=512):
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    outputs = transformer(input_ids=input_ids, attention_mask=attention_mask)
    sequence_output = outputs.last_hidden_state  # (batch, seq, hidden)

    cls_token = sequence_output[:, 0, :]
    out = Dense(1, activation="sigmoid")(cls_token)

    return Model(inputs=[input_ids, attention_mask], outputs=out)
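
Since this version takes list inputs, compiling and fitting would look like this (a hedged sketch; x_ids, x_mask, and y are placeholder arrays):

import tensorflow as tf

model = build_model(transformer_layer, max_len=512)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# x_ids, x_mask: int32 arrays of shape (n, 512); y: 0/1 labels of shape (n,)
model.fit([x_ids, x_mask], y, batch_size=32, epochs=2)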

3) If you see “KerasTensor not allowed … only tf.Tensor accepted”

That is a separate TF/Keras integration issue that shows up in some recent stacks (Keras 3 vs legacy expectations). It’s referenced in the same general ecosystem of TF breakages. (GitHub)
In practice, Option B (pin versions + tf-keras) is the most reliable way to avoid that class of errors.


Similar cases online (same error, same root cause)

  1. Transformers issue #40318
    Reports TypeError: 'builtins.safe_open' object is not iterable in load_pytorch_state_dict_in_tf2_model line 333. (GitHub)

  2. TensorFlow issue #105815
    Reports the error for TFDistilBertForTokenClassification.from_pretrained("distilbert-base-uncased"). (GitHub)

  3. Community threads (symptom + pin workaround)
    Users report resolving with transformers==4.49.0, TF 2.19/2.20, and tf-keras. (Reddit)


Good references to learn TF + Transformers the “supported way”

  • Hugging Face LLM Course (TensorFlow + Keras fine-tuning): shows the intended training flow with TF model classes. (Hugging Face)
  • Hugging Face DistilBERT docs (TF usage + accepted input formats). (Hugging Face)
  • Transformers issue on KerasTensor / Keras 3 incompatibilities (why tf-keras is often required). (GitHub)
  • KerasHub HF integration guide (Keras-first direction; useful if you want a more Keras-native path longer-term). (Keras)

What I would do in your exact situation

  1. Patch the immediate crash:
  • add use_safetensors=False to your TF from_pretrained() call. (GitHub)
  2. If you get any follow-on TF/Keras weirdness:
  • pin transformers==4.49.0, TF 2.19/2.20, install tf-keras. (Reddit)
  3. Update your model inputs:
  • switch dtype to tf.int32
  • add attention_mask
  • use outputs.last_hidden_state instead of [0] for readability

Summary

  • The crash is a Transformers TF loader bug when reading .safetensors via safe_open. (GitHub)
  • Quick workaround: use_safetensors=False. (GitHub)
  • More stable workaround: pin Transformers + TF versions and install tf-keras. (Reddit)
  • The deprecation warning is real: TF support is being phased out in Transformers v5. (GitHub)