You are getting this error because your model graph is TensorFlow Keras, but the Transformer you loaded is the PyTorch DistilBERT class, and your Input() dtype is also a PyTorch dtype.
That mix forces Keras to handle indexing on a symbolic tensor using an indexing key it does not support. The specific key in your message includes a Python list [-1, 0], which is “advanced indexing” and is a common way to trigger slice-key errors when the wrong tensor type is flowing through the wrong backend.
What your code is doing, in plain terms
1) Your Input(shape=(512,)) is fine
In Keras, shape=(512,) means “each example is length 512”. Keras adds the batch dimension automatically so the runtime shape is (batch, 512). Keras explicitly documents that shape does not include batch size. (Keras)
So this is not “because it’s (512,)”.
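If you want to confirm that yourself, here is a minimal check (plain Keras, independent of the transformer):

from tensorflow.keras.layers import Input

x = Input(shape=(512,))
print(x.shape)   # (None, 512): Keras prepends the batch dimension automatically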
2) Your dtype=torch.int32 is not valid for Keras Input
Keras expects dtype as a string like "int32" or a TF dtype like tf.int32. The docs describe it as “dtype … as a string (e.g. "int32")”. (Keras)
Passing torch.int32 is a framework mismatch.
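A minimal sketch of the valid forms (the torch line is what to avoid, not something that is documented to work):

import tensorflow as tf
from tensorflow.keras.layers import Input

ids = Input(shape=(512,), dtype="int32")     # string dtype: what Keras documents
ids = Input(shape=(512,), dtype=tf.int32)    # TF dtype object: also fine
# Input(shape=(512,), dtype=torch.int32)     # PyTorch dtype: framework mismatch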
3) You loaded the wrong DistilBERT class for a TF/Keras TPU notebook
You used:
transformers.DistilBertModel.from_pretrained(...)
In Hugging Face docs, DistilBertModel is the PyTorch model and is a torch.nn.Module subclass. (Hugging Face)
For TensorFlow Keras, you must use:
TFDistilBertModel (or TFAutoModel)
Hugging Face docs explicitly say TFDistilBertModel is a keras.Model subclass, and they document the input formats Keras expects. (Hugging Face)
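For reference, loading the TF class looks like this (a short sketch; TFAutoModel is the alternative mentioned above and resolves to the right TF class from the checkpoint name):

from transformers import TFDistilBertModel, TFAutoModel

# Keras-compatible DistilBERT (a keras.Model subclass)
tf_bert = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
# or, letting transformers pick the TF class for you:
# tf_bert = TFAutoModel.from_pretrained("distilbert-base-multilingual-cased")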
4) Why this slice error text shows up
Your exception says Keras received an indexing key like:
(slice(None, None, None), [-1, 0])
That means some code tried to do “slice all rows, then take columns [-1, 0]”. Python-list indexing like [-1, 0] is not universally supported for symbolic tensors, and it is especially likely to break when you’re accidentally routing through a backend that expects a different tensor type.
You did not write [-1, 0] yourself, so it is almost certainly happening inside the incompatible call path created by mixing Keras symbolic tensors with a PyTorch model.
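If you want to see the rule for yourself, here is a minimal eager-TF sketch. This uses a plain tf.Tensor rather than the Keras symbolic tensor from your traceback, but the same indexing rule applies: integer and slice keys work, Python-list keys do not, and tf.gather is the supported way to pick several positions along an axis:

import tensorflow as tf

x = tf.reshape(tf.range(12), (3, 4))
print(x[:, 0])                       # integer and slice keys are supported
# x[:, [3, 0]]                       # a Python-list key ("advanced indexing") raises a TypeError
print(tf.gather(x, [3, 0], axis=1))  # supported way to select several columns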
The correct fix for your case (TF/Keras + TPU)
Use TF DistilBERT + TF dtype + (usually) attention mask
Hugging Face documents that TF models accept inputs either as keyword args or as a dict/list/tuple in the first positional argument because that’s what Keras likes. (Hugging Face)
Use this pattern:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from transformers import TFDistilBertModel
def build_model(transformer, max_len=512):
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    outputs = transformer({"input_ids": input_ids, "attention_mask": attention_mask})
    sequence_output = outputs.last_hidden_state   # (batch, seq, hidden)
    cls_token = sequence_output[:, 0, :]

    out = Dense(1, activation="sigmoid")(cls_token)

    model = Model(inputs={"input_ids": input_ids, "attention_mask": attention_mask}, outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

with strategy.scope():
    transformer_layer = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
    model = build_model(transformer_layer, max_len=512)
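To feed this model, tokenize into the same dict keys the inputs are named with. A rough sketch (train_texts and train_labels are placeholders for your own data):

from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")
enc = tokenizer(
    train_texts,                    # placeholder: your list of strings
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="np",
)
model.fit(
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
    train_labels,                   # placeholder: your 0/1 labels
    batch_size=16 * strategy.num_replicas_in_sync,
    epochs=2,
)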
Why this version matches the docs:
- TFDistilBertModel is a keras.Model subclass. (Hugging Face)
- TF Transformers accept inputs as kwargs or as a dict/list/tuple for Keras compatibility. (Hugging Face)
- Keras Input dtype should be "int32" or tf.int32, not a torch dtype. (Keras)
TPU context (why strategy.scope() matters)
Keras TPU usage expects you to create the model inside with strategy.scope(): so variables are placed correctly and training is distributed. This is the standard pattern shown in Keras TPU-related examples. (Keras)
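If strategy is not already defined earlier in your notebook, the usual TF TPU setup (the pattern used in Kaggle/Colab TPU examples) looks roughly like this:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("Replicas:", strategy.num_replicas_in_sync)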
If the Kaggle notebook was actually PyTorch (different route)
Then do not use tf.distribute.TPUStrategy or Keras Input at all. You would use a pure PyTorch model and a PyTorch data pipeline, or PyTorch/XLA for TPU. Mixing the TF TPU path with the PyTorch model class is exactly what breaks.
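For completeness, the pure-PyTorch counterpart would look roughly like this (no Keras Input, no TPUStrategy; TPU support would come from PyTorch/XLA instead):

import torch
from transformers import DistilBertModel, DistilBertTokenizerFast

model = DistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")

enc = tokenizer(["sample text"], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
print(out.last_hidden_state.shape)   # torch.Size([1, seq_len, 768])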
Quick “prove it” checks in your notebook
Run these to confirm what you loaded:
print(transformer_layer.__class__)
- If it is DistilBertModel (no TF prefix), you loaded the PyTorch model. (Hugging Face)
- If it is TFDistilBertModel, you loaded the TF Keras model. (Hugging Face)
Also confirm dtype expectation:
- Keras Input(..., dtype=...) expects a string like "int32". (Keras)
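A slightly stronger check than reading the class name, if both frameworks are importable in your notebook:

import tensorflow as tf

print(isinstance(transformer_layer, tf.keras.Model))       # True -> TF/Keras model, works with this pipeline
try:
    import torch
    print(isinstance(transformer_layer, torch.nn.Module))  # True -> PyTorch model, wrong for Keras Input
except ImportError:
    pass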
What to google next (high-yield queries)
If you want more threads like yours, these searches reliably find them:
DistilBertModel KerasTensor Unsupported key type for array slice
TFDistilBertModel Functional API inputs dict
Keras Input dtype "int32" torch.int32
Kaggle TPU DistilBERT TFDistilBertModel
Summary
- Your (512,) input shape is normal in Keras. Batch is implicit. (Keras)
- Your dtype=torch.int32 is wrong for Keras Input. Use "int32" or tf.int32. (Keras)
- You loaded the PyTorch DistilBERT class (DistilBertModel), but your notebook is TF/Keras TPU. Use TFDistilBertModel. (Hugging Face)