Hi,
I’m running into a NaN training_loss when training wav2vec2 xlsr on my custom dataset.
The weird thing is that even though training_loss goes to NaN, eval_loss still goes down, and the error rates (CER and WER) also go down.
I’ve experimented with a lower learning_rate, but I’m still getting similar behavior. I’m logging with wandb.
My graphs look like the following:
There’s no value for train/loss after ~60 steps since it is NaN, but eval/loss is still decreasing.
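
To narrow this down, here is a minimal sketch of the kind of check I could run on the per-step loss values. It only illustrates two things: finding the first step where the loss is non-finite, and how a single NaN inside a logging window makes the averaged value NaN too. The helper names (`first_non_finite`, `running_mean`) and the numbers in `window` are made up for illustration; they are not part of transformers or wandb, and I'm assuming the logged train loss is an average over several steps.

```python
import math

def first_non_finite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None

def running_mean(losses):
    """Plain mean over a logging window; one NaN makes the whole mean NaN."""
    return sum(losses) / len(losses)

# Hypothetical per-step losses with one bad batch at index 2.
window = [2.31, 2.18, float("nan"), 2.05]

bad_step = first_non_finite(window)
window_mean = running_mean(window)
```

If something like this is what is happening, one bad batch would be enough to blank out train/loss in the graph, even if the other batches train fine.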
Has anyone experienced similar behavior?


