I am confused about two observations regarding beam search:
- From reading @patrickvonplaten's blog post [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate), my understanding is that each beam in a `BeamSearchEncoderDecoderOutput` should begin with a different token. Am I wrong in that assertion?
- The documentation for the `sequences` attribute of `BeamSearchEncoderDecoderOutput` states that "The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early due to the `eos_token_id`." In my observations it has always been longer than `max_length`. How come? (A minimal sketch of the kind of call I mean is below.)
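For reference, here is roughly how I am producing and inspecting these outputs. The model name, prompt, and generation settings are just placeholders, not my exact setup:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder encoder-decoder checkpoint and prompt
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

# Beam search; return_dict_in_generate=True makes generate() return a
# BeamSearchEncoderDecoderOutput instead of a plain tensor of token ids
outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,
    max_length=20,
    return_dict_in_generate=True,
)

print(type(outputs))            # expected: BeamSearchEncoderDecoderOutput
print(outputs.sequences.shape)  # (num_return_sequences, sequence_length) -- compare to max_length
print(outputs.sequences[:, 0])  # first token of each returned beam
```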
Thanks 