During generation, I'm using max_length to cap the output so that overly long sequences are cut off. However, I don't want generation to stop mid-sentence. Is there a reliable way to stop after one complete sentence has been generated?
AFAIK, generation stops once the model emits an end-of-sequence (EOS) token, even if you don't specify max_length. You can use StoppingCriteria (which you implicitly do by setting max_length) to construct arbitrary constraints on when generation should stop.
You're right about the EOS token. If I don't specify max_length, the model can generate a long text that stops making sense halfway through or deviates from the provided context. I want the generation to end a bit more naturally. Can you please share an example of how StoppingCriteria would work? I didn't find a usage example in the docs.
You can find some implementations here, and you can search for "stopping_criteria" in generation_utils.py to understand how it is used.
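To make the pointer above concrete, here is a minimal sketch of a custom stopping criterion that halts generation once the newly generated text ends with a sentence terminator. It assumes the transformers StoppingCriteria interface (a `__call__` that returns True to stop); the class name `SentenceEndCriteria` and its parameters are my own, not from the thread or the library.

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class SentenceEndCriteria(StoppingCriteria):
    """Stop generation once the generated text ends a sentence."""

    def __init__(self, tokenizer, prompt_length):
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length  # number of prompt tokens to skip

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the freshly generated part of the sequence.
        new_text = self.tokenizer.decode(
            input_ids[0, self.prompt_length:], skip_special_tokens=True
        )
        return new_text.rstrip().endswith((".", "!", "?"))


# Hypothetical usage (model, tokenizer, and inputs assumed to exist):
# criteria = StoppingCriteriaList(
#     [SentenceEndCriteria(tokenizer, inputs["input_ids"].shape[1])]
# )
# outputs = model.generate(**inputs, stopping_criteria=criteria, max_length=200)
```

Note that this decodes the whole generated suffix on every step, which is fine for short, single-sentence generations but wasteful for long ones.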
I was struggling with this issue today. The easiest way (instead of providing a custom stopping criterion) is to set both min_length and max_length to the same value. This forces generations of exactly that length, no shorter or longer.
Did the sentences end naturally, e.g. with a ".", or did they just finish abruptly?
In my case, I set max_length to 100 and the sentences end abruptly.
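When a generation is cut off at max_length like this, the dangling fragment can also be trimmed after the fact. This post-processing step isn't something suggested in the thread, just a plain-Python sketch:

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Drop any trailing fragment after the last ., ! or ?."""
    # A terminator counts only when followed by whitespace or end of string,
    # so a dot inside a token like "3.5" is less likely to match.
    matches = list(re.finditer(r"[.!?](?=\s|$)", text))
    return text[: matches[-1].end()] if matches else text


print(trim_to_last_sentence("The model wrote this. And then it was cut o"))
# -> "The model wrote this."
```

This doesn't save any compute (the model still generates the full 100 tokens), but the visible output ends on a complete sentence.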
If anyone is still struggling with this problem today, as I was just a few minutes ago, I was able to solve it by setting a system prompt that insists on complete sentences.
# standard hf code above
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in full sentences only, marking the end of each sentence with a dot. Do not end your answer in the middle of a sentence!"},
    {"role": "user", "content": prompt},
]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("[FORMATTED PROMPT]:", formatted_prompt)
# Tokenize input
input_data = tokenizer(formatted_prompt, return_tensors="pt", padding=True).to(model.device)
# standard hf code below