Recommended sampling parameters

#6
by sszymczyk - opened

What are the recommended values of sampling parameters - temperature/top_p/top_k?

I always test Qwen3-based thinking MoE models at:

Temp 0.6
Top K 20
Repeat Penalty 1.05
Min P 0.05
Top P 0.95
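For reference, here is a minimal sketch of how a sampler chain like the one above could be applied to raw logits. This is an illustrative implementation, not llama.cpp's actual code: real engines differ in the order they apply filters (and the repeat penalty is omitted here, since it needs the token history).

```python
import math
import random

def sample(logits, temperature=0.6, top_k=20, top_p=0.95, min_p=0.05, rng=None):
    """Apply temperature, top-k, min-p, and top-p filtering to raw logits,
    then sample one token id. Filter order is illustrative; engines differ."""
    rng = rng or random.Random(0)
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    # Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep only the k most probable tokens.
    probs.sort(key=lambda t: t[1], reverse=True)
    probs = probs[:top_k]
    # Min-p: drop tokens whose probability is below min_p times the top token's.
    cutoff = min_p * probs[0][1]
    probs = [(i, p) for i, p in probs if p >= cutoff]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass >= top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize over the surviving tokens and draw one.
    total = sum(p for _, p in kept)
    r = rng.random() * total
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With a strongly peaked logit vector, min-p and top-p prune everything but the top token, so the sample is effectively greedy; flatter logits leave several candidates in play.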

Before seeing other explicit guidance, I will judge model performance based on these values.

No, the same parameters do not apply universally across different models. Your settings are an excellent starting point for testing Qwen3-based MoE models, but they will not be optimal, or even appropriate, for all models.

Different "Temperature" Sensitivity: A model trained with very strict, deterministic fine-tuning (e.g., some code models) might become incoherent or produce low-quality output at Temp 0.6. Conversely, a very creative storytelling model might need a higher temperature to shine.

MoE vs. Dense: Your Top-K 20 works well for MoEs because they have diverse "expert" pathways. A dense model of the same size might benefit from a slightly lower Top-K to stay more focused.

Scale: A 7B-parameter model is often more "nervous" and less coherent than a 70B model at the same temperature; larger models can generally tolerate higher temperatures while remaining logical.
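The temperature point above is easy to see numerically: temperature divides the logits before the softmax, so a lower value concentrates probability mass on the top token while a higher one flattens the distribution. A small sketch (the logit values are made up for illustration):

```python
import math

def softmax_temp(logits, t):
    """Softmax over logits scaled by temperature t."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # shift by max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.0]
for t in (0.3, 0.6, 1.2):
    probs = softmax_temp(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At t=0.3 nearly all the mass sits on the first token; at t=1.2 the runner-up tokens get a substantial share, which is why a model that is fragile at one temperature can behave very differently at another.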

@JLouisBiz Thanks for the background info! That's why I'd like the model makers to publish this info on their model card... NOT just linking to github or some other documentation. I always like to quote "If it's not documented, it doesn't exist"... 😈

Yeah, I tend to agree. Having canonical guidance from the people who know the guts of systems as complex as these really helps avoid easy-to-get-wrong tuning knobs.

GLM 4.7 is, so far, the most thoroughly capable model I've worked with.
Y'all have done astounding work. Thank you.

AND…. guidance on how best to use this incredible instrument of …. science/art… is greatly appreciated

sszymczyk changed discussion status to closed
