Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 75 • 7
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 32 • 5
A Refined Analysis of Massive Activations in LLMs Paper • 2503.22329 • Published Mar 28 • 14 • 3