The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 229
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 16 days ago • 143
Comparing DPO with IPO and KTO Collection • A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 8, 2025 • 32
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025 • 662
Deep Reinforcement Learning Leaderboard 🚀 Space • Running on CPU Upgrade • Display and search reinforcement learning leaderboard data • 385