InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published 8 days ago • 33
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 52
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 115
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21 • 110
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30 • 34
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29 • 29
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26 • 134
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 28