MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment • Paper • 2512.09636 • Published Dec 10, 2025 • 25
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation • Paper • 2510.09116 • Published Oct 10, 2025 • 95
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs • Paper • 2510.08886 • Published Oct 10, 2025 • 19
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models • Paper • 2508.13491 • Published Aug 19, 2025 • 59
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance • Paper • 2502.08127 • Published Feb 12, 2025 • 58
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation • Paper • 2506.14028 • Published Jun 16, 2025 • 93
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading • Paper • 2502.11433 • Published Feb 17, 2025 • 36