AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published 22 days ago • 71
MeXtract: Light-Weight Metadata Extraction from Scientific Papers Paper • 2510.06889 • Published Oct 8 • 1
Multimodal Safety Evaluation in Generative Agent Social Simulations Paper • 2510.07709 • Published Oct 9 • 13