EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection • Paper • 2506.09827 • Published Jun 11 • 20
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition • Paper • 2505.20033 • Published May 26 • 4
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models • Paper • 2506.11116 • Published Jun 9 • 4
CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models • Paper • 2506.07463 • Published Jun 9 • 10
CCI4.0 • Collection • A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models • 5 items • Updated 6 days ago • 13
Rethinking Reflection in Pre-Training • Collection • Datasets & Artifacts related to the paper "Rethinking Reflection in Pre-Training" • 10 items • Updated Jun 18 • 4
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval • Paper • 2412.14475 • Published Dec 19, 2024 • 55
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data • Paper • 2410.18558 • Published Oct 24, 2024 • 19
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models • Paper • 2410.18505 • Published Oct 24, 2024 • 11
Infinity MM • Collection • Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data • 6 items • Updated 6 days ago • 3
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies • Paper • 2408.06567 • Published Aug 13, 2024 • 2
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities • Paper • 2211.06679 • Published Nov 12, 2022 • 2
AltDiffusion: A Multilingual Text-to-Image Diffusion Model • Paper • 2308.09991 • Published Aug 19, 2023 • 3
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning • Paper • 2408.07089 • Published Aug 9, 2024 • 14