wei
zhuww
AI & ML interests
None yet
Recent Activity
updated
a collection
1 day ago
RL
Organizations
None yet
arena
- RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
  Paper • 2509.21319 • Published • 5
- CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
  Paper • 2507.23751 • Published • 4
- Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization
  Paper • 2502.05605 • Published
- WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models
  Paper • 2412.17395 • Published
code
- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 52
- CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
  Paper • 2105.12655 • Published
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 151
LLM
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  Paper • 2508.06471 • Published • 192
- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
  Paper • 2508.14444 • Published • 38
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
  Paper • 2507.06261 • Published • 64
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
  Paper • 2506.13585 • Published • 272
RL
- Large Reasoning Models Learn Better Alignment from Flawed Thinking
  Paper • 2510.00938 • Published • 58
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
  Paper • 2509.19284 • Published • 22
- Learning to Reason as Action Abstractions with Scalable Mid-Training RL
  Paper • 2509.25810 • Published • 5
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
SWE
agentic
- Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
  Paper • 2506.21506 • Published • 51
- Distilling LLM Agent into Small Models with Retrieval and Code Tools
  Paper • 2505.17612 • Published • 81
- Efficient Agent Training for Computer Use
  Paper • 2505.13909 • Published • 44
- Scaling Agents via Continual Pre-training
  Paper • 2509.13310 • Published • 117
reasoning llm
- Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
  Paper • 2509.05739 • Published • 2
- Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
  Paper • 2509.03059 • Published • 24
- Universal Deep Research: Bring Your Own Model and Strategy
  Paper • 2509.00244 • Published • 13
- <think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
  Paper • 2509.08358 • Published • 13
multi-turn