Xiao Hu

huxiao09

AI & ML interests

Reinforcement Learning, LLM Reasoning

Recent Activity

liked a model about 2 months ago

Kwai-Keye/Keye-VL-671B-A37B

upvoted a paper 5 months ago

Thyme: Think Beyond Images

authored a paper 6 months ago

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Organizations

None yet

authored 5 papers 6 months ago

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Paper • 2305.17400 • Published May 27, 2023

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 7

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5, 2025 • 28

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Paper • 2505.21067 • Published May 27, 2025 • 3

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2, 2025 • 130