Xiao Hu

huxiao09

AI & ML interests

Reinforcement Learning, LLM Reasoning

Recent Activity

liked a model about 2 months ago
Kwai-Keye/Keye-VL-671B-A37B
upvoted a paper 5 months ago
Thyme: Think Beyond Images
authored a paper 6 months ago
Query-Policy Misalignment in Preference-Based Reinforcement Learning

Organizations

None yet

authored 5 papers 6 months ago

Query-Policy Misalignment in Preference-Based Reinforcement Learning

Paper • 2305.17400 • Published May 27, 2023

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 7

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5, 2025 • 28

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Paper • 2505.21067 • Published May 27, 2025 • 3

Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2, 2025 • 130