TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-DAPO14k Text Generation • 8B • Updated Oct 11, 2025
TMLR-Group-HF/Self-Certainty-Qwen3-4B-Base-DAPO14k Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Entropy-Llama-3.2-3B-Instruct-DAPO14k Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k Text Generation • 8B • Updated Oct 11, 2025
TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-OpenRS Text Generation • 8B • Updated Oct 11, 2025
TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct-DAPO14k Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Self-Certainty-Llama-3.2-3B-Instruct-DAPO14k Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Entropy-Llama-3.2-3B-Instruct-MATH Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Co-rewarding-I-Qwen3-4B-Base-OpenRS Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Self-Certainty-Qwen3-4B-Base-OpenRS Text Generation • 4B • Updated Oct 11, 2025
TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-OpenRS Text Generation • 8B • Updated Oct 11, 2025
TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-DAPO14k Text Generation • 8B • Updated Oct 11, 2025
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-OpenRS Text Generation • 8B • Updated Oct 11, 2025