We release TACLer-1.5B (🤗 HF Model), a hybrid reasoning model that supports both Thinking and NoThinking modes! We propose a model-tailored curriculum reinforcement learning framework that gradually increases data complexity according to the model's proficiency across multi-stage RL training.
Our experiments show that: (i) TACLer reduces computational cost, cutting training compute by over 50% compared to long-thinking models and reducing inference token usage by over 42% relative to the base model, DeepSeek-R1-Distill-Qwen-1.5B (R1-Qwen); and (ii) TACLer improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets (MATH500, AMC, AIME 2024, and AIME 2025).
Code: https://github.com/laihuiyuan/tacler
Paper: https://arxiv.org/pdf/2601.21711
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "laihuiyuan/TACLer"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# use Thinking or NoThinking mode
think_mode = False

question = "How many positive whole-number divisors does 196 have?"
step_by_step = " Let's think step by step and output the final answer within \\boxed{}."
messages = [
    {"role": "user", "content": question + step_by_step}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# in NoThinking mode, pre-fill a short reply and close the reasoning block
# so the model answers directly instead of producing a long thinking trace
if not think_mode:
    prompt += 'Okay, I think I can solve it directly.\n</think>\n\n'

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")

print("**PROMPT**\n", prompt)
print("**OUTPUT**\n", output)
```
Citation
```bibtex
@article{lai-etal-2026-tacler,
  title   = {TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning},
  author  = {Lai, Huiyuan and Nissim, Malvina},
  journal = {arXiv preprint arXiv:2601.21711},
  year    = {2026},
  url     = {https://arxiv.org/pdf/2601.21711}
}
```
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B