We release TACLer-1.5B (🤗 HF Model), a hybrid reasoning model that supports both Thinking and NoThinking modes! We propose a model-tailored curriculum reinforcement learning framework that gradually increases data complexity according to the model's proficiency across multi-stage RL training.
Our experiments show that: (i) TACLer reduces computational cost, cutting training compute by over 50% compared to long-thinking models and reducing inference token usage by over 42% relative to the base model, DeepSeek-R1-Distill-Qwen-1.5B (R1-Qwen); and (ii) TACLer improves accuracy by over 9% over the base model, consistently outperforming state-of-the-art NoThinking and Thinking baselines across four math datasets (MATH500, AMC, AIME 2024, and AIME 2025).
Code: https://github.com/laihuiyuan/tacler
Paper: https://arxiv.org/pdf/2601.21711
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "laihuiyuan/TACLer"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# use Thinking or NoThinking mode
think_mode = False

question = "How many positive whole-number divisors does 196 have?"
step_by_step = " Let's think step by step and output the final answer within \\boxed{}."
messages = [
    {"role": "user", "content": question + step_by_step}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# in NoThinking mode, pre-fill a short reply and close the reasoning block
# so the model answers directly instead of producing a long thinking trace
if not think_mode:
    prompt += 'Okay, I think I can solve it directly.\n</think>\n\n'

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")

print("**PROMPT**\n", prompt)
print("**OUTPUT**\n", output)
```
Citation
```bibtex
@article{lai-etal-2026-tacler,
  title   = {TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning},
  author  = {Lai, Huiyuan and Nissim, Malvina},
  journal = {arXiv preprint arXiv:2601.21711},
  year    = {2026},
  url     = {https://arxiv.org/pdf/2601.21711}
}
```
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B