Esper 3.1
Esper 3.1 is a DevOps, architecture, code, and general reasoning finetune for Qwen, Ministral, and gpt-oss!
Support our open-source dataset and model releases!
Esper 3.1: Ministral-3-3B-Reasoning-2512, Qwen3-4B-Thinking-2507, Ministral-3-8B-Reasoning-2512, Ministral-3-14B-Reasoning-2512, gpt-oss-20b
NOTE: this model is recommended only for use cases that specifically require 3B parameters or fewer; the larger Esper 3.1 releases listed above, such as Ministral-3-14B-Reasoning-2512-Esper3.1, are generally more performant.
Esper 3.1 is a coding, architecture, and DevOps reasoning specialist; this release is built on Ministral 3.
Esper 3.1 uses the Ministral-3-3B-Reasoning-2512 prompt format.
Example inference script to get started:
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "ValiantLabs/Ministral-3-3B-Reasoning-2512-Esper3.1"

# Load the mistral-common tokenizer backend and the model weights in bfloat16.
tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "The core learning mechanism in Soar, chunking, creates new production rules by compiling the results of successful subgoal resolution. Explain the precise mechanism by which the dependency graph of working memory elements that contributed to the subgoal's result determines the conditions of the new chunk. What are the implications of this mechanism for creating overly specific or overly general rules, and how can an architect guide the chunking process?"

# Reasoning system prompt: the model drafts its inner monologue inside
# [THINK]...[/THINK] tags before writing its final, self-contained response.
system_prompt = (
    "# HOW YOU SHOULD THINK AND ANSWER\n\n"
    "First draft your thinking process (inner monologue) until you arrive at a response. "
    "Format your response using Markdown, and use LaTeX for any mathematical equations. "
    "Write both your thoughts and the response in the same language as the input.\n\n"
    "Your thinking process must follow the template below:\n"
    "[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. "
    "Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]\n"
    "Here, provide a self-contained response."
)

messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
        ],
    },
]

# Apply the chat template, then move the resulting tensors to the model's device.
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
tokenized = {k: v.to(model.device) for k, v in tokenized.items() if hasattr(v, "to")}

output = model.generate(
    **tokenized,
    max_new_tokens=20000,
)[0]

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
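Because the system prompt above wraps the model's reasoning in [THINK]...[/THINK] tags, you may want to separate the reasoning trace from the final answer. Below is a minimal sketch of that post-processing; the split_reasoning helper is illustrative, not part of the model's API, and it assumes the tag template shown in the system prompt:

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) using the [THINK] tags.

    If the tags are absent, the whole output is treated as the answer.
    """
    start, end = "[THINK]", "[/THINK]"
    if start in text and end in text:
        before, _, rest = text.partition(start)
        reasoning, _, answer = rest.partition(end)
        return reasoning.strip(), (before + answer).strip()
    return "", text.strip()

reasoning, answer = split_reasoning(decoded_output)
print(answer)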
Esper 3.1 is created by Valiant Labs.
Check out our HuggingFace page to see all of our models!
We care about open source, for everyone to use.