Esper 3.1
Esper 3.1 is a DevOps, architecture, code, and general reasoning finetune for Qwen, Ministral, and gpt-oss!
Support our open-source dataset and model releases!
Esper 3.1: Ministral-3-3B-Reasoning-2512, Qwen3-4B-Thinking-2507, Ministral-3-8B-Reasoning-2512, Ministral-3-14B-Reasoning-2512, gpt-oss-20b
NOTE: this model is recommended only for use cases that specifically require 3B parameters or fewer; the larger Esper 3.1 releases listed above, such as Ministral-3-14B-Reasoning-2512-Esper3.1, are generally more performant.
Esper 3.1 is a coding, architecture, and DevOps reasoning specialist; this release is built on Ministral 3.
Esper 3.1 uses the Ministral-3-3B-Reasoning-2512 prompt format.
Example inference script to get started:
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "ValiantLabs/Ministral-3-3B-Reasoning-2512-Esper3.1"

# Load the mistral-common tokenizer backend and the model weights in bfloat16.
tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

user_prompt = "The core learning mechanism in Soar, chunking, creates new production rules by compiling the results of successful subgoal resolution. Explain the precise mechanism by which the dependency graph of working memory elements that contributed to the subgoal's result determines the conditions of the new chunk. What are the implications of this mechanism for creating overly specific or overly general rules, and how can an architect guide the chunking process?"

# Reasoning system prompt: the model drafts its inner monologue inside
# [THINK]...[/THINK] tags before writing its final, self-contained response.
system_prompt = (
    "# HOW YOU SHOULD THINK AND ANSWER\n\n"
    "First draft your thinking process (inner monologue) until you arrive at a response. "
    "Format your response using Markdown, and use LaTeX for any mathematical equations. "
    "Write both your thoughts and the response in the same language as the input.\n\n"
    "Your thinking process must follow the template below:\n"
    "[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. "
    "Be as casual and as long as you want until you are confident to generate the response to the user.[/THINK]\n"
    "Here, provide a self-contained response."
)

messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
        ],
    },
]

# Apply the chat template, then move the resulting tensors to the model's device.
tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
tokenized = {k: v.to(model.device) for k, v in tokenized.items() if hasattr(v, "to")}

output = model.generate(
    **tokenized,
    max_new_tokens=20000,
)[0]

# Decode only the newly generated tokens, skipping the prompt.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
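Because the system prompt above wraps the model's reasoning in [THINK]...[/THINK] tags, you may want to separate the reasoning trace from the final answer. Below is a minimal sketch of that post-processing; the split_reasoning helper is illustrative, not part of the model's API, and it assumes the tag template shown in the system prompt:

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer) using the [THINK] tags.

    If the tags are absent, the whole output is treated as the answer.
    """
    start, end = "[THINK]", "[/THINK]"
    if start in text and end in text:
        before, _, rest = text.partition(start)
        reasoning, _, answer = rest.partition(end)
        return reasoning.strip(), (before + answer).strip()
    return "", text.strip()

reasoning, answer = split_reasoning(decoded_output)
print(answer)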
Esper 3.1 is created by Valiant Labs.
Check out our HuggingFace page to see all of our models!
We care about open source, for everyone to use.