πŸ₯– Baguettotron

Pleias

Blog announcement

Baguettotron is a 321-million-parameter generalist Small Reasoning Model, trained on 200 billion tokens from SYNTH, a fully open generalist dataset.

Despite being trained on considerably less data, Baguettotron outperforms most SLMs in the same size range on non-code industry benchmarks, providing an unprecedented balance between memory, general reasoning, math and retrieval performance.

The name is both a nod to French origins and to the unusual shape of the model: with 80 layers, Baguettotron is currently the deepest SLM in its size range.

Features

Baguettotron has been natively trained for instructions with thinking traces. We implemented a series of dedicated pipelines for:

  • Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia)
  • Retrieval-Augmented Generation with grounding (following on our initial experiments with Pleias-RAG series)
  • Arithmetic and simple math problem solving
  • Editing tasks
  • Information extraction
  • Creative writing, including unusual synthetic exercises like lipograms or layout poems.
  • Cooking (the model wouldn't deserve its name otherwise)

Baguettotron is able to read and write in the main European languages: French, German, Italian, Spanish, Polish and, to a lesser extent, Latin and Dutch. Reasoning traces are exclusively written in English.

Full synthetic training makes it relatively straightforward to expand language support, and we look forward to either bringing in more languages or creating language-specific variants.

Model design and training

Baguettotron is a 321M-parameter decoder with a standard Qwen/Llama-like design, except for its extreme depth of 80 layers (a type of model we internally nicknamed "baguette").
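
As a quick way to see this depth in practice, the layer count can be read from the published config; the small sketch below assumes the standard Llama/Qwen-style `num_hidden_layers` field.

```python
# Small sketch: read the depth off the published config
# (assumes the standard Llama/Qwen-style "num_hidden_layers" field).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PleIAs/Baguettotron")
print(config.num_hidden_layers)  # expected: 80
```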

Baguettotron was trained on 16 H100s from Jean Zay (compute plan nΒ°A0191016886). An unusual feature of training on SYNTH was having reasoning signals from MMLU and other major industry benchmarks very early on. We were able to empirically measure consistent improvements from stacking more layers.

Our current hypothesis is that deeper architectures benefit more from dense reasoning data, as the model is more commonly exposed to string sequences requiring intensive computation or knowledge interconnection.

Reasoning style

The reasoning traces use an entirely new reasoning style with dense, short, frequently non-verbal sentences, designed by Pleias and made possible by the use of fine-tuned models for synthetic generation.

Traces use the following stenographic notation integrated into the special tokens of the model:

Logical markers

| Token | Meaning | Usage |
|-------|---------|-------|
| β†’ | derivation / implication | For very short causal/logical flow |
| β†Ί | iterative return / refinement loop | For backtracking, reconsidering priors, RAG re-querying |
| ? | uncertainty / questions to resolve | Can be appended to short expressions/words, not just interrogative sentences |
| ! / β€» | insight / breakthrough | Emphatic mark for knowledge discovery |
| β‰ˆ | approximation / estimate | For intermediary hypotheses / uncertain preliminary statements |
| ∴ | therefore / final step | Use sparingly to mark stable conclusions |

Uncertainty

| Token | Meaning | Usage |
|-------|---------|-------|
| ● | high confidence | well-supported empirical/theoretical ground; "anchor points" |
| ◐ | medium/partial confidence | incomplete data; plausible but unverified links |
| β—‹ | low confidence | speculation, missing context, weak inference chain |
| ⚠ | bias/premise risk | domain mismatch, cultural assumptions, language-switch artifacts |
| ?maybe? | soft speculation | marks tentative ideas, reasoning branches that might collapse later |

Verification process

| Token | Meaning | Usage |
|-------|---------|-------|
| ☐ | unverified hypothesis | raw claim, no cross-check yet |
| β˜‘ | intermediate verification | one source/argument supports it |
| βœ“ | confirmed/validated | multiple independent supports (●-level) |

The model can also use a variety of graphic notations for causality/problem decomposition at times. Things like:

Initial query:
β”œβ”€ feature1: *lorem ipsum*
β”œβ”€ feature2: *lorem ipsum*
└─ feature3: *lorem ipsum*

Simulated entropy

Baguettotron uses a range of special tokens ⟨Hβ‰ˆX.X⟩ to introduce higher entropy sequences, a bit similarly to temperature control.

  • ⟨Hβ‰ˆ0.3–0.5⟩: still grounded sequences with a slightly higher token entropy
  • ⟨Hβ‰ˆ0.5–1.0⟩: exploratory, multi-path reasoning
  • ⟨Hβ‰ˆ1.5–1.8⟩: fragmented, oneiric, literary stream-of-consciousness drift

It remains a pure simulation, since the model obviously does not have access to inference controls. Yet it still allows for more token exploration/diversification. The inspiration for this method came from the Entropix project.

Evaluation

We evaluated Baguettotron on three major industry benchmarks: MMLU (general reasoning and memorization), math (GSM8K) and retrieval (HotpotQA). With only 321M parameters, Baguettotron gets close to Qwen-0.6B performance and significantly outperforms the similarly sized Gemma.

Inference

Baguettotron has been trained on the standard instruction style from Qwen:

<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
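
The snippet below is a minimal inference sketch with transformers: it simply reproduces the template above as a plain prompt string. The dtype and generation settings are illustrative assumptions, not an official recipe.

```python
# Minimal sketch: build the Qwen-style prompt above by hand and generate with
# transformers. Generation settings are illustrative, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PleIAs/Baguettotron"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = (
    "<|im_start|>user\n"
    "Who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
# Keep special tokens so the thinking trace and its notation symbols stay visible.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```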

Baguettotron has support for multi-turn conversations. We recommend using "rolling" thinking: systematically appending a thinking trace for each new generation but discarding the past ones, as in the sketch below.
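
The helper below is a sketch of that rolling strategy (function names are illustrative): past `<think>…</think>` blocks are stripped from earlier assistant turns before a fresh trace is opened for the new generation.

```python
# Sketch of "rolling" thinking for multi-turn use: strip <think>…</think>
# blocks from past assistant turns, then reopen a fresh <think> for the new
# generation. Helper names are illustrative.
import re

def strip_thinking(assistant_text: str) -> str:
    """Drop the <think>…</think> block from a previous assistant reply."""
    return re.sub(r"<think>.*?</think>\s*", "", assistant_text, flags=re.DOTALL)

def build_prompt(history: list[dict]) -> str:
    """history: [{"role": "user" | "assistant", "content": "..."}, ...]"""
    parts = []
    for turn in history:
        content = turn["content"]
        if turn["role"] == "assistant":
            content = strip_thinking(content)  # discard past reasoning traces
        parts.append(f"<|im_start|>{turn['role']}\n{content}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n<think>\n")  # fresh trace for this turn
    return "".join(parts)
```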

It's possible to remove thinking traces by swapping in a closing tag:

<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
</think>

Yet our current tests show significantly decreased performance on most tasks, especially memorization of encyclopedic knowledge.

For RAG, Baguettotron uses a special syntax to pass on references:

<|im_start|>user
Who are you?

<source_1>[…]</source_1>
<source_2>[…]</source_2>
<|im_end|>
<|im_start|>assistant
<think>

Afterwards, the model will return an answer with grounding references ([quote]). The reasoning draft will be affected as well, focusing on source synthesis rather than reminiscence of its internal knowledge base.
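
As a sketch, the source wrapping can be assembled programmatically along these lines; the helper and variable names below are illustrative, and the string simply mirrors the template shown above.

```python
# Sketch of the RAG prompt construction: each retrieved passage is wrapped in
# a numbered <source_N> tag inside the user turn. Names are illustrative.
def build_rag_prompt(question: str, sources: list[str]) -> str:
    tagged = "\n".join(
        f"<source_{i}>{text}</source_{i}>" for i, text in enumerate(sources, start=1)
    )
    return (
        "<|im_start|>user\n"
        f"{question}\n"
        "\n"
        f"{tagged}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"
    )
```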

Fine-Tuning/RL

Baguettotron has been successfully fine-tuned for a variety of tasks including text classification and poetry writing.

Since it's a reasoning model, it should train well with reinforcement learning methods like GRPO, either on verifiable tasks or with an LLM-as-a-judge.
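
As an illustration only, a minimal GRPO loop with TRL's GRPOTrainer could look like the sketch below; the toy prompt dataset and length-based reward are placeholders for a real verifiable task or an LLM-as-a-judge reward.

```python
# Minimal GRPO sketch with TRL (assumes a recent trl release with GRPOTrainer).
# The prompt dataset and reward function are toy placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

dataset = Dataset.from_dict({"prompt": ["What is 17 * 23?", "Name a French bread."]})

def concise_reward(completions, **kwargs):
    """Placeholder reward: mildly prefer shorter completions."""
    return [-len(c) / 100.0 for c in completions]

trainer = GRPOTrainer(
    model="PleIAs/Baguettotron",
    reward_funcs=concise_reward,
    args=GRPOConfig(output_dir="baguettotron-grpo", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()
```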
