Hey all,
Quick update on my ReTool project: a custom train loop for GRPO-style training with more efficient generation.
Key components of the train loop
1. Separated generation from update
No more tying gradient updates directly to when I generate completions. This gives more control over reuse and lets me structure training around steps_per_generation instead of being stuck with the 1:1 PPO pattern.
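Here's a minimal control-flow sketch of what I mean. The two helper functions are just placeholders so the snippet runs on its own, not the real generation/update code:

```python
# Placeholder for the real sampling call (e.g. vLLM or model.generate).
def generate_completions(prompt_batch):
    return [f"completion for {p}" for p in prompt_batch]

# Placeholder for one optimizer step on previously generated data.
def gradient_update(stored_completions):
    print(f"updating on {len(stored_completions)} completions")

steps_per_generation = 4
prompts = ["p1", "p2", "p3", "p4"]
stored = []

for step in range(8):
    if step % steps_per_generation == 0:
        stored = generate_completions(prompts)  # refresh only every N steps
    gradient_update(stored)                     # otherwise reuse the stored batch
```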
2. Generation reuse
Instead of:
1 completion → 1 gradient update   # PPO style
we can do:
generate: 4 completions → store
train: 4 updates on stored generations
This drops generation cost without losing the group advantage that GRPO gives.
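Roughly, the reuse looks like this. The `group_advantages` helper below is just an illustration of the standard GRPO group normalization, and the rewards are made up; the real scoring lives elsewhere in the trainer:

```python
import statistics

def group_advantages(rewards):
    # GRPO-style: advantage = (reward - group mean) / group std.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# One generation pass: a single prompt group with 4 completions and their rewards.
stored = {
    "completions": ["c1", "c2", "c3", "c4"],
    "advantages": group_advantages([0.0, 1.0, 0.5, 1.0]),
}

# Several updates reuse the same stored group instead of regenerating each time.
for update in range(4):
    print(f"update {update}: advantages {stored['advantages']}")
```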
3. Mini & micro batching
Within each set of stored generations:
- Mini-batch: processes multiple groups together for efficiency.
- Micro-batch: splits further for gradient accumulation and memory safety.
This combo keeps GPU memory happy while still maintaining good throughput.
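Sketched with placeholder data and assumed batch sizes, the split looks like this (the "loss" is just arithmetic so the snippet runs standalone):

```python
stored = list(range(16))    # 16 stored completions (placeholder data)
mini_batch_size = 8         # completions consumed per optimizer step
micro_batch_size = 2        # completions per forward/backward pass

for i in range(0, len(stored), mini_batch_size):
    mini = stored[i:i + mini_batch_size]
    accumulated = 0.0
    for j in range(0, len(mini), micro_batch_size):
        micro = mini[j:j + micro_batch_size]
        # The forward/backward on `micro` would go here; losses accumulate.
        accumulated += sum(micro) / len(micro)
    # One optimizer step per mini-batch, after all micro-batches are accumulated.
    print(f"optimizer step on mini-batch {i // mini_batch_size}: loss {accumulated:.2f}")
```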
The training loop now
- Check if it's time to generate new completions.
- If yes, run _generate_and_score_completions (with code execution where applicable) and store the results.
- Train on stored generations using _train_on_stored_generations (handles both mini & micro batching).
- Log, monitor, and adapt the LR via the scheduler.
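Putting the pieces together, the skeleton looks roughly like this. The two method names are the real ones; the signatures, placeholder bodies, and print-based logging are just for illustration:

```python
class ReToolTrainerSketch:
    def __init__(self, steps_per_generation=4, total_steps=12):
        self.steps_per_generation = steps_per_generation
        self.total_steps = total_steps
        self.stored = None

    def _generate_and_score_completions(self):
        # Would run generation (plus code execution where applicable) and scoring.
        return {"completions": ["c1", "c2"], "advantages": [0.3, -0.3]}

    def _train_on_stored_generations(self, stored):
        # Would run the mini/micro-batched updates on the stored data.
        return 0.0  # placeholder loss

    def train(self):
        for step in range(self.total_steps):
            if step % self.steps_per_generation == 0:
                self.stored = self._generate_and_score_completions()
            loss = self._train_on_stored_generations(self.stored)
            # Logging and LR scheduling would hook in here.
            print(f"step {step}: loss {loss}")

ReToolTrainerSketch().train()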
The batching logic is modular, so you can swap in your own GRPO/PPO/other loss function; the infrastructure still works.
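For illustration, here is one way that interface could look. The callable-based design and the simplified surrogate loss are my shorthand here, not the exact code:

```python
from typing import Callable, Sequence

def grpo_style_loss(logprobs: Sequence[float], advantages: Sequence[float]) -> float:
    # Simplified policy-gradient-style surrogate: -mean(logprob * advantage).
    return -sum(lp * adv for lp, adv in zip(logprobs, advantages)) / len(advantages)

def train_micro_batch(logprobs, advantages, loss_fn: Callable) -> float:
    # The batching code only sees a callable, so the loss is swappable.
    return loss_fn(logprobs, advantages)

print(train_micro_batch([-1.2, -0.8], [0.5, -0.5], grpo_style_loss))
```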
I also wrote a Medium post with more details and a few debugging war stories.
If you're building anything similar, I'd love to swap notes!