PlasmidGPT-RL

This model is a fine-tuned version of UCL-CSSB/PlasmidGPT-SFT using Group Relative Policy Optimization (GRPO).

Model Description

PlasmidGPT-RL is trained to generate functional plasmid DNA sequences. It was fine-tuned with reinforcement learning against a reward model that evaluates (see the sketch after this list):

  • Presence of valid origins of replication (OriV)
  • Presence of antibiotic resistance genes (ARGs)
  • Absence of problematic repeat sequences
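
The reward model itself is not published with this card. The sketch below is a minimal illustration of how the three criteria could be scored; the motif lists, the k-mer repeat proxy, and the equal weighting are all assumptions, not the actual reward used in training.

ORIV_MOTIFS = ["TTATCCACA"]             # assumed example motif (DnaA box consensus), not the real OriV set
ARG_MOTIFS = ["ATGAGTATTCAACATTTCCG"]   # assumed fragment of the bla (AmpR) ORF, for illustration only

def has_any_motif(seq: str, motifs: list[str]) -> bool:
    # Naive substring search; a real scorer would align against curated databases.
    return any(motif in seq for motif in motifs)

def has_long_repeat(seq: str, k: int = 25) -> bool:
    # True if any exact k-mer occurs more than once: a crude proxy for problematic repeats.
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False

def plasmid_reward(seq: str) -> float:
    # One point per satisfied criterion (assumed equal weighting).
    score = 0.0
    score += 1.0 if has_any_motif(seq, ORIV_MOTIFS) else 0.0
    score += 1.0 if has_any_motif(seq, ARG_MOTIFS) else 0.0
    score += 1.0 if not has_long_repeat(seq) else 0.0
    return score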

Training

This model was trained with GRPO using the TRL library; a configuration sketch follows the training details below.

Training run: Weights & Biases

Training Details

  • Base model: UCL-CSSB/PlasmidGPT-SFT
  • Method: GRPO (Group Relative Policy Optimization)
  • Checkpoint: 800 steps
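
As a minimal configuration sketch, GRPO training with TRL could look like the following. It reuses the plasmid_reward sketch above; the prompt dataset and all hyperparameters except the 800-step budget are assumptions, not the actual training setup.

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_completions(completions, **kwargs):
    # TRL passes the sampled completions for each prompt group;
    # score each one with the plasmid_reward sketch above.
    return [plasmid_reward(c) for c in completions]

train_dataset = Dataset.from_dict({"prompt": ["ATG"] * 1024})  # assumed prompts

trainer = GRPOTrainer(
    model="UCL-CSSB/PlasmidGPT-SFT",
    reward_funcs=reward_completions,
    args=GRPOConfig(output_dir="PlasmidGPT-RL", max_steps=800),
    train_dataset=train_dataset,
)
trainer.train()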

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("McClain/PlasmidGPT-RL")
model = AutoModelForCausalLM.from_pretrained("McClain/PlasmidGPT-RL")

# Generate a plasmid sequence
prompt = "ATG"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,  # pass input_ids together with the attention mask
    max_new_tokens=256,
    do_sample=True,
    temperature=0.95,
    top_p=0.9,
)
sequence = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sequence)
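
Decoded outputs may contain whitespace or stray characters depending on the tokenizer; as an assumed post-processing step (not prescribed by this card), you can filter to canonical nucleotides before downstream analysis:

# Keep only canonical nucleotides (assumed clean-up, not part of the model card).
clean = "".join(base for base in sequence.upper() if base in "ACGT")
print(f"{len(clean)} nt after filtering")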

Framework Versions

  • TRL: 0.23.1
  • Transformers: 4.57.0
  • PyTorch: 2.8.0

Citation

If you use this model, please cite the GRPO paper:

@article{shao2024deepseekmath,
    title={{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author={Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year={2024},
    eprint={2402.03300},
    archivePrefix={arXiv}
}