Evo 2 (1B Base) - Hugging Face Transformers Format

This repository contains the Evo 2 (1B Base) model, converted to the Hugging Face Transformers format.

Original Repository: arcinstitute/evo2_1b_base
Paper: Genome modeling and design across all domains of life with Evo 2
Authors: Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, et al.

Model Description

Evo 2 is a biological foundation model trained on 9.3 trillion DNA base pairs from a curated genomic atlas spanning all domains of life. It uses the StripedHyena 2 architecture to model DNA at single-nucleotide resolution. The larger Evo 2 checkpoints handle contexts of up to 1 million base pairs, while this 1B base checkpoint was trained with an 8,192-base context. The model is suited to tasks such as predicting the functional effects of mutations and generating novel genomic sequences.
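The Evo 2 paper predicts mutation effects zero-shot by comparing how likely the model finds a variant sequence relative to the reference sequence. The plain-Python sketch below shows only that arithmetic. It assumes you already have one row of next-token logits per input token for each sequence. The helpers make_snv, log_softmax, sequence_log_likelihood, and variant_score are hypothetical names, not part of this repository.

```python
import math
from typing import List, Sequence


def make_snv(sequence: str, position: int, alt_base: str) -> str:
    """Return a copy of `sequence` with a single-nucleotide variant at `position` (0-based)."""
    if alt_base not in ("A", "C", "G", "T"):
        raise ValueError(f"invalid base: {alt_base!r}")
    if sequence[position] == alt_base:
        raise ValueError("alt_base matches the reference base")
    return sequence[:position] + alt_base + sequence[position + 1:]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def sequence_log_likelihood(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Sum of log p(token[t+1] | tokens[0..t]) under a causal language model.

    logits[t] scores the token that follows position t, so the last row
    has no target token and is skipped.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one row of logits per input token")
    total = 0.0
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


def variant_score(ll_ref: float, ll_alt: float) -> float:
    """Delta log-likelihood: negative means the variant is less likely than the reference."""
    return ll_alt - ll_ref
```

With the model loaded as in the Usage section, the logits would plausibly come from model(input_ids).logits[0].tolist(). This assumes the converted model returns standard causal-LM outputs. A tensor-based version would be much faster at real sequence lengths.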

This version has been converted to the Hugging Face transformers format, so it can be loaded and run through the library's standard from_pretrained interface.

Usage

You can load and run this model using the transformers library as follows:

import torch
from transformers import Evo2ForCausalLM, Evo2Tokenizer

# Replace with your local path or the Hub repo ID after uploading
model_path = "path/to/this/repo" 

print(f"Loading model from {model_path}...")
model = Evo2ForCausalLM.from_pretrained(model_path)
tokenizer = Evo2Tokenizer.from_pretrained(model_path)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Input sequence (DNA)
sequence = "ACGTACGT"
print(f"Input: {sequence}")

# Tokenize
input_ids = tokenizer.encode(sequence, return_tensors="pt").to(device)

# Generate
print("Generating...")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20)

# Decode
generated_sequence = tokenizer.decode(output[0])
print(f"Output: {generated_sequence}")
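Sequences longer than the checkpoint's context window have to be split before they are passed to the model. The following is a minimal sketch of overlapping windows. It assumes a window of 8,192 nucleotides, which is the context length the upstream evo2 repository lists for the 1B base checkpoint; check this repository's config.json for the value it actually uses. It also assumes one token per nucleotide. sliding_windows is a hypothetical helper, not part of the converted model's API.

```python
from typing import List, Tuple


def sliding_windows(sequence: str, window: int = 8192, stride: int = 4096) -> List[Tuple[int, str]]:
    """Split `sequence` into overlapping windows of at most `window` characters.

    Returns (start_offset, chunk) pairs. Consecutive windows start `stride`
    characters apart, so they overlap by `window - stride` characters. The
    final window is aligned to the end of the sequence so no bases are dropped.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if len(sequence) <= window:
        return [(0, sequence)]
    starts = list(range(0, len(sequence) - window + 1, stride))
    last_start = len(sequence) - window
    if starts[-1] != last_start:
        starts.append(last_start)  # cover the tail with one final, end-aligned window
    return [(s, sequence[s:s + window]) for s in starts]
```

Each chunk can be tokenized and passed to the model in the same way as sequence in the example above. How to merge per-window results is a design choice left to the caller. For example, you could keep only the non-overlapping centre of each window.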

Citation

If you use this model, please cite the original paper:

@article{brixi2025genome,
  title={Genome modeling and design across all domains of life with Evo 2},
  author={Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and Poli, Michael and others},
  journal={bioRxiv},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}
Weights: Safetensors format, 1B parameters, tensor type F32.
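A tensor type of F32 means each parameter takes 4 bytes, so the weights alone need roughly 4 GB before activations or generation caches are counted. The back-of-the-envelope sketch below uses the rounded 1B figure, not the exact parameter count. The bfloat16 and float16 entries only apply if the converted model can be loaded in a 2-byte dtype, which is an assumption. weight_memory_gib is a hypothetical helper.

```python
# Bytes used per parameter for common floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Approximate memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30
```

For 1,000,000,000 parameters this gives about 3.73 GiB in float32 and about 1.86 GiB in bfloat16.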