---
license: mit
base_model:
- zhihan1996/DNABERT-2-117M
tags:
- genomics
- bioinformatics
- DNA
- sequence-classification
- introns
- exons
- DNABERT2
---
# Exons and Introns Classifier

A DNABERT-2 model fine-tuned to **classify DNA sequences** as **introns** or **exons**, trained on a large cross-species GenBank dataset.

## Architecture
- Base model: DNABERT-2 (zhihan1996/DNABERT-2-117M)
- Approach: full-sequence classification
- Framework: PyTorch + Hugging Face Transformers

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Note: DNABERT-2-based checkpoints may require trust_remote_code=True when loading.
tokenizer = AutoTokenizer.from_pretrained("GustavoHCruz/ExInDNABERT2")
model = AutoModelForSequenceClassification.from_pretrained("GustavoHCruz/ExInDNABERT2")
```

Input format:

The model expects raw nucleotide sequences.

For each sequence, the classification head predicts a label: 0 (intron) or 1 (exon).

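A minimal inference sketch, reusing the `tokenizer` and `model` from the snippet above; the example sequence is made up, and the label mapping (0 = intron, 1 = exon) follows the description above:

```python
import torch

# Hypothetical nucleotide sequence, for illustration only.
sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Label mapping as described above: 0 = intron, 1 = exon.
prediction = logits.argmax(dim=-1).item()
print("exon" if prediction == 1 else "intron")
```
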
## Data

The model was trained on a processed version of GenBank sequences spanning multiple species.

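The actual preprocessing scripts live in the GitHub repository linked below. Purely as an illustrative sketch (not the authors' pipeline), labeled examples could be pulled from a GenBank file with Biopython; the file name and label mapping here are placeholders:

```python
from Bio import SeqIO  # Biopython

# "sequences.gb" is a hypothetical input file; see the repository below
# for the real dataset construction.
for record in SeqIO.parse("sequences.gb", "genbank"):
    for feature in record.features:
        if feature.type in ("exon", "intron"):
            subseq = str(feature.extract(record.seq))
            label = 1 if feature.type == "exon" else 0  # assumed mapping
            print(label, subseq[:60])
```
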
## Publications

- **Full Paper – 2nd Place (National)**
  Achieved **2nd place** at the _Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2025)_, organized by the Brazilian Computer Society (SBC), held in Fortaleza, Ceará, Brazil.
  [https://doi.org/10.5753/kdmile.2025.247575](https://doi.org/10.5753/kdmile.2025.247575)
- **Short Paper (International)**
  Presented at the _IEEE International Conference on Bioinformatics and BioEngineering (BIBE 2025)_, held in Athens, Greece.
  [https://doi.org/10.1109/BIBE66822.2025.00113](https://doi.org/10.1109/BIBE66822.2025.00113)

## Training

- Trained on 8x NVIDIA H100 GPUs.

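No training script or hyperparameters are reproduced here (see the repository below). As a loose sketch only, a multi-GPU fine-tuning run with the Hugging Face `Trainer` could look like the following, where every value is a placeholder and `train_dataset` stands for a hypothetical tokenized dataset:

```python
from transformers import Trainer, TrainingArguments

# Placeholder hyperparameters, not the reported training setup.
args = TrainingArguments(
    output_dir="exin-dnabert2",
    per_device_train_batch_size=32,  # per GPU; launch via torchrun across 8 GPUs
    num_train_epochs=3,
    bf16=True,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,                  # loaded as in the Usage section
    args=args,
    train_dataset=train_dataset,  # hypothetical tokenized dataset
)
trainer.train()
```
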
## GitHub Repository

The full code for **data processing, model training, and inference** is available on GitHub:
[CodingDNATransformers](https://github.com/GustavoHCruz/CodingDNATransformers)

You can find scripts for:
- Preprocessing GenBank sequences
- Fine-tuning models
- Evaluating and using the trained models