Manuk MarianMT Translator (Indonesian → Geser)
Model Description
This MarianMT-based model was trained for the Manuk project, focusing on the revitalisation of the Geser language (Eastern Seram).
It is fine-tuned on a custom parallel corpus containing 7,327 aligned sentences across Geser, Indonesian, and English.
Evaluation
The model was evaluated on the Indonesian → Geser direction, achieving the following results:
| Direction | Val BLEU | Test BLEU | Val Loss | Train Loss |
|---|---|---|---|---|
| Indonesian → Geser | 26.08 | 26.36 | 0.17 | 0.21 |
The validation and test BLEU scores are close (26.08 vs. 26.36), which suggests the model generalises consistently to held-out data and produces reasonable translations for a low-resource language pair.
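For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score can be computed. It is illustrative only: this card does not state which tool or tokenisation produced the scores above, so the whitespace tokenisation, the lack of smoothing, and the function names `ngrams` and `corpus_bleu` are all assumptions, and its output should not be compared directly with the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis.

    Tokenisation is plain whitespace splitting, which is an assumption;
    standard tools such as sacreBLEU apply their own tokenisers.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count by the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, reference_translations)` on two equal-length lists of strings returns a single score for the whole set, which is how BLEU is conventionally reported.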
Sample Corpus (Parallel Translations)
| Domain | Geser | Indonesian |
|---|---|---|
| Traditional Medicine | akar dirang mera ira, kalu me mancia dageit le ikea ima di nabagadik, dafaik dani akara baru datutu dabobar nai ikea ima di nabagadik ira. | akar serai merah digunakan ketika seseorang jatuh dan mengalami patah pada kaki atau tangannya. akar tersebut diambil, lalu ditumbuk dan dibungkuskan pada bagian tubuh yang patah. |
| Family & Livelihood | dodani, nugu abang tura nugu baba dasubelat daroka ikan wekan loka. moale, dodi datanak lau pasar ababis loka, jadi bot naresi oaca mo. | tadi malam, abang dan ayah saya pergi memancing dan berhasil mendapatkan banyak ikan. akan tetapi, mereka menjual semuanya ke pasar hingga tersisa sedikit saja. |
Requirements
This model was tested with Python 3.11.
To use AiRukua/Indo-to-Geser, you need to install the following dependencies:
Minimal Installation
pip install torch sentencepiece sacremoses transformers
Recommended Installation (for Python 3.11 with CUDA support)
Check your CUDA version:
nvidia-smi
Install PyTorch with the matching CUDA toolkit. Example (CUDA 12.1):
pip install torch --index-url https://download.pytorch.org/whl/cu121
For CPU only:
pip install torch --index-url https://download.pytorch.org/whl/cpu
Install the remaining dependencies:
pip install sentencepiece sacremoses transformers
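After installing, it can help to confirm that the packages are importable and to see which device PyTorch would pick. The following is a small sketch using only the standard library for the lookup; the helper names `missing_packages` and `describe_device` are hypothetical and not part of this project.

```python
import importlib.util

# Import names of the packages listed in the installation commands.
REQUIRED = ("torch", "sentencepiece", "sacremoses", "transformers")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found for import (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_device():
    """Report the device PyTorch would use, or note that torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check works even without torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Missing packages:", missing_packages() or "none")
print("Device:", describe_device())
```

If the device is reported as `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build may not match the CUDA version reported by the driver.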
Usage
Download and Load Model
from transformers import MarianMTModel, MarianTokenizer
model_name = "AiRukua/Indo-to-Geser"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
Translation Functions
import torch
from transformers import MarianMTModel, MarianTokenizer

def translate(text, model_name="AiRukua/Indo-to-Geser", max_len=128, num_beams=4):
    # Note: the tokenizer and model are loaded on every call.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    # Tokenise the input and move the tensors to the same device as the model.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_len,
    ).to(device)
    # Beam search decoding without gradient tracking.
    with torch.no_grad():
        outputs = model.generate(
            **inputs, max_length=max_len, num_beams=num_beams
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def translate_dialogue(dialogue: str, model_name="AiRukua/Indo-to-Geser"):
    lines = dialogue.strip().split("\n")
    translated_lines = []
    for line in lines:
        # Skip blank lines.
        if not line.strip():
            continue
        if ":" in line:
            # Keep the speaker label as-is and translate only the utterance.
            speaker, text = line.split(":", 1)
            translated_text = translate(text.strip(), model_name)
            translated_lines.append(f"{speaker}: {translated_text}")
        else:
            translated_text = translate(line.strip(), model_name)
            translated_lines.append(translated_text)
    return "\n".join(translated_lines)

# Example usage
dialogue = """
Alice: Apa kabar hari ini?
Bob: Saya baik, terima kasih.
"""
print(translate_dialogue(dialogue))
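The functions above load the tokenizer and model each time `translate` is called, which becomes slow for long dialogues. One possible alternative, sketched below, loads them once and translates all lines in a single batch. The names `load_translator`, `translate_batch`, `translate_dialogue_batched`, and the `translate_fn` parameter are hypothetical additions for illustration, not part of the published model's API.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_translator(model_name="AiRukua/Indo-to-Geser"):
    """Load tokenizer and model once; later calls reuse the cached pair."""
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)
    return tokenizer, model, device


def translate_batch(texts, max_len=128, num_beams=4):
    """Translate a list of sentences in a single generate() call."""
    import torch

    tokenizer, model, device = load_translator()
    inputs = tokenizer(
        list(texts), return_tensors="pt", padding=True,
        truncation=True, max_length=max_len,
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_length=max_len, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def translate_dialogue_batched(dialogue, translate_fn=translate_batch):
    """Translate 'Speaker: text' lines, keeping speaker labels untranslated.

    translate_fn is a hypothetical hook: any callable mapping a list of
    strings to a list of translated strings can be supplied.
    """
    speakers, texts = [], []
    for line in dialogue.strip().split("\n"):
        if not line.strip():
            continue  # skip blank lines
        speaker, sep, text = line.partition(":")
        if sep:
            speakers.append(speaker)
            texts.append(text.strip())
        else:
            speakers.append(None)  # line without a speaker label
            texts.append(line.strip())
    translations = translate_fn(texts) if texts else []
    return "\n".join(
        t if s is None else f"{s}: {t}" for s, t in zip(speakers, translations)
    )

# With the model available: print(translate_dialogue_batched(dialogue))
```

Passing the translation step in as `translate_fn` keeps the dialogue parsing independent of the model, so it can be checked with a stand-in function before downloading any weights.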
Intended Use
- Indonesian → Geser translation
- Research and education on endangered language technology
- Community-driven language revitalisation projects