---
language: kik
tags:
- automatic-speech-recognition
- w2v-bert-2.0
- kikuyu
- low-resource
- adapter
- peft
license: apache-2.0
base_model: facebook/w2v-bert-2.0
datasets:
- mutisya/Kikuyu_asr_v24_23_1-filtered
metrics:
- wer
model-index:
- name: w2v-bert-hybrid-v3-kikuyu-asr
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Kikuyu ASR
      type: mutisya/Kikuyu_asr_v24_23_1-filtered
    metrics:
    - type: wer
      value: 20.3
      name: WER
---
# W2V-BERT 2.0 Hybrid V3 Kikuyu ASR
This model is a fine-tuned version of facebook/w2v-bert-2.0 for Kikuyu (Gĩkũyũ) automatic speech recognition.
## Model Description
This model uses a Hybrid V3 architecture that combines:
- MMS-style bottleneck adapters (64-dim), one in each of the 24 transformer layers
- A single-layer transformer decoder with pre- and post-normalization
- Gated residual connections for stable training (see the adapter sketch below)
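
The block below is a minimal PyTorch sketch of an MMS-style bottleneck adapter with a gated residual connection, using the dimensions quoted in this card (hidden size 1024, adapter dimension 64). The class name, gate initialization, and exact placement inside each layer are illustrative assumptions, not the checkpoint's actual module definitions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative MMS-style bottleneck adapter with a gated residual."""

    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.down_proj = nn.Linear(hidden_size, adapter_dim)  # project down to the bottleneck
        self.up_proj = nn.Linear(adapter_dim, hidden_size)    # project back to model width
        self.activation = nn.GELU()
        # Learnable scalar gate; a small initial value keeps the layer close to identity early in training.
        self.gate = nn.Parameter(torch.tensor(0.1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.layer_norm(hidden_states)
        x = self.up_proj(self.activation(self.down_proj(x)))
        # Gated residual connection: scale the adapter output before adding it back.
        return residual + self.gate * x
```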
## Architecture Details
- Base Model: facebook/w2v-bert-2.0 (580M parameters)
- Trainable Parameters: 11,660,835 (1.97% of total; see the counting helper after this list)
- Adapter Dimension: 64
- Decoder Hidden Size: 1024 (matches the W2V-BERT hidden size)
- Decoder FFN Size: 2048
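
The helper below is a small sketch of how the trainable-parameter figure above could be checked; it assumes the base encoder has been frozen and only the adapters and decoder keep `requires_grad=True`.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> None:
    """Print trainable vs. total parameter counts, as reported in this card."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")

# Hypothetical usage: with the W2V-BERT encoder frozen and the adapters plus
# decoder attached, this should report roughly 11,660,835 trainable parameters (1.97%).
# count_parameters(model)
```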
## Training Details
- Training Samples: 5,000
- Epochs: 20
- Learning Rate: 0.0003
- Batch Size: 4 (effective: 16 with gradient accumulation)
- Warmup Steps: 500
- Optimizer: AdamW with a cosine learning-rate schedule (see the configuration sketch below)
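
A hedged reconstruction of these hyperparameters as Hugging Face `TrainingArguments` is shown below; the argument names follow the standard Trainer API, and the gradient-accumulation value of 4 is inferred from the effective batch size, so the original training script may have differed.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-hybrid-v3-kikuyu-asr",
    num_train_epochs=20,
    learning_rate=3e-4,              # 0.0003, as listed above
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # assumed: 4 x 4 = effective batch size of 16 on a single device
    warmup_steps=500,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
)
```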
## Performance
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 20.30% |
| Eval Loss | 0.2371 |
| Train Loss | 0.3413 |
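
For reference, WER can be computed with the `jiwer` library as in the toy example below; the strings are placeholders, and the 20.30% above was measured on the held-out split of the filtered Kikuyu dataset.

```python
import jiwer

# Toy strings only; substitute the ground-truth and predicted transcripts
# from the evaluation split to reproduce the figure in the table.
reference = "reference transcript goes here"
hypothesis = "reference transcript goes hear"
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```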
## Usage
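
A minimal inference sketch with the `transformers` ASR pipeline, assuming the checkpoint is published under the model-index name above (the full repo id is not given in this card). Because the hybrid decoder and adapters are custom additions on top of W2V-BERT, loading may require the original training code or `trust_remote_code=True` rather than the stock pipeline.

```python
import torch
from transformers import pipeline

# Repo id below is assumed from the model-index name; replace with the actual path.
asr = pipeline(
    "automatic-speech-recognition",
    model="w2v-bert-hybrid-v3-kikuyu-asr",
    device=0 if torch.cuda.is_available() else -1,
)

# Expects 16 kHz mono audio, matching W2V-BERT's feature extractor.
result = asr("path/to/kikuyu_audio.wav")
print(result["text"])
```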
## Limitations
- Trained specifically for the Kikuyu language
- Best performance on clean, clear audio
- May struggle with heavy background noise or very fast speech
## Citation
If you use this model, please cite:
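
No BibTeX entry ships with this card; the entry below is a placeholder sketch built from the model name, with the author, year, and repository URL left as fields to fill in.

```bibtex
@misc{w2v_bert_hybrid_v3_kikuyu_asr,
  title        = {W2V-BERT 2.0 Hybrid V3 Kikuyu ASR},
  author       = {<model author>},
  year         = {<year>},
  howpublished = {\url{https://huggingface.co/<repo-id>}},
  note         = {Fine-tuned from facebook/w2v-bert-2.0}
}
```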
## License

This model is released under the Apache 2.0 license.