---
language: kik
tags:
  - automatic-speech-recognition
  - w2v-bert-2.0
  - kikuyu
  - low-resource
  - adapter
  - peft
license: apache-2.0
base_model: facebook/w2v-bert-2.0
datasets:
  - mutisya/Kikuyu_asr_v24_23_1-filtered
metrics:
  - wer
model-index:
  - name: w2v-bert-hybrid-v3-kikuyu-asr
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Kikuyu ASR
          type: mutisya/Kikuyu_asr_v24_23_1-filtered
        metrics:
          - type: wer
            value: 20.3
            name: WER
---

# W2V-BERT 2.0 Hybrid V3 Kikuyu ASR

This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) for Kikuyu (Gĩkũyũ) automatic speech recognition.

## Model Description

This model uses a Hybrid V3 architecture that combines:

- MMS-style bottleneck adapters (64-dim), one in each of the 24 transformer layers
- A single-layer transformer decoder with pre- and post-normalization
- Gated residual connections for stable training (sketched below)
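
The card does not spell out the adapter design beyond the bullets above, so the following PyTorch sketch is only an illustration under the stated dimensions (1024-dim hidden states, 64-dim bottleneck, gated residual); the class name, activation, and initialization are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative MMS-style bottleneck adapter with a gated residual connection."""

    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, adapter_dim)  # 1024 -> 64
        self.up = nn.Linear(adapter_dim, hidden_size)    # 64 -> 1024
        # Gate initialised at zero: tanh(0) = 0, so the adapter starts as an
        # identity pass-through, which keeps early training stable.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.up(torch.relu(self.down(self.norm(hidden_states))))
        return residual + torch.tanh(self.gate) * x
```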

## Architecture Details

- Base Model: [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) (580M parameters)
- Trainable Parameters: 11,660,835 (1.97% of total)
- Adapter Dimension: 64
- Decoder Hidden Size: 1024 (matches W2V-BERT)
- Decoder FFN Size: 2048
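
The trainable-parameter ratio can be sanity-checked with a generic PyTorch count (a sketch; `model` stands for the loaded hybrid model):

```python
# Count trainable vs. total parameters of a loaded PyTorch model.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{trainable:,} trainable / {total:,} total ({100 * trainable / total:.2f}%)")
```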

## Training Details

- Training Samples: 5,000
- Epochs: 20
- Learning Rate: 0.0003
- Batch Size: 4 (effective 16 with gradient accumulation)
- Warmup Steps: 500
- Optimizer: AdamW with cosine LR schedule
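
For reference, a hypothetical `transformers` `TrainingArguments` configuration mirroring the values above; `output_dir` and any settings not listed on this card are assumptions, not the authors' exact training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-hybrid-v3-kikuyu-asr",
    num_train_epochs=20,
    learning_rate=3e-4,              # 0.0003
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    warmup_steps=500,
    lr_scheduler_type="cosine",
    optim="adamw_torch",             # AdamW
)
```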

## Performance

| Metric                | Value  |
|-----------------------|--------|
| Word Error Rate (WER) | 20.30% |
| Eval Loss             | 0.2371 |
| Train Loss            | 0.3413 |
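
WER scores like this are typically computed with Hugging Face's `evaluate` library; a minimal sketch, assuming the jiwer-backed `wer` metric and hypothetical transcript lists:

```python
import evaluate  # pip install evaluate jiwer

wer_metric = evaluate.load("wer")
predictions = ["example hypothesis"]  # hypothetical model outputs
references = ["example reference"]    # hypothetical ground-truth transcripts
print(100 * wer_metric.compute(predictions=predictions, references=references))
```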

## Usage
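
A minimal inference sketch, assuming the checkpoint loads through the standard `transformers` CTC classes and that the repo id is `mutisya/w2v-bert-hybrid-v3-kikuyu-asr` (taken from the model-index entry above); the custom Hybrid V3 decoder and adapters may instead require the repository's own modeling code (e.g. loading with `trust_remote_code=True`).

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2BertForCTC

model_id = "mutisya/w2v-bert-hybrid-v3-kikuyu-asr"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)
model.eval()

# Load 16 kHz mono audio; librosa resamples if the file rate differs.
speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```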

## Limitations

- Trained specifically for the Kikuyu language
- Best performance on clean, clear audio
- May struggle with heavy background noise or very fast speech

## Citation

If you use this model, please cite:

## License

Apache 2.0