---
language: kik
tags:
  - automatic-speech-recognition
  - w2v-bert-2.0
  - kikuyu
  - low-resource
  - adapter
  - peft
license: apache-2.0
base_model: facebook/w2v-bert-2.0
datasets:
  - mutisya/Kikuyu_asr_v24_23_1-filtered
metrics:
  - wer
model-index:
  - name: w2v-bert-hybrid-v3-kikuyu-asr
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: Kikuyu ASR
          type: mutisya/Kikuyu_asr_v24_23_1-filtered
        metrics:
          - type: wer
            value: 20.30
            name: WER
---

# W2V-BERT 2.0 Hybrid V3 Kikuyu ASR

This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) for Kikuyu (Gĩkũyũ) automatic speech recognition.

## Model Description

This model uses a **Hybrid V3 architecture** that combines:
- **MMS-style bottleneck adapters** (64-dim), one in each of the 24 transformer layers
- **Single-layer transformer decoder** with pre- and post-normalization
- **Gated residual connections** for stable training (see the sketch after the details below)

### Architecture Details

- **Base Model**: facebook/w2v-bert-2.0 (580M parameters)
- **Trainable Parameters**: 11,660,835 (1.97% of total)
- **Adapter Dimension**: 64
- **Decoder Hidden Size**: 1024 (matches W2V-BERT)
- **Decoder FFN Size**: 2048
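
The card does not include reference code for the adapter block, but the combination described above can be sketched roughly as follows. This is an illustrative reconstruction, not the training code: the module names, gate parameterization, and LayerNorm placement are assumptions; only the 1024/64 dimensions come from the card.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative MMS-style bottleneck adapter with a gated residual.

    Down-projects the 1024-dim hidden state to the 64-dim bottleneck,
    applies a nonlinearity, projects back up, and mixes the result into
    the residual stream through a learned scalar gate.
    """

    def __init__(self, hidden_size: int = 1024, adapter_dim: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, adapter_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(adapter_dim, hidden_size)
        # Gate initialized at zero so the layer starts as an identity map,
        # one way to realize the "stable training" property noted above.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.up(self.act(self.down(self.norm(hidden_states))))
        return residual + torch.tanh(self.gate) * x
```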

## Training Details

- **Training Samples**: 5,000
- **Epochs**: 20
- **Learning Rate**: 0.0003
- **Batch Size**: 4 (effective: 16 with gradient accumulation)
- **Warmup Steps**: 500
- **Optimizer**: AdamW with cosine LR schedule
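
Assuming a single device, the effective batch size of 16 implies 4 gradient-accumulation steps. Mapped onto `transformers.TrainingArguments`, the schedule above corresponds roughly to the sketch below; `output_dir` and any value not listed in the card are placeholders:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-hybrid-v3-kikuyu-asr",  # placeholder path
    num_train_epochs=20,
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size 16
    warmup_steps=500,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
)
```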

## Performance

| Metric | Value |
|--------|-------|
| **Word Error Rate (WER)** | **20.30%** |
| Eval Loss | 0.2371 |
| Train Loss | 0.3413 |
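
WER on a held-out split can be recomputed with the `evaluate` library (which wraps `jiwer`); the reference/prediction pairs below are placeholders for the model's actual decoded outputs:

```python
import evaluate

wer_metric = evaluate.load("wer")
references = ["nĩ mwega mũno"]   # ground-truth transcripts (placeholder)
predictions = ["nĩ mwega mũno"]  # model transcriptions (placeholder)
print(f"WER: {wer_metric.compute(references=references, predictions=predictions):.4f}")
```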

## Usage
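
A minimal inference sketch. It assumes the checkpoint loads through the standard `transformers` Wav2Vec2-BERT CTC interface; because of the custom adapter and decoder modules, loading may instead require code from the training repository. The repo id is inferred from this card and may differ:

```python
import torch
import torchaudio
from transformers import AutoModelForCTC, AutoProcessor

model_id = "mutisya/w2v-bert-hybrid-v3-kikuyu-asr"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)
model.eval()

# Load audio and resample to the 16 kHz rate expected by w2v-bert-2.0.
waveform, sample_rate = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```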



## Limitations

- Trained specifically for the Kikuyu language
- Performs best on clean, clearly articulated audio
- May struggle with heavy background noise or very fast speech

## Citation

If you use this model, please cite:



## License

Apache 2.0