base_sami_22kft_ft_pseudo_widv_ep60_tde

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0285
WER (word error rate): 0.0215
CER (character error rate): 0.0044
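WER and CER are edit-distance ratios. Each is the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words (WER) or characters (CER). The card does not say which tool computed its figures, so the following is only a plain-Python sketch of that definition. It makes two assumptions: edits are pooled over the whole corpus rather than averaged per utterance, and spaces count as characters for CER. The helper names `edit_distance` and `corpus_error_rate` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the first i-1 reference units and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(refs: List[str], hyps: List[str], unit: str) -> float:
    """Total edits divided by total reference units ("word" for WER, otherwise characters for CER)."""
    split = str.split if unit == "word" else list  # characters include spaces here
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total


refs = ["the cat sat on the mat"]
hyps = ["the cat sat on mat"]
print(corpus_error_rate(refs, hyps, "word"))  # 1 deleted word / 6 reference words, about 0.167
print(corpus_error_rate(refs, hyps, "char"))  # 4 deleted characters ("the ") / 22, about 0.182
```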

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.25
num_epochs: 60.0
mixed_precision_training: Native AMP
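The learning-rate schedule is fully determined by these hyperparameters together with the step counts in the results table (972 optimizer steps per epoch, 58,320 in total). The sketch below assumes the "linear" scheduler behaves like the Transformers linear-with-warmup schedule. Under that assumption, warmup lasts ceil(total steps × warmup ratio) steps, the rate rises linearly from 0 to `learning_rate`, and it then decays linearly to 0 at the last step. With a warmup ratio of 0.25, the peak of 5e-4 is reached at step 14,580, the end of epoch 15. The constant and function names are illustrative only.

```python
import math

STEPS_PER_EPOCH = 972   # from the Step column of the results table
NUM_EPOCHS = 60         # num_epochs
PEAK_LR = 5e-4          # learning_rate
WARMUP_RATIO = 0.25     # lr_scheduler_warmup_ratio

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS            # 58,320
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 14,580


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for epoch in (1, 15, 30, 60):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.2e}")
```

In the results table, validation loss, WER and CER are all at their worst at epoch 15, which coincides with the end of warmup under this schedule.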

Training results

Training Loss	Epoch	Step	Validation Loss	WER	CER
0.1722	1.0	972	0.0286	0.0213	0.0043
0.1455	2.0	1944	0.0484	0.0384	0.0077
0.1681	3.0	2916	0.0656	0.0715	0.0145
0.192	4.0	3888	0.1178	0.1105	0.0234
0.2199	5.0	4860	0.1491	0.1904	0.0426
0.2408	6.0	5832	0.1710	0.2054	0.0504
0.2761	7.0	6804	0.2405	0.2398	0.0607
0.3111	8.0	7776	0.2560	0.3894	0.0969
0.3474	9.0	8748	0.3227	0.4196	0.1234
0.3899	10.0	9720	0.3253	0.4059	0.1138
0.4153	11.0	10692	0.3555	0.4881	0.1411
0.4633	12.0	11664	0.4051	0.5148	0.1554
0.5276	13.0	12636	0.4871	0.5114	0.1601
0.5434	14.0	13608	0.5272	0.6465	0.2077
0.555	15.0	14580	0.5462	0.6470	0.2203
0.5475	16.0	15552	0.4817	0.5892	0.1855
0.5442	17.0	16524	0.4538	0.5851	0.1835
0.8933	18.0	17496	0.4262	0.5167	0.1676
0.5099	19.0	18468	0.3809	0.4778	0.1485
0.5529	20.0	19440	0.3897	0.4954	0.1511
0.4446	21.0	20412	0.3875	0.4507	0.1365
0.4376	22.0	21384	0.3487	0.4592	0.1403
0.4044	23.0	22356	0.3340	0.4146	0.1234
0.4067	24.0	23328	0.3254	0.4069	0.1223
0.3696	25.0	24300	0.3158	0.3938	0.1161
0.3417	26.0	25272	0.3046	0.3981	0.1148
0.3381	27.0	26244	0.2911	0.3753	0.1105
0.3187	28.0	27216	0.3069	0.3865	0.1190
0.3016	29.0	28188	0.2523	0.3525	0.1006
0.2837	30.0	29160	0.2433	0.3294	0.0959
0.2622	31.0	30132	0.2359	0.3172	0.0908
0.2584	32.0	31104	0.2495	0.3269	0.0963
0.239	33.0	32076	0.2421	0.3188	0.0922
0.2275	34.0	33048	0.2253	0.3124	0.0917
0.2216	35.0	34020	0.2188	0.2876	0.0828
0.2078	36.0	34992	0.2263	0.2967	0.0828
0.1993	37.0	35964	0.2169	0.2873	0.0824
0.1846	38.0	36936	0.2136	0.2735	0.0778
0.1797	39.0	37908	0.2095	0.2625	0.0762
0.182	40.0	38880	0.2157	0.2581	0.0742
0.1633	41.0	39852	0.1872	0.2526	0.0726
0.1674	42.0	40824	0.1886	0.2471	0.0717
0.1598	43.0	41796	0.2110	0.2573	0.0729
0.1485	44.0	42768	0.2113	0.2413	0.0698
0.143	45.0	43740	0.1945	0.2335	0.0689
0.1297	46.0	44712	0.1945	0.2222	0.0640
0.1243	47.0	45684	0.1878	0.2217	0.0628
0.1153	48.0	46656	0.1935	0.2165	0.0614
0.1115	49.0	47628	0.1976	0.2115	0.0604
0.1012	50.0	48600	0.1816	0.2102	0.0601
0.1002	51.0	49572	0.1782	0.2041	0.0582
0.0917	52.0	50544	0.2061	0.2015	0.0588
0.086	53.0	51516	0.1777	0.1966	0.0579
0.0769	54.0	52488	0.1872	0.1889	0.0550
0.0765	55.0	53460	0.1923	0.1880	0.0539
0.0673	56.0	54432	0.1879	0.1820	0.0535
0.0646	57.0	55404	0.2011	0.1820	0.0534
0.0636	58.0	56376	0.1952	0.1800	0.0528
0.0618	59.0	57348	0.2021	0.1775	0.0520
0.0557	60.0	58320	0.1996	0.1745	0.0512
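To pick a checkpoint from this table programmatically, the rows can be parsed as tab-separated values. The sketch below uses only four rows copied from the table (epochs 1, 15, 18 and 60) and assumes that lower WER is better. The result matches the full table, where epoch 1 has the lowest WER (0.0213) of all 60 rows. The headline evaluation figures at the top of the card (loss 0.0285, WER 0.0215, CER 0.0044) are much closer to this epoch-1 row than to the final epoch-60 row (WER 0.1745).

```python
import csv
import io

# Four rows taken from the results table above (tab-separated, header included).
TABLE = """Training Loss\tEpoch\tStep\tValidation Loss\tWER\tCER
0.1722\t1.0\t972\t0.0286\t0.0213\t0.0043
0.555\t15.0\t14580\t0.5462\t0.6470\t0.2203
0.8933\t18.0\t17496\t0.4262\t0.5167\t0.1676
0.0557\t60.0\t58320\t0.1996\t0.1745\t0.0512
"""

rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
best = min(rows, key=lambda r: float(r["WER"]))  # lowest word error rate wins
print(f"lowest WER {best['WER']} at epoch {best['Epoch']} (step {best['Step']})")
```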

Framework versions

Transformers 4.48.3
Pytorch 2.5.1
Datasets 3.2.0
Tokenizers 0.21.0
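The standard library's `importlib.metadata` can check whether a local environment matches the versions listed above. This is a sketch under two assumptions. First, the packages were installed under their usual PyPI distribution names (`torch` for PyTorch). Second, local build suffixes such as `+cu121` should be ignored, so they are stripped before comparing. `REPORTED` and `compare_versions` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by assumed PyPI distribution name.
REPORTED = {
    "transformers": "4.48.3",
    "torch": "2.5.1",  # listed as "Pytorch" in the card
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(reported: dict) -> dict:
    """Map each package to (reported version, installed version or None, match flag)."""
    result = {}
    for name, wanted in reported.items():
        try:
            # Drop local build tags such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        result[name] = (wanted, installed, installed == wanted)
    return result


for name, (wanted, installed, ok) in compare_versions(REPORTED).items():
    print(f"{name:12s} reported {wanted:8s} installed {installed or 'missing':10s} {'ok' if ok else 'differs'}")
```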

Model size: 94.4M params (Safetensors, tensor type F32)

Repository: Priyanship/base_sami_22kft_ft_pseudo_widv_ep60_tde