SpeechT5 TTS Hataw

This model is a fine-tuned version of microsoft/speecht5_tts on the HatawTTS dataset. After 5000 training steps it reaches a validation loss of 0.3396 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 20
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 40
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 5000
mixed_precision_training: Native AMP
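
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and the learning rate that a linear schedule with warmup would give at a chosen step. It assumes a single training device and the usual linear-warmup, linear-decay-to-zero rule; the function name `linear_lr` is made up for this example and is not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 20
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
TRAINING_STEPS = 5000

# Effective batch size per optimizer update (assumes one device).
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: fall linearly from the peak to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    for step in (0, 250, 500, 2750, 5000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 0.0001 at step 500, is back to half of that at step 2750, and reaches zero at step 5000.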

Training Loss	Epoch	Step	Validation Loss
0.5981	0.0962	100	0.5262
0.4913	0.1925	200	0.4319
0.4812	0.2887	300	0.4450
0.4811	0.3850	400	0.4226
0.4584	0.4812	500	0.4040
0.4422	0.5775	600	0.4106
0.43	0.6737	700	0.3951
0.4325	0.7700	800	0.3884
0.4273	0.8662	900	0.3872
0.4133	0.9625	1000	0.3817
0.4162	1.0587	1100	0.3794
0.4181	1.1550	1200	0.3773
0.4044	1.2512	1300	0.3788
0.4061	1.3474	1400	0.3727
0.4122	1.4437	1500	0.3846
0.4075	1.5399	1600	0.3736
0.4069	1.6362	1700	0.3671
0.4036	1.7324	1800	0.3672
0.395	1.8287	1900	0.3667
0.3999	1.9249	2000	0.3775
0.3885	2.0212	2100	0.3651
0.4038	2.1174	2200	0.3667
0.3915	2.2137	2300	0.3598
0.3984	2.3099	2400	0.3587
0.3878	2.4062	2500	0.3587
0.3923	2.5024	2600	0.3579
0.4055	2.5987	2700	0.3567
0.3819	2.6949	2800	0.3554
0.3789	2.7911	2900	0.3522
0.3797	2.8874	3000	0.3522
0.3823	2.9836	3100	0.3513
0.3775	3.0799	3200	0.3508
0.3789	3.1761	3300	0.3495
0.376	3.2724	3400	0.3495
0.3774	3.3686	3500	0.3482
0.3739	3.4649	3600	0.3483
0.3718	3.5611	3700	0.3467
0.377	3.6574	3800	0.3484
0.3713	3.7536	3900	0.3444
0.3744	3.8499	4000	0.3461
0.3695	3.9461	4100	0.3440
0.3714	4.0423	4200	0.3428
0.3681	4.1386	4300	0.3424
0.3719	4.2348	4400	0.3424
0.3689	4.3311	4500	0.3411
0.3741	4.4273	4600	0.3413
0.3676	4.5236	4700	0.3402
0.3655	4.6198	4800	0.3402
0.369	4.7161	4900	0.3397
0.3624	4.8123	5000	0.3396
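
The epoch and step columns can be used to back out a rough steps-per-epoch figure, and the validation column to find the lowest logged loss. The sketch below does this for a handful of rows copied from the log above. The dataset-size estimate assumes a single device and an effective batch size of 40, so it is only approximate.

```python
# A few (training_loss, epoch, step, validation_loss) rows copied from the log above.
ROWS = [
    (0.5981, 0.0962, 100, 0.5262),
    (0.4133, 0.9625, 1000, 0.3817),
    (0.3999, 1.9249, 2000, 0.3775),
    (0.3797, 2.8874, 3000, 0.3522),
    (0.3744, 3.8499, 4000, 0.3461),
    (0.3624, 4.8123, 5000, 0.3396),
]

TOTAL_TRAIN_BATCH_SIZE = 40  # from the hyperparameter list

# Row with the lowest validation loss among the sampled rows.
best = min(ROWS, key=lambda row: row[3])

# Optimizer steps per epoch, estimated from the last logged row.
_, last_epoch, last_step, _ = ROWS[-1]
steps_per_epoch = last_step / last_epoch

# Approximate number of training examples (assumes one device).
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"best step: {best[2]}, validation loss: {best[3]}")
    print(f"steps per epoch: {steps_per_epoch:.0f}")
    print(f"approx. training examples: {approx_examples:.0f}")
```

In the full log the lowest validation loss is likewise the final row (step 5000), so the last checkpoint is also the best one by this metric.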

Model size: 0.1B params (Safetensors, tensor type F32)
