mzr-chapter-audio-dataset-force-aligned-speecht5

This model is a fine-tuned version of microsoft/speecht5_tts; the fine-tuning dataset is not named (it is listed as "None"). It achieves the following results on the evaluation set:

  • Loss: 0.0448 (validation loss at step 40000, the final row of the training results)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: AdamW, PyTorch fused implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 4000
  • training_steps: 40000
  • mixed_precision_training: Native AMP
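As a reproduction aid, the hyperparameters above can be mapped onto keyword arguments of transformers' `Seq2SeqTrainingArguments` (or `TrainingArguments`), alongside the learning-rate curve that the cosine schedule with warmup implies. This is a minimal sketch, not the authors' training script. Several parts are assumptions: reading "Native AMP" as `fp16=True` (it could equally have been `bf16=True`), evaluating every 1000 steps (inferred from the training-results table), and the hypothetical helpers `effective_batch_size` and `cosine_lr`. `cosine_lr` follows the formula of `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`.

```python
import math

# Hyperparameters from the list above, expressed as keyword arguments for
# transformers' Seq2SeqTrainingArguments / TrainingArguments.
training_kwargs = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 3407,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",  # OptimizerNames.ADAMW_TORCH_FUSED
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 4000,
    "max_steps": 40000,
    "fp16": True,              # assumption: "Native AMP" may also mean bf16=True
    "eval_strategy": "steps",  # assumption, inferred from the results table
    "eval_steps": 1000,        # assumption, inferred from the results table
}


def effective_batch_size(kwargs, num_devices=1):
    """Examples per optimizer step: per-device batch x accumulation x devices."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


def cosine_lr(step, base_lr=1e-4, warmup=4000, total=40000):
    """Linear warmup to base_lr, then a half-cosine decay to 0 at `total` steps
    (the shape of get_cosine_schedule_with_warmup with num_cycles=0.5)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # With a single device this reproduces total_train_batch_size = 32.
    print("effective batch size:", effective_batch_size(training_kwargs))
    for step in (0, 2000, 4000, 22000, 40000):
        print(f"step {step:>5}: lr = {cosine_lr(step):.2e}")
```

To try it, pass the dictionary as `Seq2SeqTrainingArguments(output_dir="speecht5_finetuned", **training_kwargs)`, where the output directory name is hypothetical. `fp16=True` requires a CUDA device. Under this schedule the learning rate peaks at 1e-4 at step 4000, is halved by step 22000, and reaches 0 at step 40000.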

Training results

Training Loss   Epoch      Step     Validation Loss
0.0626          12.5016    1000     0.0466
0.0517          25.0       2000     0.0436
0.0524          37.5016    3000     0.0428
0.0501          50.0       4000     0.0423
0.0464          62.5016    5000     0.0408
0.0422          75.0       6000     0.0421
0.0479          87.5016    7000     0.0416
0.0434          100.0      8000     0.0425
0.0421          112.5016   9000     0.0416
0.0408          125.0      10000    0.0424
0.0376          137.5016   11000    0.0438
0.0371          150.0      12000    0.0419
0.0377          162.5016   13000    0.0429
0.0377          175.0      14000    0.0422
0.0371          187.5016   15000    0.0427
0.0362          200.0      16000    0.0437
0.0360          212.5016   17000    0.0438
0.0349          225.0      18000    0.0435
0.0356          237.5016   19000    0.0438
0.0340          250.0      20000    0.0434
0.0330          262.5016   21000    0.0437
0.0335          275.0      22000    0.0443
0.0329          287.5016   23000    0.0445
0.0332          300.0      24000    0.0448
0.0324          312.5016   25000    0.0449
0.0329          325.0      26000    0.0442
0.0317          337.5016   27000    0.0445
0.0311          350.0      28000    0.0443
0.0304          362.5016   29000    0.0448
0.0313          375.0      30000    0.0443
0.0308          387.5016   31000    0.0450
0.0312          400.0      32000    0.0447
0.0307          412.5016   33000    0.0448
0.0312          425.0      34000    0.0448
0.0304          437.5016   35000    0.0446
0.0313          450.0      36000    0.0448
0.0298          462.5016   37000    0.0446
0.0307          475.0      38000    0.0447
0.0302          487.5016   39000    0.0449
0.0303          500.0      40000    0.0448
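The training-results table can also be read programmatically. The sketch below parses the rows, finds the step with the lowest validation loss, and derives optimizer steps per epoch from a whole-epoch row. The helper names are hypothetical. On these numbers the minimum validation loss is 0.0408 at step 5000. After that point, training loss keeps falling to about 0.030 while validation loss drifts back up to 0.0448 by step 40000, a pattern that suggests overfitting after the first few thousand steps. The card does not say which checkpoint was uploaded, although the reported Loss of 0.0448 matches the final row.

```python
# Rows of the training-results table:
# training loss, epoch, step, validation loss.
TABLE = """\
0.0626 12.5016 1000 0.0466
0.0517 25.0 2000 0.0436
0.0524 37.5016 3000 0.0428
0.0501 50.0 4000 0.0423
0.0464 62.5016 5000 0.0408
0.0422 75.0 6000 0.0421
0.0479 87.5016 7000 0.0416
0.0434 100.0 8000 0.0425
0.0421 112.5016 9000 0.0416
0.0408 125.0 10000 0.0424
0.0376 137.5016 11000 0.0438
0.0371 150.0 12000 0.0419
0.0377 162.5016 13000 0.0429
0.0377 175.0 14000 0.0422
0.0371 187.5016 15000 0.0427
0.0362 200.0 16000 0.0437
0.0360 212.5016 17000 0.0438
0.0349 225.0 18000 0.0435
0.0356 237.5016 19000 0.0438
0.0340 250.0 20000 0.0434
0.0330 262.5016 21000 0.0437
0.0335 275.0 22000 0.0443
0.0329 287.5016 23000 0.0445
0.0332 300.0 24000 0.0448
0.0324 312.5016 25000 0.0449
0.0329 325.0 26000 0.0442
0.0317 337.5016 27000 0.0445
0.0311 350.0 28000 0.0443
0.0304 362.5016 29000 0.0448
0.0313 375.0 30000 0.0443
0.0308 387.5016 31000 0.0450
0.0312 400.0 32000 0.0447
0.0307 412.5016 33000 0.0448
0.0312 425.0 34000 0.0448
0.0304 437.5016 35000 0.0446
0.0313 450.0 36000 0.0448
0.0298 462.5016 37000 0.0446
0.0307 475.0 38000 0.0447
0.0302 487.5016 39000 0.0449
0.0303 500.0 40000 0.0448
"""

rows = [tuple(float(x) for x in line.split()) for line in TABLE.strip().splitlines()]


def best_row(rows):
    """Row with the lowest validation loss (last column)."""
    return min(rows, key=lambda r: r[3])


def steps_per_epoch(rows):
    """Optimizer steps per epoch, read off the first row with a whole-number epoch."""
    for _train_loss, epoch, step, _val_loss in rows:
        if epoch.is_integer():
            return step / epoch
    raise ValueError("no whole-epoch row found")


if __name__ == "__main__":
    _, epoch, step, val = best_row(rows)
    print(f"lowest validation loss: {val} at step {int(step)} (epoch {epoch})")
    print(f"final validation loss:  {rows[-1][3]} at step {int(rows[-1][2])}")
    spe = steps_per_epoch(rows)
    # Assumption: one device, so each optimizer step sees 32 examples.
    print(f"{spe:.0f} optimizer steps per epoch, about {spe * 32:.0f} examples per epoch")
```

The examples-per-epoch figure is only an estimate. It assumes a single device, and the last batch of each epoch may be partial.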

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
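You can check a local environment against the versions listed above by comparing installed distributions with the card, using the standard library's `importlib.metadata`. This is a small sketch. It assumes the PyPI distribution names (`torch` for PyTorch) and ignores local version labels such as `+cu128`. `compare_versions` is a hypothetical helper, not part of any library.

```python
from importlib import metadata

# Versions listed under "Framework versions"; keys are PyPI distribution names.
CARD_VERSIONS = {
    "transformers": "4.57.1",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Map each distribution to (card version, installed version or None, match)."""
    report = {}
    for dist, want in expected.items():
        try:
            have = metadata.version(dist)
        except metadata.PackageNotFoundError:
            have = None
        # Compare only the public version, ignoring local labels like "+cu128".
        match = have is not None and have.split("+")[0] == want.split("+")[0]
        report[dist] = (want, have, match)
    return report


if __name__ == "__main__":
    for dist, (want, have, ok) in compare_versions().items():
        print(f"{dist:<13} card={want:<12} installed={have or 'missing':<12} {'ok' if ok else 'differs'}")
```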

Model files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32

Model tree for sil-ai/mzr-chapter-audio-dataset-force-aligned-speecht5

  • Fine-tuned from: microsoft/speecht5_tts

Evaluation results