Paper: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (arXiv:2301.02111)
Pre-trained checkpoints of Vall-E with AudioDec, trained on WenetSpeech4TTS.
We provide three checkpoints trained or fine-tuned on different subsets of WenetSpeech4TTS.
Inference code and more details: dukGuo/valle-audiodec.