Hi friends,
After installing a real-time translator with live subtitles in OBS, I heard about Whisper models for the first time and learned what they can do.
Since I am a beginner, I should explain why I am doing this and what I have already tried to solve the problem.
My Whisper model does not transcribe what I say with 100% accuracy. I suspect part of the reason is that I don't always speak clearly, but nobody does. To speak perfectly clearly, I would have to read from a dictionary.
I also know that Whisper comes in several model sizes: Tiny, Base, Small, Medium, Large, and Turbo. I have tried all of them, but only Tiny, Base, and Small are fast enough; all the others are too slow.
I should also mention that I stream live and play games on my PC at the same time.
I assume the model has only been trained on the basics of the German language.
So far, the only help I have had is from GitHub Copilot, though I feel like I'm making the whole thing much more complicated than it actually is. Copilot offered to provide everything I need for fine-tuning. The only thing I had to take care of myself was obtaining audio files with matching transcriptions, and Copilot recommended websites where I could get them.
Since I want the model to understand German pronunciation better, I thought it would make sense to train it with audio files (see image).
I downloaded the audio files from the Common Voice website.
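In case it helps anyone answering: as far as I understand, a Common Voice download for a language contains a `clips/` folder of MP3 files plus tab-separated files (for example `validated.tsv`) whose `path` and `sentence` columns link each clip to its transcription. Below is a minimal Python sketch, assuming that layout, that pairs each audio file with its transcription. The helper name `load_common_voice_pairs` and the folder name `clips` are my own hypothetical choices, not part of any library, and the sketch only prepares the data; it does not do any training.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, List, Tuple


def load_common_voice_pairs(tsv_lines: Iterable[str], clips_dir: str) -> List[Tuple[str, str]]:
    """Pair each clip with its transcription from a Common Voice-style TSV.

    Assumption: the TSV has a header row with at least the columns
    "path" (audio file name) and "sentence" (what was said in the clip).
    """
    # QUOTE_NONE because transcriptions may contain quote characters.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        sentence = (row.get("sentence") or "").strip()
        if not sentence or not row.get("path"):
            continue  # skip rows without a usable audio path or transcription
        audio_path = str(Path(clips_dir) / row["path"])
        pairs.append((audio_path, sentence))
    return pairs


if __name__ == "__main__":
    # Tiny in-memory example; with the real download you would instead pass
    # open("validated.tsv", encoding="utf-8", newline="").
    sample = io.StringIO(
        "client_id\tpath\tsentence\n"
        "abc\tcommon_voice_de_1.mp3\tGuten Morgen.\n"
        "def\tcommon_voice_de_2.mp3\t\n"
    )
    for audio_path, text in load_common_voice_pairs(sample, "clips"):
        print(audio_path, "->", text)
```

Each (audio file, sentence) pair is the kind of example a fine-tuning script would consume; the MP3 files would still have to be decoded and resampled to 16 kHz, the sample rate Whisper works with, before training.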
Could you please explain the easiest way for a beginner to train a Whisper model?
Thanks 🙂
(translated with DeepL)
