---
license: apache-2.0
language:
- multilingual
- en
- hi
- es
- fr
- de
- it
- gu
- mr
pipeline_tag: automatic-speech-recognition
tags:
- nemo
- asr
- emotion
- age
- gender
- intent
- entity_recognition
datasets:
- MLCommons/peoples_speech
- fsicoli/common_voice_17_0
- ai4bharat/IndicVoices
- facebook/multilingual_librispeech
- openslr/librispeech_asr
base_model:
- nvidia/parakeet-ctc-0.6b
library_name: nemo
---

# parakeet-ctc-0.6b-with-meta

This is a multilingual Automatic Speech Recognition (ASR) model fine-tuned with NVIDIA NeMo. Unlike standard transcription models, it can annotate transcripts with intents, entities, speaker attributes (age, gender), and emotions, including in streaming use.

## How to Use

You can use this model directly with the NeMo toolkit for inference.

```python
import nemo.collections.asr as nemo_asr

# Load the model from the Hugging Face Hub
asr_model = nemo_asr.models.ASRModel.from_pretrained("WhissleAI/parakeet-ctc-0.6b-with-meta")

# Transcribe an audio file
transcriptions = asr_model.transcribe(["/path/to/your/audio.wav"])
print(transcriptions)
```

This model can also be used with the inference server provided in the `PromptingNemo` repository. See the [fine-tuning and inference scripts](https://github.com/WhissleAI/PromptingNemo/blob/main/scripts/asr/meta-asr) for details.
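The model card does not specify the exact output format for the metadata annotations. As a sketch, assuming a hypothetical format where metadata appears as inline uppercase tokens (e.g. `EMOTION_NEUTRAL`, `GENDER_MALE`, `INTENT_BOOK_FLIGHT`) mixed into the transcript, a post-processing step to separate text from tags might look like this; the tag names and layout here are illustrative, not confirmed by the model:

```python
import re

# Hypothetical tag prefixes; adjust to match the model's actual output.
TAG_PATTERN = re.compile(r"\b(?:EMOTION|AGE|GENDER|INTENT)_[A-Z0-9_]+\b")

def extract_meta(transcript: str) -> dict:
    """Split a tagged transcript into plain text and a list of metadata tags."""
    tags = TAG_PATTERN.findall(transcript)
    # Remove the tags and collapse the leftover whitespace.
    text = " ".join(TAG_PATTERN.sub("", transcript).split())
    return {"text": text, "tags": tags}

example = "i would like to book a flight INTENT_BOOK_FLIGHT EMOTION_NEUTRAL GENDER_MALE"
print(extract_meta(example))
# {'text': 'i would like to book a flight',
#  'tags': ['INTENT_BOOK_FLIGHT', 'EMOTION_NEUTRAL', 'GENDER_MALE']}
```

Inspect a few real outputs from `asr_model.transcribe(...)` first and adapt the regex to the tag vocabulary the model actually emits.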