# CarelessWhisper - Causal Whisper Streaming Model

Causal Whisper Streaming is a fine-tuned version of OpenAI Whisper that can handle causal data and perform real-time transcription.

[Paper](https://arxiv.org/abs/2508.12301) | [Demo](https://huggingface.co/spaces/MLSpeech/CarelessWhisper-causal-streaming)
## 📄 Paper

For more details, see our [paper](https://arxiv.org/abs/2508.12301).
## 🔧 Setup

We used Python 3.9.16, PyTorch 2.6.0, and PyTorch-Lightning 2.5.0 to train and test our models.

After installing all of the dependencies, you can try to run inference.
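For reference, an environment matching these versions could be described with a conda `environment.yml` along these lines (the environment name, channel choices, and pip package pins are assumptions, not taken from the repository):

```yaml
# Hypothetical environment.yml; adjust to the repo's actual instructions.
name: careless-whisper
dependencies:
  - python=3.9.16
  - pip
  - pip:
      - torch==2.6.0
      - pytorch-lightning==2.5.0
```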
## 🤗 Available Models

We fine-tuned three different sizes of Whisper, all of which support English-only transcription.
A `large-v2` model that was fine-tuned on multilingual data is also available; it supports English, French, Spanish, German, and Portuguese with a chunk size of 300 milliseconds.

| Model | Chunk sizes (ms) | Multilingual chunk size (ms) |
|---|---|---|
| large-v2 | 40, 100, 200, 300, 1000 | 300 |
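To relate the chunk sizes above to raw audio, recall that Whisper models operate on 16 kHz audio; the helper below is purely illustrative (it is not part of this repository's API) and converts a chunk duration to a sample count:

```python
SAMPLE_RATE = 16_000  # Whisper front-ends expect 16 kHz mono audio

def chunk_samples(chunk_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio samples contained in one streaming chunk."""
    return sample_rate * chunk_ms // 1000

# Chunk sizes from the table above, converted to raw sample counts
sizes = {ms: chunk_samples(ms) for ms in (40, 100, 200, 300, 1000)}
```

For example, the 300 ms multilingual chunk corresponds to 4800 samples per step.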
## 🎤 Running Inference

To run inference, download the repository content and run the commands below from the repository root, according to the following sections.

> **Note:** The models are hosted on the [Hugging Face Hub](https://huggingface.co/), which requires an access token.
> Make sure you are logged in with your token to access the models.

### How to Apply Your Hugging Face 🤗 Access Token

1. **Create a Hugging Face account** (if you don't have one) at [https://huggingface.co/join](https://huggingface.co/join).

Paste your token when prompted.
### 🖥️ CLI Usage

The transcription model can be run with the following command:

```bash
# Using a local microphone for streaming transcription, dumping the recording to out.wav
python transcribe.py \
    --use_latency
```
### 🐍 Python Usage

If you prefer using Python, a code snippet utilizing a microphone or a WAV file is provided below:

```python
texts_wav_simulation = model.transcribe(simulate_stream=True,
                                        ca_kv_cache=True)
```
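The `simulate_stream=True` mode presumably feeds a recorded file to the model in fixed-duration chunks rather than all at once. A minimal, standard-library-only sketch of that idea (independent of this repository's API; the chunk duration and sample rate are assumptions) looks like:

```python
import wave

def iter_chunks(path: str, chunk_ms: int = 300, sample_rate: int = 16_000):
    """Yield successive fixed-duration chunks of raw PCM frames from a WAV file."""
    frames_per_chunk = sample_rate * chunk_ms // 1000
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

Each yielded chunk can then be handed to a streaming decoder in arrival order, which is what makes the offline file behave like a live microphone.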
## 🦾 Training

To train using LoRA, you can use our existing code. Make sure all the requirements are installed.

### 📁 Dataset Structure

Before starting model training using the command-line interface provided below, you must first configure your dataset dictionary file located at `training_code/ds_dict.py`. This file defines a Python dictionary named `ds_paths`, where you should specify the paths to your datasets.

You can find an example entry in `training_code/ds_dict.py`.
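As a rough illustration only, a `ds_paths` entry might take a shape like the following; the dataset name, nesting, and file paths here are all hypothetical, so check the example entry shipped in `training_code/ds_dict.py` for the keys `train.py` actually expects:

```python
# Hypothetical shape of training_code/ds_dict.py; the real schema may differ.
ds_paths = {
    "librispeech_clean": {  # dataset name referenced on the CLI (assumed)
        "train": "/data/librispeech/train.json",
        "val": "/data/librispeech/val.json",
    },
}
```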
### 🖥️ CLI Interface

```bash
python training_code/train.py \
    --lora
```

For more options and training configurations, run:

```bash
python training_code/train.py --help
```
## 📜 License

This repository uses a dual license:

[MIT License](https://opensource.org/licenses/MIT)
Portions derived from [OpenAI Whisper](https://github.com/openai/whisper) are licensed under the **MIT License**.

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
All other original code in this repository is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)**.

See the [LICENSE](./LICENSE) file for full details.