# CarelessWhisper - Causal Whisper Streaming Model

Causal Whisper Streaming is a fine-tuned version of OpenAI Whisper that can handle causal data and perform real-time transcription.

[Paper](https://arxiv.org/abs/2508.12301) | [Demo](https://huggingface.co/spaces/MLSpeech/CarelessWhisper-causal-streaming)
## 📄 Paper

For more details, see our [paper](https://arxiv.org/abs/2508.12301).
## 🔧 Setup

We used Python 3.9.16, PyTorch 2.6.0, and PyTorch-Lightning 2.5.0 to train and test our models.

After installing all of the dependencies, you can try to run inference.
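For reference, an environment matching these versions could be described with a conda `environment.yml` along these lines (the environment name, channel choices, and pip package pins are assumptions, not taken from the repository):

```yaml
# Hypothetical environment.yml; adjust to the repo's actual instructions.
name: careless-whisper
dependencies:
  - python=3.9.16
  - pip
  - pip:
      - torch==2.6.0
      - pytorch-lightning==2.5.0
```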
## 🤗 Available Models

We fine-tuned three different sizes of Whisper, all of which support English-only transcription.
A `large-v2` model that was fine-tuned on multilingual data is also available; it supports English, French, Spanish, German, and Portuguese with a chunk size of 300 milliseconds.

| Model | Chunk sizes (ms) | Multilingual chunk size (ms) |
|---|---|---|
| large-v2 | 40, 100, 200, 300, 1000 | 300 |
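To relate the chunk sizes above to raw audio, recall that Whisper models operate on 16 kHz audio; the helper below is purely illustrative (it is not part of this repository's API) and converts a chunk duration to a sample count:

```python
SAMPLE_RATE = 16_000  # Whisper front-ends expect 16 kHz mono audio

def chunk_samples(chunk_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio samples contained in one streaming chunk."""
    return sample_rate * chunk_ms // 1000

# Chunk sizes from the table above, converted to raw sample counts
sizes = {ms: chunk_samples(ms) for ms in (40, 100, 200, 300, 1000)}
```

For example, the 300 ms multilingual chunk corresponds to 4800 samples per step.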
## 🎤 Running Inference

To run inference, download the repository content and run the commands below from the repository root, according to the following sections.

> **Note:** The models are hosted on the [Hugging Face Hub](https://huggingface.co/), which requires an access token.
> Make sure you are logged in with your token to access the models.

### How to Apply Your Hugging Face 🤗 Access Token

1. **Create a Hugging Face account** (if you don't have one) at [https://huggingface.co/join](https://huggingface.co/join).

Paste your token when prompted.
### 🖥️ CLI Usage

The transcription model can be run with the following command:

```bash
# Using a local microphone for streaming transcription, dumping the recording to out.wav
python transcribe.py \
    --use_latency
```
### 🐍 Python Usage

If you prefer using Python, a code snippet utilizing a microphone or a WAV file is provided below:

```python
texts_wav_simulation = model.transcribe(simulate_stream=True,
                                        ca_kv_cache=True)
```
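The `simulate_stream=True` mode presumably feeds a recorded file to the model in fixed-duration chunks rather than all at once. A minimal, standard-library-only sketch of that idea (independent of this repository's API; the chunk duration and sample rate are assumptions) looks like:

```python
import wave

def iter_chunks(path: str, chunk_ms: int = 300, sample_rate: int = 16_000):
    """Yield successive fixed-duration chunks of raw PCM frames from a WAV file."""
    frames_per_chunk = sample_rate * chunk_ms // 1000
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

Each yielded chunk can then be handed to a streaming decoder in arrival order, which is what makes the offline file behave like a live microphone.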
## 🦾 Training

To train using LoRA, you can use our existing code. Make sure all the requirements are installed.

### 📁 Dataset Structure

Before starting model training using the command-line interface provided below, you must first configure your dataset dictionary file located at `training_code/ds_dict.py`. This file defines a Python dictionary named `ds_paths`, where you should specify the paths to your datasets.

You can find an example entry in `training_code/ds_dict.py`.
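As a rough illustration only, a `ds_paths` entry might take a shape like the following; the dataset name, nesting, and file paths here are all hypothetical, so check the example entry shipped in `training_code/ds_dict.py` for the keys `train.py` actually expects:

```python
# Hypothetical shape of training_code/ds_dict.py; the real schema may differ.
ds_paths = {
    "librispeech_clean": {  # dataset name referenced on the CLI (assumed)
        "train": "/data/librispeech/train.json",
        "val": "/data/librispeech/val.json",
    },
}
```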
### 🖥️ CLI Interface

```bash
python training_code/train.py \
    --lora
```

For more options and training configurations, run:

```bash
python training_code/train.py --help
```
## 📜 License

This repository uses a dual license:

[MIT License](https://opensource.org/licenses/MIT)
Portions derived from [OpenAI Whisper](https://github.com/openai/whisper) are licensed under the **MIT License**.

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
All other original code in this repository is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)**.

See the [LICENSE](./LICENSE) file for full details.