MLSpeech committed · Commit d736335 · verified · 1 Parent(s): 86b4b0a

Update README.md

Files changed (1): README.md +23 -12

README.md CHANGED
@@ -128,9 +128,11 @@ model-index:
 # CarelessWhisper - Causal Whisper Streaming Model
 Causal Whisper Streaming is a fine-tuned version of OpenAI Whisper that can handle causal data and perform real-time transcription.
 
-![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
-[![Demo on Hugging Face](https://img.shields.io/badge/🤗%20Demo-Hugging%20Face-blueviolet?logo=huggingface&logoColor=white)](https://huggingface.co/spaces/MLSpeech/CarelessWhisper-causal-streaming)
+[![arXiv](https://img.shields.io/badge/arXiv-2508.12301-b31b1b.svg)](https://arxiv.org/abs/2508.12301) [![Demo on Hugging Face](https://img.shields.io/badge/🤗%20Demo-Hugging%20Face-blueviolet?logo=huggingface&logoColor=white)](https://huggingface.co/spaces/MLSpeech/CarelessWhisper-causal-streaming)
 
+## 📄 Paper
+
+For more details, see our [paper](https://arxiv.org/abs/2508.12301).
 
 ## 🔧 Setup
 We used Python 3.9.16, PyTorch 2.6.0, and PyTorch-Lightning 2.5.0 to train and test our models.
@@ -162,7 +164,7 @@ To set up the project environment using `conda`, follow these steps:
 
 After installing all of the dependencies, you can try to run inference.
 
-## Available Models
+## 🤖 Available Models
 We fine-tuned three different sizes of Whisper, all supporting English-only transcription.
 A `large-v2` that was fine-tuned on multilingual data is also available; it supports English, French, Spanish, German, and Portuguese with a chunk size of 300 milliseconds.
 
@@ -173,13 +175,13 @@ A `large-v2` that was fine-tuned on multilingual data is also available; it supports
 |large-v2| 40, 100, 200, 300, 1000| 300 |
 
 
-## Running Inference
+## 🎤 Running Inference
 To run inference, download the repository content and run from the repository root according to the following sections.
 
 > **Note:** The models are hosted on the [Hugging Face Hub](https://huggingface.co/), which requires an access token.
 > Make sure you are logged in with your token to access the models.
 
-### How to Apply Your Hugging Face Access Token
+### How to Apply Your Hugging Face 🤗 Access Token
 
 1. **Create a Hugging Face account** (if you don’t have one) at [https://huggingface.co/join](https://huggingface.co/join).
 
@@ -199,7 +201,7 @@ To run inference, download the repo content, and run from the repository root ac
 Paste your token when prompted.
 
 
-### CLI Usage
+### 🖥️ CLI Usage
 The transcription model can be launched with the following command:
 ```bash
 # Using a local microphone for streaming transcription, dumping the recording to out.wav
@@ -227,7 +229,7 @@ python transcribe.py \
 --use_latency
 ```
 
-### Python Usage
+### 🐍 Python Usage
 If you prefer using Python, a code snippet utilizing a microphone or a WAV file is provided below:
 
 ```python
@@ -257,10 +259,10 @@ texts_wav_simulation = model.transcribe(simulate_stream=True,
 ca_kv_cache=True)
 ```
 
-## Training
+## 🦾 Training
 To train using LoRA, you can use our existing code. Make sure all the requirements are installed.
 
-### Dataset Structure
+### 📂 Dataset Structure
 
 Before starting model training using the command-line interface provided below, you must first configure your dataset dictionary file located at `training_code/ds_dict.py`.
 
@@ -274,7 +276,7 @@ This file defines a Python dictionary named `ds_paths`, where you should specify
 
 You can find an example entry in `training_code/ds_dict.py`.
 
-### CLI Interface
+### 🖥️ CLI Interface
 ```bash
 python training_code/train.py \
 --lora \
@@ -297,9 +299,18 @@ For more options and training configurations, run:
 ```bash
 python training_code/train.py --help
 ```
-## 🙏 Acknowledgements
 
-This project uses components from [OpenAI's Whisper](https://github.com/openai/whisper), licensed under the MIT License.
+## 📜 License
+
+This repository uses a dual license:
+
+[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+Portions derived from [OpenAI Whisper](https://github.com/openai/whisper) are licensed under the **MIT License**.
+
+[![CC BY-NC 4.0 License](https://img.shields.io/badge/License-CC--BY--NC%204.0-blue.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
+All other original code in this repository is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0)**.
+
+See the [LICENSE](./LICENSE) file for full details.
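As a supplement to the token steps described in the diff above, logging in can be sketched as shell commands. `huggingface-cli login` is the standard Hugging Face CLI entry point and `HF_TOKEN` is the environment variable the `huggingface_hub` library reads; the token value shown is a placeholder, and exact prompts may differ by `huggingface_hub` version:

```shell
# Interactive login: paste the token created at https://huggingface.co/settings/tokens
huggingface-cli login

# Non-interactive alternative (scripts/CI): export the token as an environment
# variable that huggingface_hub picks up automatically. Placeholder value below.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```

Either route should satisfy the README's note that gated model downloads require an authenticated session.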
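The Dataset Structure section above says `training_code/ds_dict.py` defines a `ds_paths` dictionary but the example entry is not shown in the diff. As an illustration only, a minimal entry might look like the following; the key names (`train`, `val`) and the dataset identifier are hypothetical, so consult the actual example entry in `training_code/ds_dict.py` for the real schema:

```python
# Hypothetical sketch of a dataset dictionary like training_code/ds_dict.py.
# Field names below are illustrative, not the repository's actual schema.
ds_paths = {
    "librispeech_clean": {                             # dataset identifier (hypothetical)
        "train": "/data/librispeech/train-clean-100",  # training split path
        "val": "/data/librispeech/dev-clean",          # validation split path
    },
}

# Training code consuming such a dictionary would typically look paths up by name:
train_dir = ds_paths["librispeech_clean"]["train"]
```

The point of centralizing paths in one dictionary is that the training CLI can select a dataset by name instead of taking raw paths as arguments.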