feihu.hf committed
Commit 061a2ac · 1 Parent(s): c3fe46f

update README

Files changed (1): README.md (+9 -2)

README.md CHANGED
@@ -233,6 +233,13 @@ For full technical details, see the [Qwen2.5-1M Technical Report](https://arxiv.

Replace the content of your `config.json` with `config_1m.json`, which includes the config for length extrapolation and sparse attention.

+ ```bash
+ export MODELNAME=Qwen3-235B-A22B-Thinking-2507
+ huggingface-cli download Qwen/${MODELNAME} --local-dir ${MODELNAME}
+ mv ${MODELNAME}/config.json ${MODELNAME}/config.json.bak
+ mv ${MODELNAME}/config_1m.json ${MODELNAME}/config.json
+ ```
+
#### Step 2: Launch Model Server

After updating the config, proceed with either **vLLM** or **SGLang** for serving the model.
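
A quick way to confirm the swap took effect before serving (a minimal sketch, assuming the download and `mv` steps above ran in the current working directory):

```bash
# config.json should now be the 1M variant; the original is kept as a backup.
export MODELNAME=Qwen3-235B-A22B-Thinking-2507
ls -l ${MODELNAME}/config.json ${MODELNAME}/config.json.bak
```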
@@ -251,7 +258,7 @@ Then launch the server with Dual Chunk Flash Attention enabled:

```bash
VLLM_ATTENTION_BACKEND=DUAL_CHUNK_FLASH_ATTN VLLM_USE_V1=0 \
- vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 \
+ vllm serve ./Qwen3-235B-A22B-Thinking-2507 \
--tensor-parallel-size 8 \
--max-model-len 1010000 \
--enable-chunked-prefill \
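
Once the server is up, a minimal smoke test (a sketch; it assumes vLLM's default OpenAI-compatible port 8000 and no `--served-model-name` override, in which case the model is registered under the path given to `vllm serve`):

```bash
# Send one short chat request to the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./Qwen3-235B-A22B-Thinking-2507",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```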
@@ -288,7 +295,7 @@ Launch the server with DCA support:

```bash
python3 -m sglang.launch_server \
- --model-path Qwen/Qwen3-235B-A22B-Instruct-2507 \
+ --model-path ./Qwen3-235B-A22B-Thinking-2507 \
--context-length 1010000 \
--mem-frac 0.75 \
--attention-backend dual_chunk_flash_attn \
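
The same smoke test applies to SGLang's OpenAI-compatible endpoint (again a sketch, assuming the default port 30000; pass `--port` to change it):

```bash
# Same one-shot chat request, pointed at the SGLang server instead.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./Qwen3-235B-A22B-Thinking-2507",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```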
 