jerryzh168 committed on
Commit
e23ee47
·
verified ·
1 Parent(s): 6dd2748

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -212,7 +212,7 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
  | Benchmark | | | |
  |----------------------------------|----------------|------------------------|---------------------------|
  | | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
- | bbh | To be filled | To be filled | |
+ | bbh | 79.33 | 74.92 | |
 
 
  <details>
@@ -242,8 +242,8 @@ lm_eval --model hf --model_args pretrained=$MODEL --tasks bbh --device cuda:0 --
 
  | Benchmark | | | |
  |------------------|----------------|--------------------------------|--------------------------------|
- | | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
- | Peak Memory (GB) | To be filled | To be filled (?% reduction) | |
+ | | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
+ | Peak Memory (GB) | 16.47 | 6.27 (62% reduction) | 6.27 (62% reduction) |
 
 
 
@@ -299,11 +299,11 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 
  # Model Performance
 
- ## Results (A100 machine)
- | Benchmark (Latency) | | |
- |----------------------------------|----------------|--------------------------|
- | | Qwen/Qwen3-8B | jerryzh168/Qwen3-8B-AWQ-INT4 |
- | latency (batch_size=1) | ?s | ?s (?x speedup) |
+ ## Results (H100 machine)
+ | Benchmark (Latency) | | | |
+ |----------------------------------|----------------|---------------------------|---------------------------|
+ | | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
+ | latency (batch_size=1) | 2.46s | 1.40s (1.76x speedup) | |
 
  <details>
  <summary> Reproduce Model Performance Results </summary>
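
The derived figures filled in by this commit (the 62% memory reduction and the 1.76x speedup) can be sanity-checked from the raw numbers in the diff. The following is a minimal sketch of that arithmetic; the helper names `percent_reduction` and `speedup` are illustrative assumptions and do not appear in the README.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` versus `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


def speedup(baseline_seconds: float, value_seconds: float) -> float:
    """How many times faster `value_seconds` is than `baseline_seconds`."""
    return baseline_seconds / value_seconds


# Raw numbers copied from the tables added in this commit.
peak_memory_gb = {"Qwen/Qwen3-8B": 16.47, "pytorch/Qwen3-8B-INT4": 6.27}
latency_s = {"Qwen/Qwen3-8B": 2.46, "pytorch/Qwen3-8B-INT4": 1.40}
bbh_score = {"Qwen/Qwen3-8B": 79.33, "pytorch/Qwen3-8B-INT4": 74.92}

base, quant = "Qwen/Qwen3-8B", "pytorch/Qwen3-8B-INT4"

mem_reduction = percent_reduction(peak_memory_gb[base], peak_memory_gb[quant])
lat_speedup = speedup(latency_s[base], latency_s[quant])
bbh_drop = bbh_score[base] - bbh_score[quant]

print(f"Peak memory: {mem_reduction:.0f}% reduction")  # 62% reduction
print(f"Latency: {lat_speedup:.2f}x speedup")          # 1.76x speedup
print(f"bbh: {bbh_drop:.2f} point drop")               # 4.41 point drop
```

Both derived values agree with the table entries; the bbh difference is not stated in the README and is shown only to put the accuracy cost next to the memory and latency gains.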