Update README.md
README.md (changed)
@@ -212,7 +212,7 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
 | Benchmark | | | |
 |----------------------------------|----------------|------------------------|---------------------------|
 | | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
-| bbh |
+| bbh | 79.33 | 74.92 | |


 <details>
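The bbh rows above are produced with lm-evaluation-harness; the README's own `lm_eval` CLI invocation appears in the next hunk's header. As a rough sketch only, the same evaluation can be driven from Python through the harness's `simple_evaluate` API. The model ids come from the table; the batch size is an assumption, since the CLI flags after `--device cuda:0` are truncated in the hunk header.

```python
# Sketch only: reproducing the bbh rows with lm-evaluation-harness's Python API.
# Assumes `pip install lm-eval` and a CUDA device; the quantized checkpoints may
# additionally need torchao installed to load (assumption).
import lm_eval

for model_id in [
    "Qwen/Qwen3-8B",
    "pytorch/Qwen3-8B-INT4",
    "pytorch/Qwen3-8B-AWQ-INT4",
]:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=["bbh"],
        device="cuda:0",
        batch_size=8,  # assumption: the real flag is cut off in the hunk header below
    )
    print(model_id, results["results"])
```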
@@ -242,8 +242,8 @@ lm_eval --model hf --model_args pretrained=$MODEL --tasks bbh --device cuda:0 --

 | Benchmark | | | |
 |------------------|----------------|--------------------------------|--------------------------------|
-| | Qwen/Qwen3-8B
-| Peak Memory (GB) |
+| | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
+| Peak Memory (GB) | 16.47 | 6.27 (62% reduction) | 6.27 (62% reduction) |


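The peak-memory figures come from a snippet whose final line, `print(f"Peak Memory Usage: {mem:.02f} GB")`, is visible in the next hunk's header; the measurement call itself is not part of this diff. A minimal sketch, assuming `transformers` for loading and `torch.cuda.max_memory_reserved()` as the measurement:

```python
# Sketch only: one way the peak-memory numbers could be measured.
# The choice of max_memory_reserved() (vs. max_memory_allocated()) is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pytorch/Qwen3-8B-INT4"  # or "Qwen/Qwen3-8B" / "pytorch/Qwen3-8B-AWQ-INT4"

torch.cuda.reset_peak_memory_stats()
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda:0"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(
    "Give me a short introduction to large language models.", return_tensors="pt"
).to("cuda:0")
model.generate(**inputs, max_new_tokens=128)

mem = torch.cuda.max_memory_reserved() / 1e9
print(f"Peak Memory Usage: {mem:.02f} GB")
```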
@@ -299,11 +299,11 @@ print(f"Peak Memory Usage: {mem:.02f} GB")

 # Model Performance

-## Results (
-| Benchmark (Latency) | |
-|
-| | Qwen/Qwen3-8B
-| latency (batch_size=1) |
+## Results (H100 machine)
+| Benchmark (Latency) | | | |
+|----------------------------------|----------------|---------------------------|---------------------------|
+| | Qwen/Qwen3-8B | pytorch/Qwen3-8B-INT4 | pytorch/Qwen3-8B-AWQ-INT4 |
+| latency (batch_size=1) | 2.46s | 1.40s (1.76x speedup) | |

 <details>
 <summary> Reproduce Model Performance Results </summary>
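The latency numbers are reproduced via the collapsed "Reproduce Model Performance Results" section, which this diff does not include. As an illustrative stand-in only, a timing loop along these lines could produce a comparable batch_size=1 number; the prompt, output length, and generation settings here are assumptions, not the measured protocol.

```python
# Sketch only: a simple batch_size=1 latency measurement; not the README's exact procedure.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pytorch/Qwen3-8B-INT4"  # compare against "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda:0"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer(
    "Give me a short introduction to large language models.", return_tensors="pt"
).to("cuda:0")

# Warm-up pass so one-time compilation / allocation is not timed.
model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()

start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
print(f"latency (batch_size=1): {time.perf_counter() - start:.2f}s")
```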