Add 3.15bpw
README.md
CHANGED
@@ -77,6 +77,7 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
 | Quant | Size | Context / VRAM | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
 | -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
 | [2.10bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2.10bpw-tuned)| 86 GiB | 131072 tokens, k5v4 for 96 GiB VRAM | 0.54398251 | 0.61162654 | 7.15544606 | 0.7584 | 0.4237 | 0.1948 | 0.0801 | 0.0306 |
+| [3.15bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.15bpw-tuned)| 129 GiB | 102400 tokens, k5v4 for 144 GiB VRAM | 0.21854555 | 0.21465828 | 6.35729832 | 0.8573 | 0.6119 | 0.3776 | 0.2107 | 0.1071 |
 | [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned)| 158 GiB | 202752 tokens (max), k6v5 for 192GiB VRAM | 0.15823333 | 0.15401253 | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
 
 - "opt🂡" for automatically optimized quants