mratsim committed
Commit 6811f96 · verified · 1 Parent(s): bfe13f7

Add 3.15bpw

Files changed (1):
  1. README.md +1 -0

README.md CHANGED
@@ -77,6 +77,7 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
 | Quant | Size | Context / VRAM | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
 | -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
 | [2.10bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2.10bpw-tuned)| 86 GiB | 131072 tokens, k5v4 for 96 GiB VRAM | 0.54398251 | 0.61162654 | 7.15544606 | 0.7584 | 0.4237 | 0.1948 | 0.0801 | 0.0306 |
+| [3.15bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.15bpw-tuned)| 129 GiB | 102400 tokens, k5v4 for 144 GiB VRAM | 0.21854555 | 0.21465828 | 6.35729832 | 0.8573 | 0.6119 | 0.3776 | 0.2107 | 0.1071 |
 | [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned)| 158 GiB | 202752 tokens (max), k6v5 for 192GiB VRAM | 0.15823333 | 0.15401253 | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
 
 - "opt🂡" for automatically optimized quants