mratsim/GLM-4.7-EXL3

mratsim committed 14 days ago

Commit 6811f96 · verified · 1 Parent(s): bfe13f7

Add 3.15bpw

Files changed (1)

README.md +1 -0

@@ -77,6 +77,7 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
 | Quant                                                                            | Size       | Context / VRAM                            | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1  | Top-2  | Top-3  | Top-4  | Top-5  |
 | -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
 | [2.10bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2.10bpw-tuned)| 86 GiB | 131072 tokens, k5v4 for 96 GiB VRAM       | 0.54398251           | 0.61162654           | 7.15544606 | 0.7584 | 0.4237 | 0.1948 | 0.0801 | 0.0306 |
+| [3.15bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.15bpw-tuned)| 129 GiB | 102400 tokens, k5v4 for 144 GiB VRAM       | 0.21854555           | 0.21465828           | 6.35729832 | 0.8573 | 0.6119 | 0.3776 | 0.2107 | 0.1071 |
 | [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned)| 158 GiB  | 202752 tokens (max), k6v5 for 192GiB VRAM | 0.15823333           | 0.15401253           | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
 - "opt🂡" for automatically optimized quants
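
For readers comparing rows of the table, the following is a minimal sketch of how metrics of this kind can be computed from two models' logits at a single token position. It is not the evaluation script used for this repository. It assumes that "KL-div (quant, FP16)" means KL(quant || FP16) in nats, that the second KL column is the reverse direction, and that "Top-K" counts positions where the K most likely tokens match in the same order. The function names and toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def top_k_agree(p, q, k):
    """True if the k most likely tokens match in the same order.

    This ordered-match definition is an assumption about the Top-K columns.
    """
    def rank(dist):
        return sorted(range(len(dist)), key=lambda i: -dist[i])[:k]
    return rank(p) == rank(q)

# Toy logits for one token position (hypothetical values, not model output).
fp16_logits = [2.0, 1.0, 0.5, -1.0]
quant_logits = [1.8, 1.1, -0.9, -0.7]

p_fp16 = softmax(fp16_logits)
p_quant = softmax(quant_logits)

kl_quant_fp16 = kl_div(p_quant, p_fp16)  # assumed "KL-div (quant, FP16)"
kl_fp16_quant = kl_div(p_fp16, p_quant)  # assumed "KL-div (FP16, quant)"
agreement = [top_k_agree(p_fp16, p_quant, k) for k in (1, 2, 3)]
```

Averaging the per-position KL values and agreement flags over an evaluation set would give table-style numbers. KL divergence is asymmetric, which is why the table reports both directions separately.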