# Hybrid Naming Scheme & Benchmark Synopsis

This report summarizes baseline and hybrid quantization results for `Qwen3-4B-Thinking-2507-unsloth` as measured by the Magic Quant pipeline.

## Naming Scheme

Model variants follow a structured suffix convention that encodes both the base conversion mode and the per-tensor-group quantization schemes.

| Suffix Example | Meaning |
| -------------- | ------- |
| `BF16` | Pure full-precision family baseline (no quantization). |
| `Q8_0`, `Q6_K`, `Q5_K`, `Q4_K_M`, `IQ4_NL`, `MXFP4_MOE` | Pure model-wide quantization baselines. |
| `iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K` | Base conversion mode `iq4_nl` with per-group schemes: embeddings (`emb_`), output head (`head_`), MoE router (`moe_rt_`). |
| `...-aq_F16-akv_Q8_0-fd_Q4_K-ao_Q5_K` | Extended sensitivity groups: Attention Q (`aq_`), Attention K+V (`akv_`), FFN Down (`fd_`), Attention Output (`ao_`). |
| `mxfp4_moe-emb_IQ4_NL-head_Q6_K-moe_exp_MXFP4-moe_rt_Q6_K` | MXFP4-centric hybrids with MoE expert group (`moe_exp_`) and mixed IQ / Q schemes per tensor group. |

In general, anything after the base model name is a purely mechanical description of **how** the weights were transformed, not a new training run (a small parsing sketch follows the overview table below).

---

## Benchmark Methodology

All models were tested with a unified automated harness built on `llama.cpp` tools.

**Included tests:**

- **Throughput:** `llama-bench` with descending GPU offload (`-ngl 35 → 0`) and automatic OOM retry. The highest successful TPS is recorded.
- **Perplexity:** Three domains (**general**, **code**, **math**), each using an auto-generated corpus of ~**32k tokens**. Perplexity is computed with `llama-perplexity` at a **2048-token** context, with the same GPU retry logic as above.
- **Precision loss:** Each model is compared against its **family BF16 baseline**. Precision-loss % is computed per PPL domain, plus an averaged score; models are ranked by this average.

---

### Table - Overview of Results

All values are compared to BF16; rows are ordered by average precision loss (best first).

| model_name | size_reduction | tps_change |
| ---------- | -------------- | ---------- |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 48.00% | 47.72% |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 59.60% | 57.71% |
| mxfp4_moe-akv_Q8_0-ao_Q5_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 59.60% | 58.64% |
| iq4_nl-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 58.93% | 65.69% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q6_K-fug_Q6_K | 62.13% | 72.19% |
| Q5_K | 64.13% | 50.37% |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_Q5_K | 65.33% | 51.01% |
| Q4_K_M | 68.93% | 46.70% |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_Q5_K-emb_Q5_K-fd_Q5_K-fug_Q5_K | 65.87% | 56.81% |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q5_K-fug_IQ4_NL | 69.33% | 64.94% |
| IQ4_NL | 70.27% | 80.40% |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL | 68.40% | 89.01% |

* All percentages are compared against the selected family BF16 baseline.
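To make the suffix convention concrete, the sketch below parses a variant name into its base conversion mode and per-group schemes. It is illustrative only: the group abbreviations come from the naming-scheme table above, while the function name, output structure, and the handling of abbreviations not listed in that table (they are passed through verbatim) are assumptions of this sketch, not part of the Magic Quant pipeline.

```python
# Minimal sketch of a parser for the variant-name suffix convention.
# Group abbreviations are taken from the naming-scheme table above; anything
# not listed there (e.g. "fug_") is kept verbatim rather than interpreted.

GROUP_PREFIXES = [
    # Ordered longest-first so multi-part abbreviations are tried before shorter ones.
    "moe_exp",  # MoE expert group
    "moe_rt",   # MoE router
    "head",     # output head
    "emb",      # embeddings
    "akv",      # Attention K+V
    "aq",       # Attention Q
    "ao",       # Attention Output
    "fd",       # FFN Down
]


def parse_variant(name: str) -> dict:
    """Split e.g. 'iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K' into a base
    conversion mode plus a {group: scheme} mapping."""
    base, *parts = name.split("-")
    groups = {}
    for part in parts:
        for prefix in GROUP_PREFIXES:
            if part.startswith(prefix + "_"):
                groups[prefix] = part[len(prefix) + 1:]
                break
        else:
            # Abbreviation not in the table: keep it as-is so nothing is dropped.
            key, _, scheme = part.partition("_")
            groups[key] = scheme
    return {"base_mode": base, "groups": groups}


print(parse_variant("iq4_nl-emb_Q4_K-head_Q4_K-moe_rt_Q4_K"))
# {'base_mode': 'iq4_nl', 'groups': {'emb': 'Q4_K', 'head': 'Q4_K', 'moe_rt': 'Q4_K'}}
```

Pure baselines such as `Q4_K_M` contain no dash-separated groups and therefore parse to a base mode with an empty override map.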
---

### Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
| ---------- | ------------ | --------- | ------------- |
| BF16 | 7.50 | 249.86 | 0.0000 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 3.90 | 369.09 | 0.0989 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.03 | 394.06 | 0.1278 |
| mxfp4_moe-akv_Q8_0-ao_Q5_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.03 | 396.39 | 0.1580 |
| iq4_nl-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 3.08 | 413.99 | 0.1740 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q6_K-fug_Q6_K | 2.84 | 430.23 | 0.3832 |
| Q5_K | 2.69 | 375.72 | 0.5973 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_Q5_K | 2.60 | 377.32 | 1.1453 |
| Q4_K_M | 2.33 | 366.54 | 1.6668 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_Q5_K-emb_Q5_K-fd_Q5_K-fug_Q5_K | 2.56 | 391.81 | 1.7707 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q5_K-fug_IQ4_NL | 2.30 | 412.13 | 2.2740 |
| IQ4_NL | 2.23 | 450.75 | 2.4657 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL | 2.37 | 472.25 | 2.5049 |

* `file_size_gb` is the on-disk model size in GB, `bench_tps` the highest measured tokens/second, and `avg_prec_loss` the averaged absolute precision-loss % vs BF16.

---

### Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
| ---------- | --- | ------ | ---- | ------- | ---- | ------- |
| BF16 | 10.0106 | 0.2451 | 1.5917 | 0.0127 | 6.8896 | 0.1410 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 10.0081 | 0.2450 | 1.5936 | 0.0128 | 6.9001 | 0.1413 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 9.9957 | 0.2441 | 1.5922 | 0.0127 | 6.9036 | 0.1412 |
| mxfp4_moe-akv_Q8_0-ao_Q5_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 9.9952 | 0.2441 | 1.5934 | 0.0127 | 6.9043 | 0.1411 |
| iq4_nl-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 9.9687 | 0.2431 | 1.5927 | 0.0127 | 6.8924 | 0.1409 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q6_K-fug_Q6_K | 10.0858 | 0.2460 | 1.5949 | 0.0126 | 6.9032 | 0.1403 |
| Q5_K | 10.0993 | 0.2473 | 1.5978 | 0.0128 | 6.9256 | 0.1413 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_Q5_K | 10.1868 | 0.2495 | 1.5995 | 0.0128 | 6.9713 | 0.1429 |
| Q4_K_M | 10.3239 | 0.2536 | 1.6093 | 0.0129 | 6.9423 | 0.1412 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_Q5_K-emb_Q5_K-fd_Q5_K-fug_Q5_K | 10.2797 | 0.2537 | 1.6026 | 0.0129 | 7.0232 | 0.1451 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q5_K-fug_IQ4_NL | 10.4164 | 0.2569 | 1.6143 | 0.0130 | 6.9825 | 0.1423 |
| IQ4_NL | 10.3718 | 0.2548 | 1.6125 | 0.0129 | 7.0606 | 0.1452 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL | 10.3780 | 0.2547 | 1.6178 | 0.0132 | 7.0415 | 0.1443 |

* gen = ppl_general, code = ppl_code, math = ppl_math; the `*_er` columns are the corresponding perplexity error estimates.

---

### Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
| ---------- | ------------ | --------- | --------- |
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| mxfp4_moe-akv_Q8_0-ao_Q6_K-aq_Q8_0-emb_Q8_0-fd_Q8_0-fug_Q8_0 | 0.0250 | 0.1194 | 0.1524 |
| mxfp4_moe-akv_Q6_K-ao_Q6_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.1488 | 0.0314 | 0.2032 |
| mxfp4_moe-akv_Q8_0-ao_Q5_K-aq_Q5_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.1538 | 0.1068 | 0.2134 |
| iq4_nl-akv_Q6_K-ao_Q6_K-aq_Q6_K-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.4186 | 0.0628 | 0.0406 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q6_K-fug_Q6_K | 0.7512 | 0.2010 | 0.1974 |
| Q5_K | 0.8861 | 0.3832 | 0.5225 |
| mxfp4_moe-akv_Q5_K-ao_IQ4_NL-aq_IQ4_NL-emb_Q5_K-fd_Q6_K-fug_Q5_K | 1.7601 | 0.4900 | 1.1858 |
| Q4_K_M | 3.1297 | 1.1057 | 0.7649 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_Q5_K-emb_Q5_K-fd_Q5_K-fug_Q5_K | 2.6882 | 0.6848 | 1.9392 |
| mxfp4_moe-akv_IQ4_NL-ao_MXFP4-aq_IQ4_NL-emb_Q6_K-fd_Q5_K-fug_IQ4_NL | 4.0537 | 1.4199 | 1.3484 |
| IQ4_NL | 3.6082 | 1.3068 | 2.4820 |
| mxfp4_moe-akv_Q6_K-ao_IQ4_NL-aq_IQ4_NL-emb_IQ4_NL-fd_Q6_K-fug_IQ4_NL | 3.6701 | 1.6398 | 2.2048 |

* loss_* values are absolute precision-loss % vs BF16 per domain.
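For reference, the derived columns in the tables above can be recomputed from the raw measurements. The sketch below assumes the definitions implied by the methodology section: size reduction and TPS change relative to the family BF16 baseline, and precision loss as the absolute percentage change in perplexity per domain, averaged across the three domains. The function name and data layout are illustrative, not part of the pipeline; the example reproduces the Q5_K row within rounding.

```python
# Minimal sketch of the derived metrics, assuming the definitions implied by
# the methodology: percentages are relative to the family BF16 baseline, and
# precision loss is the absolute % change in perplexity per domain, averaged
# over the three domains. Names and data layout are illustrative.

BF16 = {
    "file_size_gb": 7.50,
    "bench_tps": 249.86,
    "ppl": {"general": 10.0106, "code": 1.5917, "math": 6.8896},
}


def derived_metrics(file_size_gb: float, bench_tps: float, ppl: dict) -> dict:
    size_reduction = (1 - file_size_gb / BF16["file_size_gb"]) * 100
    tps_change = (bench_tps / BF16["bench_tps"] - 1) * 100
    loss = {d: abs(p - BF16["ppl"][d]) / BF16["ppl"][d] * 100 for d, p in ppl.items()}
    return {
        "size_reduction": size_reduction,  # % smaller than BF16 on disk
        "tps_change": tps_change,          # % faster (or slower) than BF16
        "loss": loss,                      # per-domain precision loss %
        "avg_prec_loss": sum(loss.values()) / len(loss),
    }


# Q5_K row from the tables above:
print(derived_metrics(2.69, 375.72, {"general": 10.0993, "code": 1.5978, "math": 6.9256}))
# -> size_reduction ~64.13, tps_change ~50.37, avg_prec_loss ~0.5973
```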