Q6_K barely smaller than Q8_0?
This is expected behavior for any GPT-OSS-based model, as the original model was trained and published in MXFP4, which llama.cpp does not requantize. The only quants that don't degrade the performance of the original model, and the only ones that make sense to use for any GPT-OSS-based model, are the MXFP4_MOE quants. We are considering whether it even makes sense to provide anything other than MXFP4 quants for GPT-OSS-based models, as the rest seem fairly useless. In particular, any quant above 4 bits really doesn't make sense when the model was trained in 4 bits.
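To illustrate the point with a toy example (just uniform round-to-nearest in numpy, not MXFP4's actual block-scaled FP4 format and not llama.cpp's code): once the weights have been rounded to a 4-bit grid, re-encoding them at 8 bits afterwards cannot bring back what the rounding threw away, so the bigger file buys essentially nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
w_bf16 = rng.normal(size=10_000).astype(np.float32)  # stand-in for the original weights

def fake_quant(x, bits):
    """Uniform round-to-nearest with a per-tensor scale (a deliberate simplification)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return (np.round(x / scale) * scale).astype(np.float32)

w_4bit = fake_quant(w_bf16, 4)   # roughly what a 4-bit release looks like
w_8of4 = fake_quant(w_4bit, 8)   # "8-bit" re-encoding of the already-4-bit weights

print("mean |error| of 4-bit vs original:        ", np.abs(w_4bit - w_bf16).mean())
print("mean |error| of 8-bit-of-4-bit vs original:", np.abs(w_8of4 - w_bf16).mean())
print("how much the 8-bit pass moved the 4-bit values:", np.abs(w_8of4 - w_4bit).mean())
```

The error against the original is dominated by the 4-bit rounding either way; the extra bits only make the file larger.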
I highly recommend you just get the following quant: https://huggingface.co/mradermacher/gpt-oss-20b-Derestricted-i1-GGUF/blob/main/gpt-oss-20b-Derestricted.i1-MXFP4_MOE.gguf
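If it helps, here is a minimal sketch of fetching just that one file with huggingface_hub (repo and filename taken from the link above; you could equally use the web download button):

```python
from huggingface_hub import hf_hub_download

# Downloads only the MXFP4_MOE GGUF into the local HF cache and returns its path.
path = hf_hub_download(
    repo_id="mradermacher/gpt-oss-20b-Derestricted-i1-GGUF",
    filename="gpt-oss-20b-Derestricted.i1-MXFP4_MOE.gguf",
)
print(path)  # point llama.cpp's -m flag at this file
```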
But this is not the original GPT-OSS model that was post-trained and released in MXFP4; it is an abliterated version by ArliAI. The Reddit post mentioned that they converted it to BF16 before the abliteration process, and after abliteration the weights should no longer be "distributed" in a way that is suitable for MXFP4 quantization. So I believe MXFP4 might not be the optimal quant for this model, given that the brief discussion at https://github.com/ikawrakow/ik_llama.cpp/pull/682 suggests that MXFP4 is not a good quantization scheme on its own.
Still, I downloaded your linked MXFP4 version to test it alongside the Q5_K_M version. In a very quick, rough test, the MXFP4 version seems to hallucinate or mess up formatting more often.
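For reference, the kind of quick side-by-side check I mean looks roughly like this (a sketch assuming llama-cpp-python is installed; the file paths and prompts are placeholders, not a rigorous benchmark):

```python
from llama_cpp import Llama

prompts = [
    "Summarize the plot of Hamlet in three bullet points.",
    "Return a JSON object with keys 'name' and 'age' for a fictional person.",
]

# Placeholder file names; substitute the actual local paths of the two quants.
for path in ["gpt-oss-20b-Derestricted.i1-MXFP4_MOE.gguf",
             "gpt-oss-20b-Derestricted.Q5_K_M.gguf"]:
    llm = Llama(model_path=path, n_ctx=4096, verbose=False)
    print(f"=== {path} ===")
    for p in prompts:
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": p}],
            max_tokens=256,
            temperature=0.0,  # keep sampling deterministic so runs are comparable
        )
        print(out["choices"][0]["message"]["content"], "\n")
```

It only eyeballs a handful of outputs, so take the hallucination/formatting impression with a grain of salt.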
