Qwen3_8B_NoThink-NF4

🔍 Model Summary

Qwen3_8B_NoThink-NF4 is a modified and optimized variant of
huihui-ai/Huihui-Qwen3-8B-abliterated-v2, fine-tuned to suppress chain-of-thought reasoning, strip <think>-style internal monologue, and produce clean, direct answers.

This version is:

  • Fine-tuned on a curated NoThink FLAN-ULTRA dataset
  • Designed for deterministic, concise outputs
  • LoRA-finetuned, merged, and exported as a dense model
  • Fully NF4-quantized for efficient GPU inference
  • Published as: ikarius/Qwen3_8B_NoThink-NF4

🚀 What’s Special About This Model?

🧼 No chain-of-thought — ever

All verbose reasoning traces were removed from the training data.
The model gives only final answers, with no explanations unless explicitly requested.

💬 More natural, more engaging

The training data gives the model a subtly more expressive and friendly tone than the original Qwen3.

⚡ NF4 quantized

  • Fits easily on consumer GPUs (e.g., RTX 4070 / 5070 / 5080 / 4090 / 5090 / 3090)
  • Lower VRAM footprint
  • Excellent inference speed

🧩 Built for ChatML and general instruction prompts

Although trained with ChatML, it also handles plain-text prompts well.
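Since Qwen3 uses ChatML natively, the chat template shipped with the tokenizer can build prompts from a message list. A minimal sketch (the question text is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ikarius/Qwen3_8B_NoThink-NF4")

# Build a ChatML-formatted prompt from a message list; the template
# comes from the tokenizer config shipped with the model.
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header so the model answers directly
)
print(prompt)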


📦 Model Details

Base Model

Originally:
huihui-ai/Huihui-Qwen3-8B-abliterated-v2

Then:

  1. LoRA fine-tuned with NoThink FLAN ULTRA
  2. Merged into a dense bf16 model (see the merge sketch below)
  3. Re-quantized to NF4
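
A minimal sketch of the merge step (step 2), assuming the LoRA adapter was saved locally; the adapter and output paths are hypothetical:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the abliterated base model in bf16 so the merge happens at full precision.
base = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-Qwen3-8B-abliterated-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the NoThink LoRA adapter and fold its weights into the dense model.
merged = PeftModel.from_pretrained(base, "./nothink-lora").merge_and_unload()
merged.save_pretrained("./qwen3-8b-nothink-bf16")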

Quantization

  • load_in_4bit=True
  • NF4 quant type
  • Double quantization enabled
  • compute_dtype = bfloat16
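
These settings correspond to a bitsandbytes config along these lines (a sketch, not the exact export script):

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)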

🧠 Intended Uses

✔ Recommended

  • Chat assistants (no chain-of-thought)
  • Direct Q&A
  • Code generation
  • Summaries
  • Local inference
  • Edge devices or consumer GPUs
  • Context-constrained deployments

❌ Not suitable for

  • Multi-step reasoning tasks
  • Proof-based mathematics
  • Tasks that require internal deliberation
  • Safety-critical systems

🏋️ Training Information

Dataset

nothink_flan_ULTRA.jsonl
~45k examples
Derived from FLAN + custom filtering:

  • Removed all <think> and reasoning traces (see the filtering sketch below)
  • Preserved normal adult language
  • Removed unsafe sexual content (e.g., incest, depictions of illegal pornography)
  • High diversity: QA, classification, summarization, translation, logic tasks, etc.
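
A hedged sketch of the <think>-stripping pass; the regex, file names, and the "output" field are illustrative, since the actual filtering pipeline is not published:

import json
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_reasoning(text: str) -> str:
    # Drop any <think>...</think> span, keeping only the final answer.
    return THINK_RE.sub("", text).strip()

with open("flan_raw.jsonl") as src, open("nothink_flan_ULTRA.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        example["output"] = strip_reasoning(example["output"])  # field name assumed
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")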

Training Method

  • QLoRA fine-tuning
  • LoRA rank: 64
  • LoRA alpha: 128
  • Dropout: 0.05
  • Learning rate: 1.2e-4
  • 1 epoch
  • Max seq length: 768 tokens
  • Optimizer: paged_adamw_8bit
  • Scheduler: cosine
  • BF16 training
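
These hyperparameters map onto a PEFT/TRL setup roughly as follows. This is a sketch, not the training script: the target modules are a common choice for Qwen-style models rather than something stated in this card, and the max_seq_length argument is named max_length in newer TRL releases:

from peft import LoraConfig
from trl import SFTConfig

lora_config = LoraConfig(
    r=64,             # LoRA rank
    lora_alpha=128,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Attention and MLP projections commonly targeted in Qwen models (an assumption):
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

training_args = SFTConfig(
    output_dir="./nothink-qlora",   # hypothetical output path
    learning_rate=1.2e-4,
    num_train_epochs=1,
    max_seq_length=768,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    bf16=True,
)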

Hardware

  • 1× NVIDIA RTX 5090 32GB
  • Peak VRAM usage: ~26–28 GB during training

📉 Limitations

  • May hallucinate factual details
  • Reduced deep reasoning ability by design
  • Not intended for tasks requiring explained logic
  • Tone is slightly friendlier than the base model

⚖️ License

This model follows the license of Qwen3 and
huihui-ai/Huihui-Qwen3-8B-abliterated-v2.

Users are responsible for ensuring legal compliance.


▶ Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ikarius/Qwen3_8B_NoThink-NF4"

# The NF4 quantization config ships with the checkpoint, so no extra
# BitsAndBytesConfig is needed here; bf16 is used as the compute dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain what a black hole is in simple terms."

# Plain-text prompt; for multi-turn chat, use tokenizer.apply_chat_template
# as shown in the ChatML sketch above.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(output[0], skip_special_tokens=True))

⭐ Acknowledgements

  • Qwen3 team
  • Huihui for the original abliterated variant
  • Open-source contributors to TRL, PEFT, and bitsandbytes
  • Community research that made chain-of-thought mitigation possible
