Qwen3_8B_NoThink-NF4
🔍 Model Summary
Qwen3_8B_NoThink-NF4 is a modified and optimized variant of huihui-ai/Huihui-Qwen3-8B-abliterated-v2, fine-tuned to suppress chain-of-thought reasoning, remove <think>-style internal monologue, and produce clean, direct answers.
This version is:
- Fine-tuned on a curated NoThink FLAN-ULTRA dataset
- Designed for deterministic, concise outputs
- LoRA-finetuned, merged, and exported as a dense model
- Fully NF4-quantized for efficient GPU inference
- Published as: ikarius/Qwen3_8B_NoThink-NF4
🚀 What’s Special About This Model?
🧼 No chain-of-thought — ever
All verbose reasoning traces are removed during training.
The model gives only final answers, with no explanations unless explicitly requested.
💬 More natural, more engaging
Training data produces a subtly more expressive and friendly tone compared to the original Qwen3.
⚡ NF4 Quantized
- Fits easily on consumer GPUs (e.g., RTX 4070 / 5070 / 5080 / 4090 / 5090 / 3090)
- Lower VRAM footprint
- Excellent inference speed
🧩 Built for ChatML and general instruction prompts
Although trained with ChatML, it works perfectly with plain-text prompts.
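A minimal sketch of ChatML-style prompting through the tokenizer's chat template (assuming the published tokenizer ships one, as Qwen3 tokenizers normally do):

# Sketch: build a ChatML prompt with the tokenizer's chat template.
# Plain-text prompts also work; this just shows the templated path.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ikarius/Qwen3_8B_NoThink-NF4")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header so the model answers next
)
print(prompt)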
📦 Model Details
Base Model
Originally: huihui-ai/Huihui-Qwen3-8B-abliterated-v2
Then:
- LoRA fine-tuned with NoThink FLAN ULTRA
- Merged into a dense bf16 model
- Re-quantized to NF4
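The merge step can be sketched with PEFT's standard adapter-merging API; the adapter path and output directory below are hypothetical, since only the final NF4 checkpoint is published:

# Sketch of the merge step: load the bf16 base, attach the LoRA adapter, merge, save densely.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-Qwen3-8B-abliterated-v2",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "path/to/nothink-lora-adapter")  # hypothetical adapter path
merged = model.merge_and_unload()                 # folds the LoRA weights into the dense model
merged.save_pretrained("Qwen3_8B_NoThink-bf16")   # hypothetical output directory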
Quantization
- load_in_4bit=True
- NF4 quant type
- Double quantization enabled
- compute_dtype = bfloat16
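For reference, these settings correspond to a bitsandbytes configuration along the following lines (a sketch of how the listed values map onto BitsAndBytesConfig, not the exact export script):

# Sketch: the quantization settings above expressed as a bitsandbytes config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ikarius/Qwen3_8B_NoThink-NF4",
    quantization_config=bnb_config,
    device_map="auto",
)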
🧠 Intended Uses
✔ Recommended
- Chat assistants (no chain-of-thought)
- Direct Q&A
- Code generation
- Summaries
- Local inference
- Edge devices or consumer GPUs
- Context-constrained deployments
❌ Not suitable for
- Multi-step reasoning tasks
- Proof-based mathematics
- Tasks that require internal deliberation
- Safety-critical systems
🏋️ Training Information
Dataset
nothink_flan_ULTRA.jsonl
~45k examples
Derived from FLAN + custom filtering:
- Removed all <think> and reasoning traces (see the sketch after this list)
- Preserved normal adult language
- Removed unsafe sexual content (incest, illegal porn descriptions)
- High diversity: QA, classification, summarization, translations, logic tasks, etc.
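The <think>-removal step could look roughly like the sketch below; the input filename and the "output" field are hypothetical, as the actual filtering pipeline is not published:

# Minimal sketch of stripping <think>...</think> spans from raw examples.
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> spans so only the final answer remains."""
    return THINK_RE.sub("", text).strip()

with open("flan_raw.jsonl") as src, open("nothink_flan_ULTRA.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        example["output"] = strip_think(example["output"])  # field name assumed
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")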
Training Method
- QLoRA fine-tuning
- LoRA rank: 64
- LoRA alpha: 128
- Dropout: 0.05
- Learning rate: 1.2e-4
- 1 epoch
- Max seq length: 768 tokens
- Optimizer: paged_adamw_8bit
- Scheduler: cosine
- BF16 training
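These hyperparameters map onto PEFT / transformers configuration objects roughly as follows; target_modules and output_dir are assumptions, since the card does not list them:

# Sketch: the listed hyperparameters expressed as PEFT / transformers configs.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated in the card
)

training_args = TrainingArguments(
    output_dir="qwen3-nothink-qlora",  # hypothetical path
    learning_rate=1.2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
)
# The 768-token max sequence length is enforced at tokenization / by the SFT trainer.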
Hardware
- 1× NVIDIA RTX 5090 32GB
- Peak VRAM usage: ~26–28 GB during training
📉 Limitations
- May hallucinate factual details
- Reduced deep reasoning ability by design
- Not intended for tasks requiring explained logic
- Tone is slightly friendlier than the base model
⚖️ License
This model follows the license of Qwen3 and huihui-ai/Huihui-Qwen3-8B-abliterated-v2.
Users are responsible for ensuring legal compliance.
▶ Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ikarius/Qwen3_8B_NoThink-NF4"

# Load the pre-quantized model; device_map="auto" places layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain what a black hole is in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The model answers directly, without <think>-style reasoning traces.
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
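Note: if the published checkpoint stores its bitsandbytes quantization config (as pre-quantized NF4 repositories typically do), from_pretrained restores the 4-bit weights automatically; torch_dtype then only affects the modules left unquantized.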
⭐ Acknowledgements
- Qwen3 team
- Huihui for the original abliterated variant
- Open-source contributors to TRL, PEFT, bitsandbytes
- Community research that made chain-of-thought mitigation possible