Qwen3_8B_NoThink-NF4
🔍 Model Summary
Qwen3_8B_NoThink-NF4 is a modified and optimized variant of huihui-ai/Huihui-Qwen3-8B-abliterated-v2, fine-tuned to suppress chain-of-thought reasoning, remove <think>-style internal monologue, and produce clean, direct answers.
This version is:
- Fine-tuned on a curated NoThink FLAN-ULTRA dataset
- Designed for deterministic, concise outputs
- LoRA-finetuned, merged, and exported as a dense model
- Fully NF4-quantized for efficient GPU inference
- Published as: ikarius/Qwen3_8B_NoThink-NF4
🚀 What’s Special About This Model?
🧼 No chain-of-thought — ever
All verbose reasoning traces are removed during training.
The model gives only final answers, with no explanations unless explicitly requested.
💬 More natural, more engaging
Training data produces a subtly more expressive and friendly tone compared to the original Qwen3.
⚡ NF4 Quantized
- Fits easily on consumer GPUs (e.g., RTX 4070 / 5070 / 5080 / 4090 / 5090 / 3090)
- Lower VRAM footprint
- Excellent inference speed
🧩 Built for ChatML and general instruction prompts
Although trained with ChatML, it works perfectly with plain-text prompts.
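A minimal sketch of ChatML-style prompting through the tokenizer's chat template (assuming the published tokenizer ships one, as Qwen3 tokenizers normally do):

# Sketch: build a ChatML prompt with the tokenizer's chat template.
# Plain-text prompts also work; this just shows the templated path.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ikarius/Qwen3_8B_NoThink-NF4")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header so the model answers next
)
print(prompt)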
📦 Model Details
Base Model
Originally: huihui-ai/Huihui-Qwen3-8B-abliterated-v2
Then:
- LoRA fine-tuned with NoThink FLAN ULTRA
- Merged into a dense bf16 model
- Re-quantized to NF4
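The merge step can be sketched with PEFT's standard adapter-merging API; the adapter path and output directory below are hypothetical, since only the final NF4 checkpoint is published:

# Sketch of the merge step: load the bf16 base, attach the LoRA adapter, merge, save densely.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-Qwen3-8B-abliterated-v2",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "path/to/nothink-lora-adapter")  # hypothetical adapter path
merged = model.merge_and_unload()                 # folds the LoRA weights into the dense model
merged.save_pretrained("Qwen3_8B_NoThink-bf16")   # hypothetical output directory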
Quantization
- load_in_4bit=True
- NF4 quant type
- Double quantization enabled
- compute_dtype = bfloat16
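For reference, these settings correspond to a bitsandbytes configuration along the following lines (a sketch of how the listed values map onto BitsAndBytesConfig, not the exact export script):

# Sketch: the quantization settings above expressed as a bitsandbytes config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ikarius/Qwen3_8B_NoThink-NF4",
    quantization_config=bnb_config,
    device_map="auto",
)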
🧠 Intended Uses
✔ Recommended
- Chat assistants (no chain-of-thought)
- Direct Q&A
- Code generation
- Summaries
- Local inference
- Edge devices or consumer GPUs
- Context-constrained deployments
❌ Not suitable for
- Multi-step reasoning tasks
- Proof-based mathematics
- Tasks that require internal deliberation
- Safety-critical systems
🏋️ Training Information
Dataset
nothink_flan_ULTRA.jsonl
~45k examples
Derived from FLAN + custom filtering:
- Removed all <think> and reasoning traces (see the sketch after this list)
- Preserved normal adult language
- Removed unsafe sexual content (incest, illegal porn descriptions)
- High diversity: QA, classification, summarization, translations, logic tasks, etc.
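The <think>-removal step could look roughly like the sketch below; the input filename and the "output" field are hypothetical, as the actual filtering pipeline is not published:

# Minimal sketch of stripping <think>...</think> spans from raw examples.
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> spans so only the final answer remains."""
    return THINK_RE.sub("", text).strip()

with open("flan_raw.jsonl") as src, open("nothink_flan_ULTRA.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        example["output"] = strip_think(example["output"])  # field name assumed
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")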
Training Method
- QLoRA fine-tuning
- LoRA rank: 64
- LoRA alpha: 128
- Dropout: 0.05
- Learning rate: 1.2e-4
- 1 epoch
- Max seq length: 768 tokens
- Optimizer: paged_adamw_8bit
- Scheduler: cosine
- BF16 training
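These hyperparameters map onto PEFT / transformers configuration objects roughly as follows; target_modules and output_dir are assumptions, since the card does not list them:

# Sketch: the listed hyperparameters expressed as PEFT / transformers configs.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated in the card
)

training_args = TrainingArguments(
    output_dir="qwen3-nothink-qlora",  # hypothetical path
    learning_rate=1.2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
)
# The 768-token max sequence length is enforced at tokenization / by the SFT trainer.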
Hardware
- 1× NVIDIA RTX 5090 32GB
- Peak VRAM usage: ~26–28 GB during training
📉 Limitations
- May hallucinate factual details
- Reduced deep reasoning ability by design
- Not intended for tasks requiring explained logic
- Tone is slightly friendlier than the base model
⚖️ License
This model follows the license of Qwen3 and huihui-ai/Huihui-Qwen3-8B-abliterated-v2.
Users are responsible for ensuring legal compliance.
▶ Example Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ikarius/Qwen3_8B_NoThink-NF4"

# Load the pre-quantized model; device_map="auto" places layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain what a black hole is in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The model answers directly, without <think>-style reasoning traces.
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
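Note: if the published checkpoint stores its bitsandbytes quantization config (as pre-quantized NF4 repositories typically do), from_pretrained restores the 4-bit weights automatically; torch_dtype then only affects the modules left unquantized.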
⭐ Acknowledgements
- Qwen3 team
- Huihui for the original abliterated variant
- Open-source contributors to TRL, PEFT, bitsandbytes
- Community research that made chain-of-thought mitigation possible