# Hunyuan-0.5B-Instruct-GGUF
This repository contains GGUF quants for tencent/Hunyuan-0.5B-Instruct.
Hunyuan-0.5B is part of Tencent's efficient LLM series, featuring Hybrid Reasoning (fast and slow thinking modes) and a native 256K context window. Even at 0.5B parameters, it inherits robust performance from larger Hunyuan models, making it ideal for edge devices and resource-constrained environments.
## Usage

### llama.cpp
You can run these quants with the llama.cpp CLI:

```
./llama-cli -m Hunyuan-0.5B-Instruct*.gguf -p "Your prompt here" -n 128
```
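When picking which quant file to download, file size scales roughly with bits per weight. The bits-per-weight figures below are approximate community conventions for common llama.cpp quant types, not values measured from the files in this repository; the estimator is a back-of-envelope sketch only:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8.
# The bpw numbers are approximate conventions for llama.cpp quant
# types, NOT measured from this repo's files.
PARAMS = 0.5e9  # Hunyuan-0.5B parameter count

APPROX_BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

def est_size_mb(quant: str, params: float = PARAMS) -> float:
    """Estimated on-disk size in megabytes for a given quant type."""
    return params * APPROX_BPW[quant] / 8 / 1e6

for q in APPROX_BPW:
    print(f"{q}: ~{est_size_mb(q):.0f} MB")
```

Even the largest quant of a 0.5B model stays well under a gigabyte, which is what makes it practical on edge devices.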
## Special Features
- Thinking Mode: This model supports "slow-thinking" reasoning. To disable CoT (Chain of Thought), add `/no_think` before your prompt or set `enable_thinking=False` in your chat template.
- Long Context: Natively supports a 256K-token context window.
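Per the description above, disabling CoT amounts to prefixing the prompt with the `/no_think` control string. A minimal sketch of that convention (the helper name is my own, for illustration; actual template handling is done by your runtime, e.g. llama.cpp):

```python
def build_prompt(user_prompt: str, thinking: bool = True) -> str:
    """Prepend /no_think when fast (non-CoT) mode is wanted.

    Mirrors the card's instruction to add /no_think before the prompt;
    this helper is illustrative, not part of any official API.
    """
    return user_prompt if thinking else f"/no_think {user_prompt}"

print(build_prompt("What is 2+2?", thinking=False))
# /no_think What is 2+2?
```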
## Model tree

Base model tencent/Hunyuan-0.5B-Pretrain, finetuned as tencent/Hunyuan-0.5B-Instruct, quantized here as Fu01978/Hunyuan-0.5B-Instruct-GGUF.