# Gemma-2-2B Trading Summarizer (8-bit Quantized)

## Model Description

This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer. It offers a ~50% reduction in model size and memory usage with minimal quality loss.

## Quantization Details

- Method: bitsandbytes 8-bit quantization
- Original Precision: fp16
- Quantized Precision: int8
- Size Reduction: ~50%
- Quality Impact: Typically <2% degradation
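
As a back-of-the-envelope check on the ~50% figure, the arithmetic below assumes roughly 2.6B total parameters for Gemma-2-2B (the "2B" in the name excludes embedding parameters):

```python
params = 2.6e9  # approximate total parameter count for Gemma-2-2B

fp16_gb = params * 2 / 1024**3  # 2 bytes per weight in fp16
int8_gb = params * 1 / 1024**3  # 1 byte per weight in int8

print(f"fp16: ~{fp16_gb:.1f} GB, int8: ~{int8_gb:.1f} GB "
      f"({1 - int8_gb / fp16_gb:.0%} smaller)")
```

In practice, bitsandbytes keeps some modules in higher precision, so the real saving lands slightly under the ideal 50%.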

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Passing load_in_8bit directly to from_pretrained is deprecated;
# use a BitsAndBytesConfig instead.
model = AutoModelForCausalLM.from_pretrained(
    "Wezenite/gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Wezenite/gemma-2b-trader-8bit")

# Same usage as the fp16 version from here on.
```
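
A minimal generation sketch: the journal entry and prompt wording below are invented, since the exact prompt template used during fine-tuning is not shown on this card:

```python
entry = (
    "Summarize this trading journal entry:\n"
    "Entered AAPL long at 182.40 on the opening breakout, sized at 2% risk. "
    "Stopped out at 180.90 when the move failed. Lesson: wait for the retest."
)

inputs = tokenizer(entry, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```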

## When to Use This Version

- Limited GPU memory (<8GB VRAM)
- Faster loading times needed
- Deployment on edge devices
- When memory footprint matters more than marginal quality (see the loading sketch below)
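
If you want to choose between the two variants at load time, here is a minimal sketch. The fp16 repo id `Wezenite/gemma-2b-trader` and the 8 GB threshold are assumptions, not confirmed values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_summarizer(min_fp16_gb: float = 8.0):
    """Load fp16 when there is comfortable VRAM headroom, else the 8-bit variant."""
    free_bytes, _ = torch.cuda.mem_get_info()
    if free_bytes / 1024**3 >= min_fp16_gb:
        return AutoModelForCausalLM.from_pretrained(
            "Wezenite/gemma-2b-trader",  # assumed repo id for the fp16 variant
            torch_dtype=torch.float16,
            device_map="auto",
        )
    return AutoModelForCausalLM.from_pretrained(
        "Wezenite/gemma-2b-trader-8bit",
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
```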

## When to Use FP16 Version

- Maximum quality required
- Sufficient GPU memory available
- Full fine-tuning or further training needed (LoRA-style tuning on the 8-bit model is still possible; see the sketch below)
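
If only the 8-bit variant fits in memory, parameter-efficient fine-tuning remains an option via peft. The LoRA hyperparameters and target module names below are assumptions, not values from this model's training:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# `model` is the 8-bit model loaded in the Usage section.
model = prepare_model_for_kbit_training(model)  # enable grads, cast norms to fp32

lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,                        # assumed scaling
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```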

## Base Model

[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)