Gemma-2-2B Trading Summarizer (8-bit Quantized)

Model Description

This is an 8-bit quantized version of the fine-tuned Gemma-2-2B trading journal summarizer. It offers a ~50% reduction in model size and memory usage compared to the fp16 version, with minimal quality loss.

Quantization Details

  • Method: bitsandbytes 8-bit quantization (see the sketch below)
  • Original Precision: fp16
  • Quantized Precision: int8
  • Size Reduction: ~50%
  • Quality Impact: Typically <2% degradation
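
As a rough illustration, a checkpoint like this can be produced and verified with bitsandbytes through transformers. This is a sketch under assumptions: "./gemma-2b-trader-fp16" is a placeholder path for the fp16 fine-tune, and serializing 8-bit weights requires a recent transformers with bitsandbytes >= 0.37.2.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder path for the fp16 fine-tuned weights (not a published repo id)
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-fp16",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(f"Footprint: {model.get_memory_footprint() / 1e9:.2f} GB")  # roughly half of fp16
model.save_pretrained("./gemma-2b-trader-8bit")  # writes the int8 weights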

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load with bitsandbytes 8-bit quantization; current transformers versions
# expect the load_in_8bit flag to be wrapped in a BitsAndBytesConfig
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./gemma-2b-trader-8bit")

# Prompting and generation are identical to the fp16 version
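
A minimal inference sketch follows; the journal entry and prompt wording are illustrative assumptions, not the exact template used during fine-tuning.

# Hypothetical prompt; match whatever format the model was fine-tuned with
prompt = "Summarize this trading journal entry:\nLong 100 AAPL at open, exited +1.2% before lunch; sizing felt too small."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the generated summary
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))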

When to Use This Version

  • Limited GPU memory (<8GB VRAM; see the check after this list)
  • Faster loading times needed
  • Deployment on edge devices
  • When a smaller memory footprint matters more than marginal quality
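
A quick heuristic for choosing between the two checkpoints; the 8 GB threshold mirrors the list above and is an illustrative rule of thumb, not a hard cutoff.

import torch

# Pick a checkpoint based on total VRAM of the first GPU
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    choice = "8-bit" if vram_gb < 8 else "fp16"
    print(f"{vram_gb:.1f} GB VRAM -> use the {choice} version")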

When to Use FP16 Version

  • Maximum quality required
  • Sufficient GPU memory available
  • Fine-tuning or further training needed (see the loading sketch below)
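
For further training, load the fp16 weights as a standard model so the parameters can receive gradients. The path below is a placeholder, since this card does not specify where the fp16 checkpoint lives.

import torch
from transformers import AutoModelForCausalLM

# Placeholder path; substitute the actual fp16 repo or directory
model = AutoModelForCausalLM.from_pretrained(
    "./gemma-2b-trader-fp16",
    torch_dtype=torch.float16,
)
model.train()  # unquantized weights can be updated directly, unlike the int8 checkpoint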