Meta Llama 3.1 8B - Alpaca
This model card describes the Llama 3.1 8B Alpaca model, fine-tuned for instruction-following and chat tasks. It has been optimized for fast and accurate text generation using LoRA fine-tuning and bf16 precision.
Model Details
Model Description
This model is a fine-tuned version of Llama 3.1 8B trained on Alpaca-style instruction-following data. It is designed for natural language understanding and generation tasks, supporting conversational AI and other NLP applications.
- Developed by: Anezatra
- Shared by: Hugging Face Community
- Model type: Transformer-based Language Model (Instruction-Finetuned)
- Language(s) (NLP): English (with potential for multilingual input via translation)
- License: Apache 2.0
- Finetuned from model: Llama 3.1 8B
Model Sources
- Repository: https://huggingface.co/unsloth/meta-llama-3.1-8b-alpaca
- Paper: https://arxiv.org/abs/2407.21783 (The Llama 3 Herd of Models)
- Demo: example usage with llama-cli (llama.cpp); a Python inference sketch follows below
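The card points to llama-cli for demos; a minimal Python sketch using the llama-cpp-python bindings (the same llama.cpp backend) is shown below. The GGUF filename, sampling settings, and prompt are assumptions, not taken from the repo.

```python
# A minimal inference sketch, assuming llama-cpp-python and a locally
# downloaded Q4_K_M GGUF. The filename below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-8B-alpaca.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,   # context window for the session
    n_threads=8,  # CPU threads; tune to your machine
)

# Alpaca-tuned checkpoints are usually prompted with the Alpaca template.
prompt = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
)
out = llm(prompt, max_tokens=128, temperature=0.7, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```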
Training Details
Training Data
- Alpaca-style instruction-following datasets (cleaned)
Training Procedure
- Fine-tuned using LoRA (a configuration sketch follows this list)
- Epochs: 1
- Steps: 120
- Learning rate: 2e-4
- Batch size: 6 per GPU
- Gradient accumulation: 5 steps
- Precision: bf16 mixed precision
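For concreteness, here is a hedged sketch of how the hyperparameters above might map onto an Unsloth + TRL fine-tuning run. The LoRA rank, alpha, target modules, sequence length, and dataset name are assumptions (the card does not state them), and the SFTTrainer signature varies across TRL versions.

```python
# A training sketch, not the authors' script. Values marked "assumed" are
# not stated in the card; the rest mirror the hyperparameters listed above.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

def to_text(example):
    # Flatten instruction/output pairs into a single training string.
    return {"text": ALPACA_TEMPLATE.format(**example)}

# Assumed dataset: a cleaned Alpaca-style corpus.
dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B",
    max_seq_length=2048,   # assumed
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # assumed LoRA rank
    lora_alpha=16,         # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=6,   # per the card
        gradient_accumulation_steps=5,   # effective batch size of 30
        max_steps=120,                   # takes precedence over epochs
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```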
Preprocessing
- Text normalization: trimming, punctuation correction, lowercasing
- Tokenization using LLaMA tokenizer
- Merging conversational history for context-aware generation (a preprocessing sketch follows this list)
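A small sketch of the preprocessing steps listed above. The exact normalization rules and the role labels are assumptions, since the card does not publish its pipeline.

```python
import re

def normalize(text: str) -> str:
    # Trim, collapse whitespace, remove stray spaces before punctuation,
    # and lowercase (lowercasing is listed in the card, though it is
    # uncommon for Llama-family pipelines).
    text = re.sub(r"\s+", " ", text.strip())
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    return text.lower()

def merge_history(turns: list[dict[str, str]]) -> str:
    # Merge prior turns into one context string for context-aware generation.
    # The "role: content" layout is a hypothetical convention.
    return "\n".join(f"{t['role']}: {normalize(t['content'])}" for t in turns)

history = [
    {"role": "user", "content": "  What is LoRA ?"},
    {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
]
print(merge_history(history))
```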
Speeds, Sizes, Times
- Model size: ~16 GB in bf16 (8B parameters; see the arithmetic check after this list)
- Q4_K_M quantized GGUF: ~4.9 GB
- Training completed in under 24 hours on a single GPU with LoRA
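As a sanity check on the sizes above: bf16 stores 2 bytes per weight, and Q4_K_M averages roughly 4.85 bits per weight (an approximation; the exact figure depends on the tensor mix).

```python
# Back-of-the-envelope size check; the bits-per-weight figure is approximate.
params = 8.03e9                                     # Llama 3.1 8B, approx.
print(f"bf16:   {params * 2 / 1e9:.1f} GB")         # ~16.1 GB at 2 bytes/weight
print(f"Q4_K_M: {params * 4.85 / 8 / 1e9:.1f} GB")  # ~4.9 GB at ~4.85 bits/weight
```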
Evaluation
Testing Data, Factors & Metrics
- Evaluated on held-out instruction-following tasks
- Metrics: perplexity, accuracy on factual Q&A, and BLEU for sequence generation (a perplexity sketch follows this list)
- Human evaluation for conversational coherence
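Perplexity here is the standard exponential of the mean token-level negative log-likelihood. A minimal sketch of how it can be computed with transformers follows; the card does not publish its evaluation script, so the model ID and text are placeholders.

```python
# Perplexity = exp(mean next-token negative log-likelihood). Sketch only.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # placeholder; gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean NLL.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("The quick brown fox jumps over the lazy dog."))
```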
Results
- Perplexity: ~1.35 on the validation set
- Maintains context across multiple turns in dialogue
- Generates coherent and instruction-following responses
Environmental Impact
- Hardware Type: NVIDIA L4 24GB GPU
- Hours used: -
Technical Specifications
Model Architecture and Objective
- Transformer decoder-only architecture
- 8B parameters, 32 layers, 32 attention heads (see the configuration check after this list)
- Optimized for instruction-following tasks and conversational AI
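These figures can be checked against the base model's transformers config; the field names below follow the LlamaConfig schema (the base repo is gated, so access requires accepting the license).

```python
# Verify the architecture figures against the base model's config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
print(cfg.num_hidden_layers)    # 32 decoder layers
print(cfg.num_attention_heads)  # 32 attention heads
print(cfg.num_key_value_heads)  # 8 KV heads (grouped-query attention)
print(cfg.hidden_size)          # 4096
```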
Compute Infrastructure
Hardware
- Single 24GB L4 GPU
Software
- PyTorch + Transformers + Unsloth LoRA integration
- llama.cpp for GGUF inference on CPU
Citation
BibTeX:
@misc{meta-llama-3.1-8b-alpaca,
  title={Llama 3.1 8B Alpaca Fine-Tuned Model},
  author={Anezatra},
  year={2025},
  howpublished={\url{https://huggingface.co/unsloth/meta-llama-3.1-8b-alpaca}}
}
Model tree for anezatra/Llama-3.1-8B-alpaca-GGUF
- Base model: meta-llama/Llama-3.1-8B