Qwen3-8B SFT LMSYS (Baseline)

This is the SFT baseline model for comparison with the DPO version.

Training Details

  • Base Model: unsloth/Qwen3-8B-4bit
  • Training Method: Supervised Fine-Tuning (SFT)
  • Dataset: LMSYS Arena Human Preference 55k (chosen responses only)
  • Training Steps: 60
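
The recipe above can be approximated with TRL's SFTTrainer. The sketch below is illustrative rather than the exact training script: the LoRA settings, batch size, and dataset preprocessing (assuming the public LMSYS arena 55k schema with JSON-encoded prompt/response columns and winner flags) are assumptions; only the base model, the dataset, and the 60-step budget come from the card.

# Illustrative reproduction sketch: QLoRA-style SFT with TRL's SFTTrainer.
# LoRA settings, batch size, and preprocessing are assumptions; only the
# base model, dataset, and 60-step budget come from this card.
import json

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base = "unsloth/Qwen3-8B-4bit"
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# LMSYS Arena human-preference data; keep only the preferred ("chosen") responses.
raw = load_dataset("lmsys/lmsys-arena-human-preference-55k", split="train")

def to_sft_text(example):
    # Pick the winning response and flatten the multi-turn conversation
    # into a single chat-template string for plain SFT.
    chosen = example["response_a"] if example["winner_model_a"] == 1 else example["response_b"]
    messages = []
    for user_turn, assistant_turn in zip(json.loads(example["prompt"]), json.loads(chosen)):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = raw.filter(lambda ex: ex["winner_tie"] == 0).map(to_sft_text)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                           target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]),
    args=SFTConfig(output_dir="qwen3-8b-sft-lmsys", max_steps=60,
                   per_device_train_batch_size=2, dataset_text_field="text"),
)
trainer.train()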

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("subbuc/qwen3-8b-sft-lmsys")
tokenizer = AutoTokenizer.from_pretrained("subbuc/qwen3-8b-sft-lmsys")

# Minimal chat-style generation example
messages = [{"role": "user", "content": "Explain what supervised fine-tuning is."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))