Model Card
Model Description
MedSwin-7B-SFT is a specialized 7B parameter language model for medical question-answering, instruction-following, and clinical reasoning. It was created by applying a rigorous, multi-stage Supervised Fine-Tuning (SFT) pipeline to the medalpaca/medalpaca-7b base model. The core innovation lies in its data augmentation methodology, which is designed to maximize data diversity and robustness while strictly preserving clinical fidelity and accuracy.
- Developed by: Medical Swinburne University of Technology AI Team
- Funded by: Swinburne University of Technology
- Base Model: medalpaca/medalpaca-7b
- Language(s): English
- License: Apache 2.0
Intended Use
This model is intended for research purposes in the following domains:
- AI-assisted medicine and clinical decision support research.
- Biomedical natural language processing (NLP).
- Exploration of robust instruction-tuning and knowledge distillation in specialized domains.
- Generating high-quality, clinically-grounded synthetic data for further model training.
Training Data
The model was fine-tuned on a curated and augmented collection of medical QA datasets (a minimal loading sketch follows this list):
- PubMedQA: Original and processed (map, u, l) variants for factoid and research-oriented questions.
- HealthCareMagic & iCliniq: Real-world patient-doctor interactions from online portals.
Data Curation & Augmentation Pipeline
The training data underwent a sophisticated, reliability-first augmentation pipeline to enhance diversity and robustness while ensuring semantic and clinical integrity. A simplified sketch of the quality-control checks follows the table.
| Stage | Purpose | Methodology & Quality Control |
|---|---|---|
| A. LLM-Based Paraphrasing | Introduce syntactic diversity while locking medical terminology. | Multi-model approach (LLaMA-8B, Gemini) for redundancy. Difficulty levels: Easy (lexical) vs. Hard (structural). Terminology Lock: Constrained decoding to preserve drug names, dosages, ICD/LOINC codes. QC metrics: semantic similarity check and Jaccard similarity of locked terms = 1.0. |
| B. Back-Translation | Induce discourse-level variation. | Pivot Languages: Stochastic selection from {es, de, vi, fr}. Quality Control: Length ratio bounds in [0.8,1.2] and semantic similarity. Auto-retry on failure. |
| C. Style Standardization | Enforce a neutral, professional clinical tone. | Register Enforcement: Discourages colloquialisms. Probabilistic Rewrites: e.g., "will" → "is likely to". Artifact Removal: Strips forum-specific sign-offs, greetings, and disclaimers. |
| D. Multi-Variant Generation | Cover diverse reasoning and presentation styles. | Answer Variants: Concise, detailed, clinician-focused, patient-friendly. Question Variants: Clarifying, follow-up, symptom-centric, treatment-centric. Cross-combinations: Up to 9 variants per seed, capped by budget and uniqueness. |
| E. Clinical Scenario Creation | Build contextual robustness for multi-hop reasoning. | Simulated Contexts: ED triage, routine visit, chronic care, caregiver perspective. Effect: Trains model to consider contraindications, monitoring, and context-specific factors. |
| F. Quality Assurance | Ensure data purity and consistency. | F1. Data Cleaning: PHI removal (email/phone/URL/IP); MD5-based deduplication; filler removal. F2. Validation: LLM consistency checks, max-length caps, English-only filter. |
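The augmentation code itself is not released with this card. The snippet below is a minimal sketch of how the Stage A terminology-lock check, the Stage B length-ratio bound, and the Stage F MD5 deduplication could be expressed; the regular expression, thresholds, and record fields are illustrative assumptions rather than the team's actual implementation.

```python
import hashlib
import re

# Illustrative pattern for "locked" terminology (dosages and ICD-like codes); an assumption, not the released lock list.
LOCKED_TERM_PATTERN = re.compile(r"\b(?:\d+\s?mg|\d+\s?mcg|[A-Z]\d{2}(?:\.\d+)?)\b")

def term_lock_preserved(original: str, augmented: str) -> bool:
    """Stage A QC: Jaccard similarity of locked terms between source and paraphrase must equal 1.0."""
    orig_terms = set(LOCKED_TERM_PATTERN.findall(original))
    aug_terms = set(LOCKED_TERM_PATTERN.findall(augmented))
    if not orig_terms and not aug_terms:
        return True
    jaccard = len(orig_terms & aug_terms) / len(orig_terms | aug_terms)
    return jaccard == 1.0

def length_ratio_ok(original: str, back_translated: str, low: float = 0.8, high: float = 1.2) -> bool:
    """Stage B QC: reject back-translations whose word-length ratio falls outside [0.8, 1.2]."""
    ratio = len(back_translated.split()) / max(len(original.split()), 1)
    return low <= ratio <= high

def deduplicate(records: list[dict]) -> list[dict]:
    """Stage F: drop exact duplicates using an MD5 hash of the normalized text."""
    seen, unique = set(), []
    for rec in records:
        digest = hashlib.md5(rec["text"].strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique
```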
Output Format
All training data was formatted into a standardized SFT structure to facilitate clear instruction-following:
```
### Instruction:
{Task descriptor and/or user question with context}

### Input:
{Additional user question or context, if any}

### Output:
{The model's target response}
```
Each data point includes metadata tags for its augmentation source (para, bt, style, scenario), original source IDs, and quality control scores.
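As an illustration only, a single record could be rendered into this template as shown below; the field names and metadata keys (`aug_source`, `source_id`, `qc_score`) are assumptions based on the description above, not the released data schema.

```python
def format_sft_example(record: dict) -> str:
    """Render one record into the Instruction/Input/Output SFT template described above."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Input:\n{record.get('input', '')}\n\n"
        f"### Output:\n{record['output']}"
    )

example = {
    "instruction": "Answer the patient's question based on the context.",
    "input": "What are common side effects of metformin?",
    "output": "Common side effects include gastrointestinal upset such as nausea and diarrhea.",
    # Metadata tags as described above: augmentation source, origin ID, and QC score (values are hypothetical).
    "meta": {"aug_source": "para", "source_id": "pubmedqa-00123", "qc_score": 0.94},
}
print(format_sft_example(example))
```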
Usage
You can load and use the model with the Hugging Face transformers library.
```python
import transformers

model_id = "MedSwin/MedSwin-7B-SFT"

# Load the model into a text-generation pipeline; device_map="auto" places it on a GPU if one is available.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",
)

# Format the input according to the training template (Instruction / Input / Output).
instruction = "Based on the provided context, what is the most likely diagnosis?"
context = (
    "A 45-year-old male presents with acute, crushing substernal chest pain radiating to the "
    "left arm, associated with diaphoresis and nausea for the past hour."
)
formatted_prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Output:\n"

# Generate a response with conservative sampling settings.
sequences = pipeline(
    formatted_prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    eos_token_id=pipeline.tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```
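By default the text-generation pipeline returns the prompt together with the completion, so the answer portion can be isolated by stripping the prompt prefix:

```python
# generated_text contains the prompt followed by the completion; keep only the model's answer.
answer = sequences[0]["generated_text"][len(formatted_prompt):].strip()
print(answer)
```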
Bias, Risks, and Limitations
The model inherits and may amplify biases present in its base model and training data. Key risks and limitations include:
- Demographic Biases: Biases related to race, gender, age, or socioeconomic status based on patterns in the source data.
- Clinical Biases: Potential over-representation of certain conditions, treatments, or clinical perspectives.
- Factual Accuracy: While stringent controls were applied, the model is not a knowledge base and can generate incorrect or outdated medical information.
- Safe Deployment: Use a Human-in-the-Loop (HITL) system for any real-world application. Outputs must be verified by a qualified professional.
Technical Specifications
- Model Architecture: Based on LLaMA, adapted via Supervised Fine-Tuning (SFT).
- Model Size: 7 billion parameters.
- Input Format: Instruction-Input-Output structure.
All benchmark metrics are available in the Benchmark Document Preview.