# LLM Council - Comprehensive Guide

## 📝 Overview

The LLM Council is a multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:

1. **Stage 1 - Individual Responses**: Each council member independently answers the question
2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs

**Current Implementation**: Uses FREE HuggingFace models (60%) + cheap OpenAI models (40%)

## 🏗️ Architecture

### Current Implementation

```
┌─────────────────────────────────────────────────────────────┐
│                        User Question                        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 1: Parallel Responses from 3-5 Council Models        │
│  • Model 1: Individual answer                               │
│  • Model 2: Individual answer                               │
│  • Model 3: Individual answer                               │
│  • (etc...)                                                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 2: Peer Rankings (Anonymized)                        │
│  • Each model ranks all responses (Response A, B, C...)     │
│  • Aggregate rankings calculated                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 3: Chairman Synthesis                                │
│  • Reviews all responses + rankings                         │
│  • Generates final comprehensive answer                     │
└─────────────────────────────────────────────────────────────┘
```

## 🔧 Current Models (FREE HuggingFace + OpenAI)

### Council Members (5 models)

**FREE HuggingFace Models** (via Inference API):
- `meta-llama/Llama-3.3-70B-Instruct` - Meta's Llama 3.3 (FREE)
- `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen 2.5 (FREE)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral's Mixtral MoE (FREE)

**OpenAI Models** (paid but cheap):
- `gpt-4o-mini` - Fast, affordable GPT-4o variant
- `gpt-3.5-turbo` - Ultra cheap, still capable

### Chairman

- `gpt-4o-mini` - Final synthesis model

**Benefits of Current Setup:**
- 3 of 5 council members (60%) are completely FREE (HuggingFace)
- The remaining 40% use cheap OpenAI models ($0.001-0.01 per query)
- 90-99% cost reduction compared to all-paid alternatives
- No experimental/beta endpoints - all stable APIs
- Diverse model providers for varied perspectives

## ✨ Alternative Model Configurations

### All-FREE Council (100% HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
    {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
]

CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
```

**Cost**: $0.00 per query!

### Premium Council (OpenAI + HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "openai", "model": "gpt-4-turbo"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "openai", "model": "gpt-3.5-turbo"},
]

CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
```

**Cost**: ~$0.05-0.15 per query
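To make the three stages concrete, here is a minimal sketch of the orchestration flow. Everything in it is illustrative: `query_model`, `run_council`, and the prompt wording are hypothetical stand-ins, not the actual code in `backend/council_free.py` (which also aggregates the rankings and streams the chairman's output).

```python
import asyncio

# Hypothetical stand-in for the project's API client: send one prompt to
# one model and return its text reply. The real dispatch logic
# (HuggingFace router vs. OpenAI) lives in backend/api_client.py.
async def query_model(model: dict, prompt: str) -> str:
    raise NotImplementedError("wire this to your HuggingFace/OpenAI client")

async def run_council(question: str, council: list[dict], chairman: dict) -> str:
    # Stage 1: every council member answers independently, in parallel.
    answers = await asyncio.gather(*(query_model(m, question) for m in council))

    # Stage 2: each member ranks the anonymized responses (A, B, C, ...).
    labeled = "\n\n".join(
        f"Response {chr(65 + i)}:\n{a}" for i, a in enumerate(answers)
    )
    rank_prompt = (
        f"Question: {question}\n\n{labeled}\n\n"
        "Rank these responses from best to worst (e.g. 'B > A > C')."
    )
    rankings = await asyncio.gather(*(query_model(m, rank_prompt) for m in council))

    # Stage 3: the chairman reviews all responses plus the peer rankings
    # and writes the final comprehensive answer.
    synthesis_prompt = (
        f"Question: {question}\n\n{labeled}\n\n"
        "Peer rankings:\n" + "\n".join(rankings) + "\n\n"
        "Synthesize a single comprehensive final answer."
    )
    return await query_model(chairman, synthesis_prompt)
```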
## 🚀 Running on Hugging Face Spaces

### Prerequisites

1. **OpenAI API Key**:
   - Sign up at [platform.openai.com](https://platform.openai.com/)
   - Go to API Keys → Create new secret key
   - Copy your key (starts with `sk-`)
   - Add billing info and credits ($5-10 is plenty)

2. **HuggingFace API Token**:
   - Sign up at [huggingface.co](https://huggingface.co/)
   - Go to Settings → Access Tokens → New token
   - Copy your token (starts with `hf_`)
   - FREE! No billing required

3. **HuggingFace Account**: For deploying Spaces

### Step-by-Step Deployment

#### Method 1: Deploy Your Existing Code

1. **Create New Space**
   - Go to huggingface.co/new-space
   - Choose "Gradio" as SDK
   - Select SDK version: 6.0.0
   - Choose hardware: CPU (free)

2. **Push Your Code**
   ```bash
   # Clone your space
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME

   # Copy your LLM Council code
   cp -r /path/to/llm_council/* .

   # Commit and push
   git add .
   git commit -m "Initial deployment"
   git push
   ```

3. **Configure Secrets**
   - Go to your space → Settings → Repository secrets
   - Add secret #1:
     - Name: `OPENAI_API_KEY`
     - Value: (your OpenAI key starting with `sk-`)
   - Add secret #2:
     - Name: `HUGGINGFACE_API_KEY`
     - Value: (your HuggingFace token starting with `hf_`)

4. **Space Auto-Restarts**
   - HF Spaces will automatically rebuild and deploy
   - Check the "Logs" tab to verify successful startup

### Required Files Structure

```
your-space/
├── README.md            # Space configuration
├── requirements.txt     # Python dependencies
├── app.py               # Main Gradio app
├── .env.example         # Environment template
└── backend/
    ├── __init__.py
    ├── config_free.py   # Model configuration
    ├── council_free.py  # 3-stage logic
    ├── api_client.py    # Dual HuggingFace/OpenAI API client
    ├── storage.py       # Data storage
    └── main.py          # FastAPI (optional)
```

## 🔐 Environment Variables

### Required Variables

**For Local Development** (`.env` file - DO NOT commit it to git):
```bash
OPENAI_API_KEY=sk-proj-your-key-here
HUGGINGFACE_API_KEY=hf_your-token-here
```

**For HuggingFace Spaces** (Settings → Repository secrets, instead of a `.env` file):
- Secret 1: `OPENAI_API_KEY` = `sk-proj-...`
- Secret 2: `HUGGINGFACE_API_KEY` = `hf_...`

### API Endpoints Used

**HuggingFace Inference API**:
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Format: OpenAI-compatible
- Cost: FREE for inference API
- Models: Llama, Qwen, Mixtral, etc.

**OpenAI API**:
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Format: Native OpenAI
- Cost: Pay-per-token (very cheap for mini/3.5-turbo)
- Models: GPT-4o-mini, GPT-3.5-turbo, GPT-4o

## 📦 Dependencies

```txt
gradio>=6.0.0
httpx>=0.27.0
python-dotenv>=1.0.0
openai>=1.0.0  # For OpenAI API
```

**Note**: The system uses:
- `httpx` for async HTTP requests to the HuggingFace API
- the `openai` SDK for OpenAI API calls
- `python-dotenv` to load environment variables from `.env`

## 💻 Running Locally

```bash
# 1. Clone repository (use your own space URL)
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create .env file with both API keys
echo OPENAI_API_KEY=sk-proj-your-key-here > .env
echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env

# 5. Run the app
python app.py
```

The app will be available at `http://localhost:7860`
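Because both providers expose the same OpenAI-compatible chat-completions format (see the endpoints listed above), a single request function can serve either one. Below is a minimal sketch using `httpx` and `python-dotenv` from the dependency list; the `chat` function and its structure are illustrative, not the actual API of `backend/api_client.py`.

```python
import os

import httpx
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY and HUGGINGFACE_API_KEY from .env

# Endpoints as documented in the "API Endpoints Used" section above.
ENDPOINTS = {
    "huggingface": "https://router.huggingface.co/v1/chat/completions",
    "openai": "https://api.openai.com/v1/chat/completions",
}
KEYS = {
    "huggingface": os.environ["HUGGINGFACE_API_KEY"],
    "openai": os.environ["OPENAI_API_KEY"],
}

async def chat(provider: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one chat-completions request to either provider."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Authorization": f"Bearer {KEYS[provider]}"}
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(ENDPOINTS[provider], json=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```

Since the payload and response shape are identical across providers, moving a council member between HuggingFace and OpenAI reduces to changing the `provider` field in the model config.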
## 🔧 Code Architecture

### Key Components

**1. Dual API Client** (`backend/api_client.py`):
- Supports both HuggingFace and OpenAI APIs
- Automatic retry logic with exponential backoff
- Graceful error handling and fallbacks
- Parallel model querying for efficiency

**2. FREE Model Configuration** (`backend/config_free.py`):
- Mix of FREE HuggingFace + cheap OpenAI models
- Configurable timeouts and retries
- Easy to customize and extend

**3. Council Orchestration** (`backend/council_free.py`):
- Stage 1: Parallel response collection
- Stage 2: Peer ranking system
- Stage 3: Chairman synthesis with streaming

### Error Handling Features

- Retry logic with exponential backoff (3 attempts)
- Graceful handling of individual model failures
- Detailed error logging for debugging
- Timeout management (60s default)

### Benefits of Current Architecture

- **Cost Efficient**: 60% FREE models, 40% ultra-cheap
- **Robust**: Retry logic handles transient failures
- **Fast**: Parallel execution minimizes wait time
- **Flexible**: Easy to add/remove models
- **Observable**: Detailed logging for debugging

## 📊 Performance Characteristics

### Typical Response Times (Current Setup)

- **Stage 1**: 10-30 seconds (5 models in parallel)
- **Stage 2**: 15-45 seconds (peer rankings)
- **Stage 3**: 15-40 seconds (synthesis with streaming)
- **Total**: ~40-115 seconds per question

### Cost per Query (Current Setup)

- **FREE HuggingFace portion**: $0.00 (3 models)
- **OpenAI portion**: $0.001-0.01 (2 models)
- **Total**: ~$0.001-0.01 per query

**Comparison to alternatives**:
- 90-99% cheaper than all-paid services
- Similar quality to premium setups
- Faster than sequential execution

*Costs vary based on prompt length and response complexity*

## 🐛 Troubleshooting

### Common Issues

1. **"401 Unauthorized" errors**
   - Check that both API keys are set correctly
   - Verify the OpenAI key starts with `sk-`
   - Verify the HuggingFace token starts with `hf_`
   - Ensure the OpenAI account has billing/credits enabled
   - Check Space secrets are named exactly `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`

2. **Timeout errors**
   - Increase the timeout in `backend/config_free.py`
   - Check network connectivity
   - Some models may be slow - consider replacing them

3. **Space won't start**
   - Verify `requirements.txt` includes all dependencies
   - Check logs in Space → Logs tab
   - Ensure both secrets are added (not just one)
   - Verify Python version compatibility (3.10+)

4. **Some models fail, others work**
   - Normal! The system is designed to handle partial failures
   - Check the logs to see which models failed
   - The HuggingFace API may impose rate limits (rare)
   - The OpenAI API requires billing to be set up

5. **HuggingFace 410 error**
   - The old Inference API endpoint is deprecated
   - Ensure you are using `router.huggingface.co/v1/chat/completions`
   - Update `backend/api_client.py` if needed
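Several of the issues above (timeouts, transient failures) are exactly what the retry logic described in the Code Architecture section absorbs. Here is a minimal sketch of exponential backoff with 3 attempts; the helper name and delay constants are illustrative, not the repo's actual implementation.

```python
import asyncio

import httpx

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 1.0):
    """Retry an async call with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay:.0f}s")
            await asyncio.sleep(delay)

# Usage (assuming the `chat` helper sketched earlier):
# answer = await with_retries(lambda: chat("openai", "gpt-4o-mini", "Hello"))
```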
## 🎯 Best Practices

1. **Model Selection**
   - Use 3-5 council members (the sweet spot for quality vs. speed)
   - Mix FREE HuggingFace + cheap OpenAI models for the best value
   - Choose diverse models for varied perspectives
   - Match the chairman to task complexity

2. **Cost Management**
   - Start with the current setup ($0.001-0.01 per query)
   - Consider the all-FREE HuggingFace config for $0 cost
   - Monitor OpenAI usage at platform.openai.com/usage
   - Set spending limits in OpenAI billing settings

3. **Quality Optimization**
   - Use more council members (5-7) for important queries, or switch to the Premium Council configuration
   - Use a stronger chairman (gpt-4o instead of gpt-4o-mini)
   - Adjust timeouts based on model speed
   - Test different model combinations

4. **Security**
   - NEVER commit `.env` to git (use `.gitignore`)
   - Use HuggingFace Space secrets for production
   - Rotate API keys periodically
   - Monitor usage for anomalies
   - Set spending limits

## 📚 Additional Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [HuggingFace Inference API](https://huggingface.co/docs/api-inference)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)

## 🤝 Contributing

Suggestions for improvement:

1. Add caching for repeated questions
2. Implement conversation history
3. Add custom model configurations via UI
4. Support for different voting mechanisms
5. Add cost tracking and estimates

## 📝 License

Check the original repository for license information.