# LLM Council - Comprehensive Guide

## 📝 Overview

The LLM Council is a multi-agent system that uses multiple Large Language Models (LLMs) to collectively answer questions through a 3-stage deliberation process:

1. **Stage 1 - Individual Responses**: Each council member independently answers the question
2. **Stage 2 - Peer Review**: Council members rank each other's anonymized responses
3. **Stage 3 - Synthesis**: A chairman model synthesizes the final answer based on all inputs

**Current Implementation**: Uses FREE HuggingFace models (60%) + cheap OpenAI models (40%)

## 🏗️ Architecture

### Current Implementation

```
┌─────────────────────────────────────────────────────────────┐
│                        User Question                        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 1: Parallel Responses from 3-5 Council Models        │
│  • Model 1: Individual answer                               │
│  • Model 2: Individual answer                               │
│  • Model 3: Individual answer                               │
│  • (etc...)                                                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 2: Peer Rankings (Anonymized)                        │
│  • Each model ranks all responses (Response A, B, C...)     │
│  • Aggregate rankings calculated                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 3: Chairman Synthesis                                │
│  • Reviews all responses + rankings                         │
│  • Generates final comprehensive answer                     │
└─────────────────────────────────────────────────────────────┘
```

## 🔧 Current Models (FREE HuggingFace + OpenAI)

### Council Members (5 models)

**FREE HuggingFace Models** (via Inference API):
- `meta-llama/Llama-3.3-70B-Instruct` - Meta's Llama 3.3 (FREE)
- `Qwen/Qwen2.5-72B-Instruct` - Alibaba's Qwen 2.5 (FREE)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mistral's Mixtral MoE (FREE)

**OpenAI Models** (paid but cheap):
- `gpt-4o-mini` - Fast, affordable GPT-4o variant
- `gpt-3.5-turbo` - Ultra cheap, still capable

### Chairman

- `gpt-4o-mini` - Final synthesis model

**Benefits of Current Setup:**
- 3 of 5 council members (60%) are completely FREE (HuggingFace)
- The remaining 40% use cheap OpenAI models ($0.001-0.01 per query)
- 90-99% cost reduction compared to all-paid alternatives
- No experimental/beta endpoints - all stable APIs
- Diverse model providers for varied perspectives

## ✨ Alternative Model Configurations

### All-FREE Council (100% HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "huggingface", "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.1-405B-Instruct"},
    {"provider": "huggingface", "model": "microsoft/Phi-3.5-MoE-instruct"},
]

CHAIRMAN_MODEL = {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"}
```

**Cost**: $0.00 per query!

### Premium Council (OpenAI + HuggingFace)

```python
COUNCIL_MODELS = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "openai", "model": "gpt-4-turbo"},
    {"provider": "huggingface", "model": "meta-llama/Llama-3.3-70B-Instruct"},
    {"provider": "huggingface", "model": "Qwen/Qwen2.5-72B-Instruct"},
    {"provider": "openai", "model": "gpt-3.5-turbo"},
]

CHAIRMAN_MODEL = {"provider": "openai", "model": "gpt-4o"}
```

**Cost**: ~$0.05-0.15 per query
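To make the three stages concrete, here is a minimal sketch of the orchestration flow. Everything in it is illustrative: `query_model`, `run_council`, and the prompt wording are hypothetical stand-ins, not the actual code in `backend/council_free.py` (which also aggregates the rankings and streams the chairman's output).

```python
import asyncio

# Hypothetical stand-in for the project's API client: send one prompt to
# one model and return its text reply. The real dispatch logic
# (HuggingFace router vs. OpenAI) lives in backend/api_client.py.
async def query_model(model: dict, prompt: str) -> str:
    raise NotImplementedError("wire this to your HuggingFace/OpenAI client")

async def run_council(question: str, council: list[dict], chairman: dict) -> str:
    # Stage 1: every council member answers independently, in parallel.
    answers = await asyncio.gather(*(query_model(m, question) for m in council))

    # Stage 2: each member ranks the anonymized responses (A, B, C, ...).
    labeled = "\n\n".join(
        f"Response {chr(65 + i)}:\n{a}" for i, a in enumerate(answers)
    )
    rank_prompt = (
        f"Question: {question}\n\n{labeled}\n\n"
        "Rank these responses from best to worst (e.g. 'B > A > C')."
    )
    rankings = await asyncio.gather(*(query_model(m, rank_prompt) for m in council))

    # Stage 3: the chairman reviews all responses plus the peer rankings
    # and writes the final comprehensive answer.
    synthesis_prompt = (
        f"Question: {question}\n\n{labeled}\n\n"
        "Peer rankings:\n" + "\n".join(rankings) + "\n\n"
        "Synthesize a single comprehensive final answer."
    )
    return await query_model(chairman, synthesis_prompt)
```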
## 🚀 Running on Hugging Face Spaces

### Prerequisites

1. **OpenAI API Key**:
   - Sign up at [platform.openai.com](https://platform.openai.com/)
   - Go to API Keys → Create new secret key
   - Copy your key (starts with `sk-`)
   - Add billing info and credits ($5-10 is plenty)

2. **HuggingFace API Token**:
   - Sign up at [huggingface.co](https://huggingface.co/)
   - Go to Settings → Access Tokens → New token
   - Copy your token (starts with `hf_`)
   - FREE! No billing required

3. **HuggingFace Account**: For deploying Spaces

### Step-by-Step Deployment

#### Method 1: Deploy Your Existing Code

1. **Create New Space**
   - Go to huggingface.co/new-space
   - Choose "Gradio" as SDK
   - Select SDK version: 6.0.0
   - Choose hardware: CPU (free)

2. **Push Your Code**
   ```bash
   # Clone your space
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME

   # Copy your LLM Council code
   cp -r /path/to/llm_council/* .

   # Commit and push
   git add .
   git commit -m "Initial deployment"
   git push
   ```

3. **Configure Secrets**
   - Go to your space → Settings → Repository secrets
   - Add secret #1:
     - Name: `OPENAI_API_KEY`
     - Value: (your OpenAI key starting with `sk-`)
   - Add secret #2:
     - Name: `HUGGINGFACE_API_KEY`
     - Value: (your HuggingFace token starting with `hf_`)

4. **Space Auto-Restarts**
   - HF Spaces will automatically rebuild and deploy
   - Check the "Logs" tab to verify successful startup

### Required Files Structure

```
your-space/
├── README.md            # Space configuration
├── requirements.txt     # Python dependencies
├── app.py               # Main Gradio app
├── .env.example         # Environment template
└── backend/
    ├── __init__.py
    ├── config_free.py   # Model configuration
    ├── council_free.py  # 3-stage logic
    ├── api_client.py    # Dual HuggingFace/OpenAI API client
    ├── storage.py       # Data storage
    └── main.py          # FastAPI (optional)
```

## 🔐 Environment Variables

### Required Variables

**For Local Development** (`.env` file - DO NOT commit it to git):
```bash
OPENAI_API_KEY=sk-proj-your-key-here
HUGGINGFACE_API_KEY=hf_your-token-here
```

**For HuggingFace Spaces** (Settings → Repository secrets, instead of a `.env` file):
- Secret 1: `OPENAI_API_KEY` = `sk-proj-...`
- Secret 2: `HUGGINGFACE_API_KEY` = `hf_...`

### API Endpoints Used

**HuggingFace Inference API**:
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Format: OpenAI-compatible
- Cost: FREE for inference API
- Models: Llama, Qwen, Mixtral, etc.

**OpenAI API**:
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Format: Native OpenAI
- Cost: Pay-per-token (very cheap for mini/3.5-turbo)
- Models: GPT-4o-mini, GPT-3.5-turbo, GPT-4o

## 📦 Dependencies

```txt
gradio>=6.0.0
httpx>=0.27.0
python-dotenv>=1.0.0
openai>=1.0.0  # For OpenAI API
```

**Note**: The system uses:
- `httpx` for async HTTP requests to the HuggingFace API
- the `openai` SDK for OpenAI API calls
- `python-dotenv` to load environment variables from `.env`

## 💻 Running Locally

```bash
# 1. Clone repository (use your own space URL)
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Create .env file with both API keys
echo OPENAI_API_KEY=sk-proj-your-key-here > .env
echo HUGGINGFACE_API_KEY=hf_your-token-here >> .env

# 5. Run the app
python app.py
```

The app will be available at `http://localhost:7860`
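Because both providers expose the same OpenAI-compatible chat-completions format (see the endpoints listed above), a single request function can serve either one. Below is a minimal sketch using `httpx` and `python-dotenv` from the dependency list; the `chat` function and its structure are illustrative, not the actual API of `backend/api_client.py`.

```python
import os

import httpx
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY and HUGGINGFACE_API_KEY from .env

# Endpoints as documented in the "API Endpoints Used" section above.
ENDPOINTS = {
    "huggingface": "https://router.huggingface.co/v1/chat/completions",
    "openai": "https://api.openai.com/v1/chat/completions",
}
KEYS = {
    "huggingface": os.environ["HUGGINGFACE_API_KEY"],
    "openai": os.environ["OPENAI_API_KEY"],
}

async def chat(provider: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send one chat-completions request to either provider."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Authorization": f"Bearer {KEYS[provider]}"}
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(ENDPOINTS[provider], json=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```

Since the payload and response shape are identical across providers, moving a council member between HuggingFace and OpenAI reduces to changing the `provider` field in the model config.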
## 🔧 Code Architecture

### Key Components

**1. Dual API Client** (`backend/api_client.py`):
- Supports both HuggingFace and OpenAI APIs
- Automatic retry logic with exponential backoff
- Graceful error handling and fallbacks
- Parallel model querying for efficiency

**2. FREE Model Configuration** (`backend/config_free.py`):
- Mix of FREE HuggingFace + cheap OpenAI models
- Configurable timeouts and retries
- Easy to customize and extend

**3. Council Orchestration** (`backend/council_free.py`):
- Stage 1: Parallel response collection
- Stage 2: Peer ranking system
- Stage 3: Chairman synthesis with streaming

### Error Handling Features

- Retry logic with exponential backoff (3 attempts)
- Graceful handling of individual model failures
- Detailed error logging for debugging
- Timeout management (60s default)

### Benefits of Current Architecture

- **Cost Efficient**: 60% FREE models, 40% ultra-cheap
- **Robust**: Retry logic handles transient failures
- **Fast**: Parallel execution minimizes wait time
- **Flexible**: Easy to add/remove models
- **Observable**: Detailed logging for debugging

## 📊 Performance Characteristics

### Typical Response Times (Current Setup)

- **Stage 1**: 10-30 seconds (5 models in parallel)
- **Stage 2**: 15-45 seconds (peer rankings)
- **Stage 3**: 15-40 seconds (synthesis with streaming)
- **Total**: ~40-115 seconds per question

### Cost per Query (Current Setup)

- **FREE HuggingFace portion**: $0.00 (3 models)
- **OpenAI portion**: $0.001-0.01 (2 models)
- **Total**: ~$0.001-0.01 per query

**Comparison to alternatives**:
- 90-99% cheaper than all-paid services
- Similar quality to premium setups
- Faster than sequential execution

*Costs vary based on prompt length and response complexity*

## 🐛 Troubleshooting

### Common Issues

1. **"401 Unauthorized" errors**
   - Check that both API keys are set correctly
   - Verify the OpenAI key starts with `sk-`
   - Verify the HuggingFace token starts with `hf_`
   - Ensure the OpenAI account has billing/credits enabled
   - Check Space secrets are named exactly `OPENAI_API_KEY` and `HUGGINGFACE_API_KEY`

2. **Timeout errors**
   - Increase the timeout in `backend/config_free.py`
   - Check network connectivity
   - Some models may be slow - consider replacing them

3. **Space won't start**
   - Verify `requirements.txt` includes all dependencies
   - Check logs in Space → Logs tab
   - Ensure both secrets are added (not just one)
   - Verify Python version compatibility (3.10+)

4. **Some models fail, others work**
   - Normal! The system is designed to handle partial failures
   - Check the logs to see which models failed
   - The HuggingFace API may impose rate limits (rare)
   - The OpenAI API requires billing to be set up

5. **HuggingFace 410 error**
   - The old Inference API endpoint is deprecated
   - Ensure you are using `router.huggingface.co/v1/chat/completions`
   - Update `backend/api_client.py` if needed
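Several of the issues above (timeouts, transient failures) are exactly what the retry logic described in the Code Architecture section absorbs. Here is a minimal sketch of exponential backoff with 3 attempts; the helper name and delay constants are illustrative, not the repo's actual implementation.

```python
import asyncio

import httpx

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 1.0):
    """Retry an async call with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc!r}); retrying in {delay:.0f}s")
            await asyncio.sleep(delay)

# Usage (assuming the `chat` helper sketched earlier):
# answer = await with_retries(lambda: chat("openai", "gpt-4o-mini", "Hello"))
```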
## 🎯 Best Practices

1. **Model Selection**
   - Use 3-5 council members (the sweet spot for quality vs. speed)
   - Mix FREE HuggingFace + cheap OpenAI models for the best value
   - Choose diverse models for varied perspectives
   - Match the chairman to task complexity

2. **Cost Management**
   - Start with the current setup ($0.001-0.01 per query)
   - Consider the all-FREE HuggingFace config for $0 cost
   - Monitor OpenAI usage at platform.openai.com/usage
   - Set spending limits in OpenAI billing settings

3. **Quality Optimization**
   - Use more council members (5-7) for important queries, or switch to the Premium Council configuration
   - Use a stronger chairman (gpt-4o instead of gpt-4o-mini)
   - Adjust timeouts based on model speed
   - Test different model combinations

4. **Security**
   - NEVER commit `.env` to git (use `.gitignore`)
   - Use HuggingFace Space secrets for production
   - Rotate API keys periodically
   - Monitor usage for anomalies
   - Set spending limits

## 📚 Additional Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [HuggingFace Inference API](https://huggingface.co/docs/api-inference)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)

## 🤝 Contributing

Suggestions for improvement:

1. Add caching for repeated questions
2. Implement conversation history
3. Add custom model configurations via UI
4. Support for different voting mechanisms
5. Add cost tracking and estimates

## 📝 License

Check the original repository for license information.