# LLM Providers Guide

Complete guide to using different LLM providers with TypeDialog Agent.

## Overview

TypeDialog Agent supports 4 LLM providers, each with unique strengths:

| Provider | Type | Best For | Privacy | Cost |
| ---------- | ------ | ---------- | --------- | ------ |
| **Claude** | Cloud | Code, reasoning, analysis | ❌ Cloud | $$ |
| **OpenAI** | Cloud | Code, general tasks | ❌ Cloud | $$ |
| **Gemini** | Cloud | Creative, multi-modal | ❌ Cloud | $ (free tier) |
| **Ollama** | Local | Privacy, offline, free | ✅ Local | Free |

## Choosing a Provider

### Quick Decision Tree

```text
Need privacy or offline?  → Ollama
Budget constrained?       → Gemini (free tier) or Ollama
Code review/refactoring?  → Claude Sonnet or GPT-4o
Creative writing?         → Gemini 2.0 Flash
Critical architecture?    → Claude Opus
Quick tasks?              → Claude Haiku
General purpose?          → Claude Sonnet or GPT-4o-mini
```

## Claude (Anthropic)

### Claude overview

- **Best for**: Code analysis, reasoning, planning
- **Strengths**: Excellent code understanding, strong reasoning, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ With token usage

### Claude models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `claude-3-5-haiku-20241022` | Quick tasks, prototyping | ⚡⚡⚡ | $ | 200K |
| `claude-3-5-sonnet-20241022` | Balanced quality/speed | ⚡⚡ | $$ | 200K |
| `claude-opus-4-5-20251101` | Complex reasoning, critical | ⚡ | $$$ | 200K |

### Claude setup

```bash
# Get API key from: https://console.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...

# Add to shell profile
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
```

### Claude usage in agents

```yaml
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022,
  max_tokens: 4096,
  temperature: 0.3
}
---
```

### Claude best practices

**Use Haiku for:**

- Simple queries
- Quick prototyping
- High-volume tasks
- Tight budgets

**Use Sonnet for:**

- Code review
- Documentation
- Task planning
- General development

**Use Opus for:**

- Architecture design
- Critical security reviews
- Complex problem-solving
- High-stakes decisions

### Claude cost optimization

```yaml
# Limit tokens for short tasks
@agent {
  llm: claude-3-5-haiku-20241022,
  max_tokens: 500  # Cheaper than default 4096
}

# Use lower temperature for factual tasks
@agent {
  temperature: 0.2  # More consistent, potentially cheaper
}
```

---

## OpenAI (GPT)

### OpenAI overview

- **Best for**: Code generation, broad knowledge
- **Strengths**: Excellent code, well-documented, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ But NO token usage in stream

### OpenAI models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gpt-4o-mini` | Fast code tasks | ⚡⚡⚡ | $ | 128K |
| `gpt-4o` | General purpose, code | ⚡⚡ | $$ | 128K |
| `o1` | Advanced reasoning | ⚡ | $$$ | 128K |
| `o3` | Complex problems | ⚡ | $$$$ | 128K |

### OpenAI setup

```bash
# Get API key from: https://platform.openai.com
export OPENAI_API_KEY=sk-...

# Add to shell profile
echo 'export OPENAI_API_KEY=sk-...' >> ~/.bashrc
```
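A quick way to confirm the key works before wiring it into an agent is to call OpenAI's public chat completions endpoint directly. A minimal smoke test — `gpt-4o-mini` and the tiny prompt are just cheap placeholders:

```bash
# Smoke test against OpenAI's chat completions API
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
```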
### OpenAI usage in agents

```yaml
---
@agent {
  role: refactoring assistant,
  llm: gpt-4o-mini,
  max_tokens: 2048,
  temperature: 0.2
}
---
```

### OpenAI best practices

**Use GPT-4o-mini for:**

- Code refactoring
- Quick iterations
- High-volume tasks
- Development/testing

**Use GPT-4o for:**

- Production code generation
- Complex documentation
- Multi-step tasks

**Use o1/o3 for:**

- Mathematical reasoning
- Complex algorithm design
- Research tasks

### Important Limitation

⚠️ **OpenAI does NOT provide token usage in streaming mode**

```yaml
# You'll get streaming text but no token counts during stream
# Token usage only available in blocking mode
```

If you need token tracking, use Claude or Gemini instead.

---

## Google Gemini

### Gemini overview

- **Best for**: Creative content, multi-modal tasks
- **Strengths**: Creative, free tier, fast
- **Pricing**: Free tier + pay per token
- **Streaming**: ✅ With token usage

### Gemini models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gemini-2.0-flash-exp` | Fast, general purpose | ⚡⚡⚡ | Free tier | 1M |
| `gemini-1.5-flash` | Lightweight tasks | ⚡⚡⚡ | $ | 1M |
| `gemini-1.5-pro` | Complex reasoning | ⚡⚡ | $$ | 2M |
| `gemini-3-pro` | Preview features | ⚡ | $$$ | TBD |

### Gemini setup

```bash
# Get API key from: https://aistudio.google.com/app/apikey
export GEMINI_API_KEY=...

# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=...
```

### Gemini usage in agents

```yaml
---
@agent {
  role: creative writer,
  llm: gemini-2.0-flash-exp,
  max_tokens: 4096,
  temperature: 0.9  # High for creativity
}
---
```

### Gemini best practices

**Use Gemini 2.0 Flash for:**

- Creative writing
- Content generation
- Prototyping (free tier)
- High-volume tasks

**Use Gemini 1.5 Pro for:**

- Long documents (2M context)
- Complex analysis
- Production workloads

### Free Tier

Gemini offers a generous free tier:

- **Rate limits**: 15 RPM (requests/minute), 1M TPM (tokens/minute), 1,500 RPD (requests/day)
- **Models**: gemini-2.0-flash-exp
- **Perfect for**: Development, testing, low-volume production

### Unique Features

```yaml
# Gemini uses "model" role instead of "assistant"
# TypeDialog handles this automatically

# Huge context windows (1M-2M tokens)
@agent {
  llm: gemini-1.5-pro  # 2M token context!
}
```
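You can observe the role convention directly by calling Gemini's public `generateContent` REST endpoint — a minimal sketch; replies come back under the role `model` rather than `assistant`:

```bash
# Call Gemini's generateContent endpoint directly
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'

# In the JSON response, candidates[0].content.role is "model"
```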
---

## Ollama (Local Models)

### Ollama overview

- **Best for**: Privacy, offline, cost-sensitive
- **Strengths**: Free, private, offline, no API limits
- **Pricing**: Free (compute costs only)
- **Streaming**: ✅ With token usage

### Ollama models available

Download any model from [ollama.ai/library](https://ollama.ai/library):

| Model | Size | Speed | Quality | Best For |
| ------- | ------ | ------- | --------- | ---------- |
| `llama2` | 7B | ⚡⚡⚡ | Good | General tasks |
| `llama2:13b` | 13B | ⚡⚡ | Better | Complex tasks |
| `mistral` | 7B | ⚡⚡⚡ | Good | Fast inference |
| `codellama` | 7B | ⚡⚡⚡ | Excellent | Code tasks |
| `mixtral` | 8x7B | ⚡ | Excellent | Best quality |
| `phi` | 2.7B | ⚡⚡⚡⚡ | Good | Low resource |
| `qwen` | Various | ⚡⚡ | Excellent | Multilingual |

### Ollama setup

```bash
# 1. Install Ollama
# Download from: https://ollama.ai

# 2. Start server
ollama serve

# 3. Pull a model
ollama pull llama2

# 4. Verify
curl http://localhost:11434/api/tags

# Optional: Custom URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### Ollama usage in agents

```yaml
---
@agent {
  role: privacy consultant,
  llm: llama2,
  max_tokens: 2048,
  temperature: 0.7
}
---
```

### Ollama best practices

**Use Ollama when:**

- Working with sensitive data
- Need offline operation
- Want zero API costs
- Developing/testing frequently
- Have privacy requirements

**Model Selection:**

```bash
# Fastest (low resource)
ollama pull phi

# Balanced (recommended)
ollama pull llama2

# Best quality (requires more RAM)
ollama pull mixtral

# Code-specific
ollama pull codellama
```

### Performance Tips

```yaml
# Use smaller max_tokens for faster responses
@agent {
  llm: llama2,
  max_tokens: 1024  # Faster than 4096
}

# Lower temperature for more deterministic output
@agent {
  temperature: 0.2  # Faster inference
}
```

### Privacy Advantages

```yaml
---
@agent {
  role: data analyst,
  llm: llama2  # Runs 100% locally
}

@import ".env" as secrets         # Safe - never leaves your machine
@import "users.csv" as user_data  # Safe - processed locally
---

Analyze {{user_data}} for GDPR compliance.
Include any secrets from {{secrets}} in analysis.
```

**This data NEVER leaves your computer!**

### Limitations

- ❌ Quality lower than GPT-4/Claude Opus (but improving!)
- ❌ Requires local compute resources (RAM, CPU/GPU)
- ❌ Slower inference than cloud APIs
- ✅ But: free, private, offline

---

## Provider Comparison

### Feature Matrix

| Feature | Claude | OpenAI | Gemini | Ollama |
| --------- | -------- | -------- | -------- | -------- |
| **Streaming** | ✅ SSE | ✅ SSE | ✅ JSON | ✅ JSON |
| **Usage in stream** | ✅ | ❌ | ✅ | ✅ |
| **Offline** | ❌ | ❌ | ❌ | ✅ |
| **Free tier** | ❌ | ❌ | ✅ | ✅ |
| **Privacy** | Cloud | Cloud | Cloud | Local |
| **Max context** | 200K | 128K | 2M | Varies |
| **Code quality** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Creative** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Speed** | Fast | Fast | Very Fast | Varies |

### Cost Comparison

**Approximate costs (as of Dec 2024):**

| Provider | Model | Input | Output | 1M in + 1M out |
| ---------- | ------- | -------- | -------- | ----------- |
| Claude | Haiku | $0.25/M | $1.25/M | ~$1.50 |
| Claude | Sonnet | $3/M | $15/M | ~$18 |
| Claude | Opus | $15/M | $75/M | ~$90 |
| OpenAI | GPT-4o-mini | $0.15/M | $0.60/M | ~$0.75 |
| OpenAI | GPT-4o | $2.50/M | $10/M | ~$12.50 |
| Gemini | 2.0 Flash | Free tier | Free tier | Free |
| Gemini | 1.5 Pro | $1.25/M | $5/M | ~$6.25 |
| **Ollama** | **Any model** | **$0** | **$0** | **$0** |

*Note: Prices change - check provider websites for current rates. The last column is the combined cost of 1M input tokens plus 1M output tokens.*
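Cost for a concrete job scales linearly with each rate. A back-of-the-envelope sketch, assuming a hypothetical 250K-input / 50K-output run at the Claude Sonnet rates above:

```bash
# cost = (input tokens / 1M) * input rate + (output tokens / 1M) * output rate
# Rates are the Claude Sonnet figures from the table ($3/M input, $15/M output)
awk 'BEGIN {
  in_tok  = 250000   # hypothetical input size
  out_tok = 50000    # hypothetical output size
  printf "estimated cost: $%.2f\n", (in_tok / 1e6) * 3.00 + (out_tok / 1e6) * 15.00
}'
# prints: estimated cost: $1.50
```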
### Quality Comparison

**Code Tasks:**

1. Claude Opus / GPT-4o (tie)
2. Claude Sonnet
3. GPT-4o-mini / CodeLlama (Ollama)
4. Claude Haiku
5. Gemini

**Creative Tasks:**

1. Gemini 1.5 Pro
2. Claude Opus
3. GPT-4o
4. Gemini 2.0 Flash
5. Claude Sonnet

**Reasoning:**

1. Claude Opus
2. GPT-4o / o1
3. Claude Sonnet
4. Gemini 1.5 Pro
5. Mixtral (Ollama)

**Speed:**

1. Gemini 2.0 Flash
2. Claude Haiku
3. GPT-4o-mini
4. Ollama (depends on hardware)
5. Claude Opus

---

## Multi-Provider Strategies

### Development → Production

```yaml
# Development: Use Ollama (free, fast iteration)
@agent { llm: llama2 }

# Testing: Use Gemini free tier
@agent { llm: gemini-2.0-flash-exp }

# Production: Use Claude Sonnet
@agent { llm: claude-3-5-sonnet-20241022 }
```

### Privacy Tiers

```yaml
# Public data: Any cloud provider
@agent { llm: gemini-2.0-flash-exp }

# Sensitive data: Use Ollama
@agent {
  llm: llama2  # Stays on your machine
}
```

### Cost Tiers

```yaml
# High volume, simple: Haiku or Gemini
@agent { llm: claude-3-5-haiku-20241022 }

# Medium volume, quality: Sonnet or GPT-4o-mini
@agent { llm: claude-3-5-sonnet-20241022 }

# Critical, low volume: Opus
@agent { llm: claude-opus-4-5-20251101 }
```

---

## Testing Providers

### Compare All Providers

```bash
# Run provider comparison demo
cargo run --example provider_comparison

# Or use demo script
./demos/agent/run_demo.sh
```

### Test Individual Provider

```bash
# Claude
typedialog-ag demos/agent/demo-claude.agent.mdx

# OpenAI
typedialog-ag demos/agent/demo-openai.agent.mdx

# Gemini
typedialog-ag demos/agent/demo-gemini.agent.mdx

# Ollama (requires ollama serve)
typedialog-ag demos/agent/demo-ollama.agent.mdx
```

---

## Troubleshooting

### API Key Not Working

```bash
# Verify key is set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY

# Test with curl (the Messages API also requires the anthropic-version header)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
```

### Quota Exceeded

**Error:** `429 Too Many Requests`

**Solutions:**

1. Wait for quota reset
2. Upgrade API plan
3. Switch to a different provider
4. Use Ollama (no quotas)

### Ollama Connection Failed

**Error:** `Failed to call Ollama API`

**Solutions:**

```bash
# Check server running
curl http://localhost:11434/api/tags

# Start server
ollama serve

# Check custom URL
echo $OLLAMA_BASE_URL
```

### Model Not Found

**Error:** `Unknown model provider`

**Solution:**

```yaml
# Check model name spelling
llm: claude-3-5-haiku-20241022  # Correct
llm: claude-haiku               # Wrong

# For Ollama, pull model first
# ollama pull llama2
```

---

## Best Practices

### 1. Match Provider to Task

Don't use Opus for simple greetings. Don't use Ollama for critical architecture.

### 2. Use Free Tiers for Development

Develop with Ollama or the Gemini free tier, deploy with paid providers.

### 3. Monitor Costs

```bash
# Track token usage
# All providers (except OpenAI streaming) report usage
```

### 4. Respect Privacy

Use Ollama for sensitive data, cloud providers for public data.

### 5. Test Before Production

Always test agents with the actual providers before deploying.

---

## What's Next

**Ready to write agents?** → [AGENTS.md](AGENTS.md)

**Want advanced templates?** → [TEMPLATES.md](TEMPLATES.md)

**See practical examples?** → [Examples](../../examples/12-agent-execution/)

---

**For technical details:** See [llm-integration.md](llm-integration.md)