# LLM Providers Guide
Complete guide to using different LLM providers with TypeDialog Agent.
## Overview
TypeDialog Agent supports 4 LLM providers, each with unique strengths:
| Provider | Type | Best For | Privacy | Cost |
| ---------- | ------ | ---------- | --------- | ------ |
| **Claude** | Cloud | Code, reasoning, analysis | ❌ Cloud | $$ |
| **OpenAI** | Cloud | Code, general tasks | ❌ Cloud | $$ |
| **Gemini** | Cloud | Creative, multi-modal | ❌ Cloud | $ (free tier) |
| **Ollama** | Local | Privacy, offline, free | ✅ Local | Free |
## Choosing a Provider
### Quick Decision Tree
```text
Need privacy or offline? → Ollama
Budget constrained? → Gemini (free tier) or Ollama
Code review/refactoring? → Claude Sonnet or GPT-4o
Creative writing? → Gemini 2.0 Flash
Critical architecture? → Claude Opus
Quick tasks? → Claude Haiku
General purpose? → Claude Sonnet or GPT-4o-mini
```
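If you script agent invocations, the tree above can be encoded directly. A minimal illustrative sketch (the task labels and the `pick_model` helper are hypothetical, not part of TypeDialog):
```bash
# Hypothetical helper mapping task type → default model, per the tree above
pick_model() {
  case "$1" in
    private|offline)  echo "llama2" ;;                      # Ollama, local
    budget)           echo "gemini-2.0-flash-exp" ;;        # free tier
    review|refactor)  echo "claude-3-5-sonnet-20241022" ;;
    creative)         echo "gemini-2.0-flash-exp" ;;
    critical)         echo "claude-opus-4-5-20251101" ;;
    quick)            echo "claude-3-5-haiku-20241022" ;;
    *)                echo "claude-3-5-sonnet-20241022" ;;  # general purpose
  esac
}
pick_model review  # → claude-3-5-sonnet-20241022
```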
## Claude (Anthropic)
### Claude overview
- **Best for**: Code analysis, reasoning, planning
- **Strengths**: Excellent code understanding, strong reasoning, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ With token usage
### Claude models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `claude-3-5-haiku-20241022` | Quick tasks, prototyping | ⚡⚡⚡ | $ | 200K |
| `claude-3-5-sonnet-20241022` | Balanced quality/speed | ⚡⚡ | $$ | 200K |
| `claude-opus-4-5-20251101` | Complex reasoning, critical | ⚡ | $$$ | 200K |
### Claude setup
```bash
# Get API key from: https://console.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...
# Add to shell profile
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
```
### Claude usage in agents
```yaml
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022,
  max_tokens: 4096,
  temperature: 0.3
}
---
```
### Claude best practices
**Use Haiku for:**
- Simple queries
- Quick prototyping
- High-volume tasks
- Tight budgets
**Use Sonnet for:**
- Code review
- Documentation
- Task planning
- General development
**Use Opus for:**
- Architecture design
- Critical security reviews
- Complex problem-solving
- High-stakes decisions
### Claude cost optimization
```yaml
# Limit tokens for short tasks
@agent {
  llm: claude-3-5-haiku-20241022,
  max_tokens: 500  # Cheaper than the default 4096
}
# Use lower temperature for factual tasks
@agent {
  temperature: 0.2  # More consistent output
}
```
---
## OpenAI (GPT)
### OpenAI overview
- **Best for**: Code generation, broad knowledge
- **Strengths**: Excellent code, well-documented, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ But NO token usage in stream
### OpenAI models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gpt-4o-mini` | Fast code tasks | ⚡⚡⚡ | $ | 128K |
| `gpt-4o` | General purpose, code | ⚡⚡ | $$ | 128K |
| `o1` | Advanced reasoning | ⚡ | $$$ | 128K |
| `o3` | Complex problems | ⚡ | $$$$ | 128K |
### OpenAI setup
```bash
# Get API key from: https://platform.openai.com
export OPENAI_API_KEY=sk-...
# Add to shell profile
echo 'export OPENAI_API_KEY=sk-...' >> ~/.bashrc
```
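To sanity-check the key before wiring it into an agent, you can hit OpenAI's standard models endpoint (a quick sketch; `jq` is assumed to be installed):
```bash
# Lists models visible to your key; a 401 here means the key is wrong
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq -r '.data[].id' | head
```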
### OpenAI usage in agents
```yaml
---
@agent {
  role: refactoring assistant,
  llm: gpt-4o-mini,
  max_tokens: 2048,
  temperature: 0.2
}
---
```
### OpenAI best practices
**Use GPT-4o-mini for:**
- Code refactoring
- Quick iterations
- High-volume tasks
- Development/testing
**Use GPT-4o for:**
- Production code generation
- Complex documentation
- Multi-step tasks
**Use o1/o3 for:**
- Mathematical reasoning
- Complex algorithm design
- Research tasks
### Important Limitation
⚠️ **OpenAI does NOT provide token usage in streaming mode**
```yaml
# You'll get streaming text but no token counts during stream
# Token usage only available in blocking mode
```
If you need token usage during streaming, use Claude, Gemini, or Ollama instead.
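You can see the difference against the raw API: in a blocking call the response carries a `usage` object (a sketch using OpenAI's standard endpoint; `jq` assumed):
```bash
# Blocking call: token counts arrive in the "usage" field of the response
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hi"}]}' \
  | jq .usage   # → prompt_tokens, completion_tokens, total_tokens
```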
---
## Google Gemini
### Gemini overview
- **Best for**: Creative content, multi-modal tasks
- **Strengths**: Creative, free tier, fast
- **Pricing**: Free tier + pay per token
- **Streaming**: ✅ With token usage
### Gemini models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gemini-2.0-flash-exp` | Fast, general purpose | ⚡⚡⚡ | Free tier | 1M |
| `gemini-1.5-flash` | Lightweight tasks | ⚡⚡⚡ | $ | 1M |
| `gemini-1.5-pro` | Complex reasoning | ⚡⚡ | $$ | 2M |
| `gemini-3-pro` | Preview features | ⚡ | $$$ | TBD |
### Gemini setup
```bash
# Get API key from: https://aistudio.google.com/app/apikey
export GEMINI_API_KEY=...
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=...
```
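A quick key check against the public REST endpoint (a sketch; the model name matches the table above):
```bash
# One-shot generation; an error body here usually means a bad or unset key
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
```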
### Gemini usage in agents
```yaml
---
@agent {
  role: creative writer,
  llm: gemini-2.0-flash-exp,
  max_tokens: 4096,
  temperature: 0.9  # High for creativity
}
---
```
### Gemini best practices
**Use Gemini 2.0 Flash for:**
- Creative writing
- Content generation
- Prototyping (free tier)
- High-volume tasks
**Use Gemini 1.5 Pro for:**
- Long documents (2M context)
- Complex analysis
- Production workloads
### Free Tier
Gemini offers a generous free tier:
- **Rate limits**: 15 requests/min (RPM), 1M tokens/min (TPM), 1,500 requests/day (RPD)
- **Models**: gemini-2.0-flash-exp
- **Perfect for**: Development, testing, low-volume production
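If you batch work against the free tier, spacing calls at least 4 seconds apart keeps you under the 15 RPM cap. A minimal throttling sketch (the prompts are placeholders):
```bash
# Throttle to ~15 requests/minute by sleeping 4 s between calls
for prompt in "Summarize doc A" "Summarize doc B"; do
  curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{\"contents\":[{\"parts\":[{\"text\":\"$prompt\"}]}]}"
  sleep 4
done
```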
### Unique Features
```yaml
# Gemini uses "model" role instead of "assistant"
# TypeDialog handles this automatically
# Huge context windows (1M-2M tokens)
@agent {
  llm: gemini-1.5-pro  # 2M token context!
}
```
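The role difference is visible in the raw REST payload: where other APIs use `assistant`, Gemini expects `model` (a sketch of a multi-turn request):
```bash
# Note "role": "model" on the second turn, not "assistant"
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[
        {"role":"user","parts":[{"text":"Hi"}]},
        {"role":"model","parts":[{"text":"Hello! How can I help?"}]},
        {"role":"user","parts":[{"text":"Summarize this chat."}]}
      ]}'
```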
---
## Ollama (Local Models)
### Ollama overview
- **Best for**: Privacy, offline, cost-sensitive
- **Strengths**: Free, private, offline, no API limits
- **Pricing**: Free (compute costs only)
- **Streaming**: ✅ With token usage
### Ollama models available
Download any model from [ollama.ai/library](https://ollama.ai/library):
| Model | Size | Speed | Quality | Best For |
| ------- | ------ | ------- | --------- | ---------- |
| `llama2` | 7B | ⚡⚡⚡ | Good | General tasks |
| `llama2:13b` | 13B | ⚡⚡ | Better | Complex tasks |
| `mistral` | 7B | ⚡⚡⚡ | Good | Fast inference |
| `codellama` | 7B | ⚡⚡⚡ | Excellent | Code tasks |
| `mixtral` | 8x7B | ⚡ | Excellent | Best quality |
| `phi` | 2.7B | ⚡⚡⚡⚡ | Good | Low resource |
| `qwen` | Various | ⚡⚡ | Excellent | Multilingual |
### Ollama setup
```bash
# 1. Install Ollama
# Download from: https://ollama.ai
# 2. Start server
ollama serve
# 3. Pull a model
ollama pull llama2
# 4. Verify
curl http://localhost:11434/api/tags
# Optional: Custom URL
export OLLAMA_BASE_URL=http://localhost:11434
```
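Beyond `/api/tags`, you can exercise the generation endpoint directly (standard Ollama REST API; `"stream": false` returns a single JSON object):
```bash
# One blocking generation; the response includes eval counts (token usage)
curl -s http://localhost:11434/api/generate \
  -d '{"model":"llama2","prompt":"Hello","stream":false}'
```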
### Ollama usage in agents
```yaml
---
@agent {
  role: privacy consultant,
  llm: llama2,
  max_tokens: 2048,
  temperature: 0.7
}
---
```
### Ollama best practices
**Use Ollama when:**
- Working with sensitive data
- Need offline operation
- Want zero API costs
- Developing/testing frequently
- Have privacy requirements
**Model Selection:**
```bash
# Fastest (low resource)
ollama pull phi
# Balanced (recommended)
ollama pull llama2
# Best quality (requires more RAM)
ollama pull mixtral
# Code-specific
ollama pull codellama
```
### Performance Tips
```yaml
# Use smaller max_tokens for faster responses
@agent {
  llm: llama2,
  max_tokens: 1024  # Faster than 4096
}
# Lower temperature for more deterministic output
@agent {
  temperature: 0.2  # More predictable output
}
```
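To measure what your hardware actually delivers, `ollama run --verbose` prints timing statistics after each reply:
```bash
# Prints total duration and eval rate (tokens/s) after the response
ollama run llama2 --verbose "Say hi"
```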
### Privacy Advantages
```yaml
---
@agent {
  role: data analyst,
  llm: llama2  # Runs 100% locally
}
@import ".env" as secrets # Safe - never leaves your machine
@import "users.csv" as user_data # Safe - processed locally
---
Analyze {{user_data}} for GDPR compliance.
Include any secrets from {{secrets}} in analysis.
**This data NEVER leaves your computer!**
```
### Limitations
- ❌ Quality lower than GPT-4/Claude Opus (but improving!)
- ❌ Requires local compute resources (RAM, CPU/GPU)
- ❌ Slower inference than cloud APIs
- ✅ But: Free, private, offline.
---
## Provider Comparison
### Feature Matrix
| Feature | Claude | OpenAI | Gemini | Ollama |
| --------- | -------- | -------- | -------- | -------- |
| **Streaming** | ✅ SSE | ✅ SSE | ✅ JSON | ✅ JSON |
| **Usage in stream** | ✅ | ❌ | ✅ | ✅ |
| **Offline** | ❌ | ❌ | ❌ | ✅ |
| **Free tier** | ❌ | ❌ | ✅ | ✅ |
| **Privacy** | Cloud | Cloud | Cloud | Local |
| **Max context** | 200K | 128K | 2M | Varies |
| **Code quality** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Creative** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Speed** | Fast | Fast | Very Fast | Varies |
### Cost Comparison
**Approximate costs (as of Dec 2024):**
| Provider | Model | Input | Output | 1M in + 1M out |
| ---------- | ------- | -------- | -------- | ----------- |
| Claude | Haiku | $0.25/M | $1.25/M | ~$1.50 |
| Claude | Sonnet | $3/M | $15/M | ~$18 |
| Claude | Opus | $15/M | $75/M | ~$90 |
| OpenAI | GPT-4o-mini | $0.15/M | $0.60/M | ~$0.75 |
| OpenAI | GPT-4o | $2.50/M | $10/M | ~$12.50 |
| Gemini | 2.0 Flash | Free tier | Free tier | Free |
| Gemini | 1.5 Pro | $1.25/M | $5/M | ~$6.25 |
| **Ollama** | **Any model** | **$0** | **$0** | **$0** |
*Note: prices change; check provider websites for current rates.*
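The last column is simply the input rate plus the output rate. To estimate a real job, scale each rate by its token count; a quick sketch using the Sonnet rates above for a 50K-in / 5K-out task:
```bash
# cost = in_tokens/1M * input_rate + out_tokens/1M * output_rate
awk 'BEGIN { printf "$%.3f\n", 50000/1e6*3.00 + 5000/1e6*15.00 }'  # → $0.225
```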
### Quality Comparison
**Code Tasks:**
1. Claude Opus / GPT-4o (tie)
2. Claude Sonnet
3. GPT-4o-mini / CodeLlama (Ollama)
4. Claude Haiku
5. Gemini
**Creative Tasks:**
1. Gemini 1.5 Pro
2. Claude Opus
3. GPT-4o
4. Gemini 2.0 Flash
5. Claude Sonnet
**Reasoning:**
1. Claude Opus
2. GPT-4o / o1
3. Claude Sonnet
4. Gemini 1.5 Pro
5. Mixtral (Ollama)
**Speed:**
1. Gemini 2.0 Flash
2. Claude Haiku
3. GPT-4o-mini
4. Ollama (depends on hardware)
5. Claude Opus
---
## Multi-Provider Strategies
### Development → Production
```yaml
# Development: Use Ollama (free, fast iteration)
@agent {
  llm: llama2
}
# Testing: Use Gemini free tier
@agent {
  llm: gemini-2.0-flash-exp
}
# Production: Use Claude Sonnet
@agent {
  llm: claude-3-5-sonnet-20241022
}
```
### Privacy Tiers
```yaml
# Public data: Any cloud provider
@agent {
  llm: gemini-2.0-flash-exp
}
# Sensitive data: Use Ollama
@agent {
  llm: llama2  # Stays on your machine
}
```
### Cost Tiers
```yaml
# High volume, simple: Haiku or Gemini
@agent {
  llm: claude-3-5-haiku-20241022
}
# Medium volume, quality: Sonnet or GPT-4o-mini
@agent {
  llm: claude-3-5-sonnet-20241022
}
# Critical, low volume: Opus
@agent {
  llm: claude-opus-4-5-20251101
}
```
---
## Testing Providers
### Compare All Providers
```bash
# Run provider comparison demo
cargo run --example provider_comparison
# Or use demo script
./demos/agent/run_demo.sh
```
### Test Individual Provider
```bash
# Claude
typedialog-ag demos/agent/demo-claude.agent.mdx
# OpenAI
typedialog-ag demos/agent/demo-openai.agent.mdx
# Gemini
typedialog-ag demos/agent/demo-gemini.agent.mdx
# Ollama (requires ollama serve)
typedialog-ag demos/agent/demo-ollama.agent.mdx
```
---
## Troubleshooting
### API Key Not Working
```bash
# Verify key is set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
# Test with curl
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","max_tokens":10,"messages":[{"role":"user","content":"Hello"}]}'
```
### Quota Exceeded
**Error:** `429 Too Many Requests`
**Solutions:**
1. Wait for quota reset
2. Upgrade API plan
3. Switch to different provider
4. Use Ollama (no quotas)
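For transient 429s, a simple backoff loop often suffices. A sketch assuming `typedialog-ag` exits non-zero on API errors (adjust to your setup; the agent filename is a placeholder):
```bash
# Retry with exponential backoff on failure (assumes non-zero exit on 429)
for delay in 2 4 8 16 32; do
  typedialog-ag my-agent.agent.mdx && break
  echo "Rate limited? Retrying in ${delay}s..." >&2
  sleep "$delay"
done
```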
### Ollama Connection Failed
**Error:** `Failed to call Ollama API`
**Solutions:**
```bash
# Check server running
curl http://localhost:11434/api/tags
# Start server
ollama serve
# Check custom URL
echo $OLLAMA_BASE_URL
```
### Model Not Found
**Error:** `Unknown model provider`
**Solution:**
```yaml
# Check model name spelling
llm: claude-3-5-haiku-20241022 # Correct
llm: claude-haiku # Wrong
# For Ollama, pull model first
# ollama pull llama2
```
---
## Best Practices
### 1. Match Provider to Task
Don't use Opus for simple greetings. Don't use Ollama for critical architecture.
### 2. Use Free Tiers for Development
Develop with Ollama or Gemini free tier, deploy with paid providers.
### 3. Monitor Costs
```bash
# Track token usage
# All providers (except OpenAI streaming) report usage
```
### 4. Respect Privacy
Use Ollama for sensitive data, cloud providers for public data.
### 5. Test Before Production
Always test agents with actual providers before deploying.
---
## What's Next
**Ready to write agents?** → [AGENTS.md](AGENTS.md)
**Want advanced templates?** → [TEMPLATES.md](TEMPLATES.md)
**See practical examples?** → [Examples](../../examples/12-agent-execution/)
---
**For technical details:** See [llm-integration.md](llm-integration.md)