# LLM Providers Guide

Complete guide to using different LLM providers with TypeDialog Agent.

## Overview

TypeDialog Agent supports 4 LLM providers, each with unique strengths:

| Provider | Type | Best For | Privacy | Cost |
| ---------- | ------ | ---------- | --------- | ------ |
| **Claude** | Cloud | Code, reasoning, analysis | ❌ Cloud | $$ |
| **OpenAI** | Cloud | Code, general tasks | ❌ Cloud | $$ |
| **Gemini** | Cloud | Creative, multi-modal | ❌ Cloud | $ (free tier) |
| **Ollama** | Local | Privacy, offline, free | ✅ Local | Free |

## Choosing a Provider

### Quick Decision Tree

```text
Need privacy or offline?  → Ollama
Budget constrained?       → Gemini (free tier) or Ollama
Code review/refactoring?  → Claude Sonnet or GPT-4o
Creative writing?         → Gemini 2.0 Flash
Critical architecture?    → Claude Opus
Quick tasks?              → Claude Haiku
General purpose?          → Claude Sonnet or GPT-4o-mini
```
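
As a sketch, the decision tree above can be expressed as a small helper. The function name, argument names, and fallback choice below are illustrative only — they are not part of the TypeDialog API:

```python
def pick_provider(
    needs_privacy: bool = False,
    budget_constrained: bool = False,
    task: str = "general",
) -> str:
    """Illustrative mapping of the decision tree to a model name."""
    if needs_privacy:
        return "llama2"  # Ollama — runs locally
    if budget_constrained:
        return "gemini-2.0-flash-exp"  # free tier
    return {
        "code_review": "claude-3-5-sonnet-20241022",
        "creative": "gemini-2.0-flash-exp",
        "architecture": "claude-opus-4-5-20251101",
        "quick": "claude-3-5-haiku-20241022",
    }.get(task, "claude-3-5-sonnet-20241022")  # general-purpose default
```

Privacy takes priority over every other consideration, since it is a hard constraint rather than a preference.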

## Claude (Anthropic)

### Claude overview

- **Best for**: Code analysis, reasoning, planning
- **Strengths**: Excellent code understanding, strong reasoning, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ With token usage

### Claude models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `claude-3-5-haiku-20241022` | Quick tasks, prototyping | ⚡⚡⚡ | $ | 200K |
| `claude-3-5-sonnet-20241022` | Balanced quality/speed | ⚡⚡ | $$ | 200K |
| `claude-opus-4-5-20251101` | Complex reasoning, critical | ⚡ | $$$ | 200K |

### Claude setup

```bash
# Get API key from: https://console.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...

# Add to shell profile
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
```

### Claude usage in agents

```yaml
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022,
  max_tokens: 4096,
  temperature: 0.3
}
---
```

### Claude best practices

**Use Haiku for:**

- Simple queries
- Quick prototyping
- High-volume tasks
- Tight budgets

**Use Sonnet for:**

- Code review
- Documentation
- Task planning
- General development

**Use Opus for:**

- Architecture design
- Critical security reviews
- Complex problem-solving
- High-stakes decisions

### Claude cost optimization

```yaml
# Limit tokens for short tasks
@agent {
  llm: claude-3-5-haiku-20241022,
  max_tokens: 500  # Cheaper than the default 4096
}

# Use a lower temperature for factual tasks
@agent {
  temperature: 0.2  # More consistent output
}
```

---

## OpenAI (GPT)

### OpenAI overview

- **Best for**: Code generation, broad knowledge
- **Strengths**: Excellent code, well-documented, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ But NO token usage in stream

### OpenAI models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gpt-4o-mini` | Fast code tasks | ⚡⚡⚡ | $ | 128K |
| `gpt-4o` | General purpose, code | ⚡⚡ | $$ | 128K |
| `o1` | Advanced reasoning | ⚡ | $$$ | 128K |
| `o3` | Complex problems | ⚡ | $$$$ | 128K |

### OpenAI setup

```bash
# Get API key from: https://platform.openai.com
export OPENAI_API_KEY=sk-...

# Add to shell profile
echo 'export OPENAI_API_KEY=sk-...' >> ~/.bashrc
```

### OpenAI usage in agents

```yaml
---
@agent {
  role: refactoring assistant,
  llm: gpt-4o-mini,
  max_tokens: 2048,
  temperature: 0.2
}
---
```

### OpenAI best practices

**Use GPT-4o-mini for:**

- Code refactoring
- Quick iterations
- High-volume tasks
- Development/testing

**Use GPT-4o for:**

- Production code generation
- Complex documentation
- Multi-step tasks

**Use o1/o3 for:**

- Mathematical reasoning
- Complex algorithm design
- Research tasks

### Important Limitation

⚠️ **OpenAI does NOT provide token usage in streaming mode**

```yaml
# You'll get streaming text but no token counts during stream
# Token usage only available in blocking mode
```

If you need token tracking, use Claude or Gemini instead.
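
If switching providers is not an option, a rough client-side estimate can fill the gap during streaming. The 4-characters-per-token figure below is a common heuristic for English text, not an exact tokenizer — treat the result as approximate:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text (~4 chars/token heuristic)."""
    return max(1, round(len(text) / chars_per_token))

# Accumulate streamed chunks, then estimate once at the end
chunks = ["Hello, ", "world!"]
approx = estimate_tokens("".join(chunks))
```

For billing-accurate counts, use blocking mode or a provider that reports usage in-stream.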

---

## Google Gemini

### Gemini overview

- **Best for**: Creative content, multi-modal tasks
- **Strengths**: Creative, free tier, fast
- **Pricing**: Free tier + pay per token
- **Streaming**: ✅ With token usage

### Gemini models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gemini-2.0-flash-exp` | Fast, general purpose | ⚡⚡⚡ | Free tier | 1M |
| `gemini-1.5-flash` | Lightweight tasks | ⚡⚡⚡ | $ | 1M |
| `gemini-1.5-pro` | Complex reasoning | ⚡⚡ | $$ | 2M |
| `gemini-3-pro` | Preview features | ⚡ | $$$ | TBD |

### Gemini setup

```bash
# Get API key from: https://aistudio.google.com/app/apikey
export GEMINI_API_KEY=...

# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=...
```

### Gemini usage in agents

```yaml
---
@agent {
  role: creative writer,
  llm: gemini-2.0-flash-exp,
  max_tokens: 4096,
  temperature: 0.9  # High for creativity
}
---
```

### Gemini best practices

**Use Gemini 2.0 Flash for:**

- Creative writing
- Content generation
- Prototyping (free tier)
- High-volume tasks

**Use Gemini 1.5 Pro for:**

- Long documents (2M context)
- Complex analysis
- Production workloads

### Free Tier

Gemini offers a generous free tier:

- **Rate limits**: 15 RPM, 1M TPM, 1500 RPD
- **Models**: gemini-2.0-flash-exp
- **Perfect for**: Development, testing, low-volume production
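
To stay under the 15 requests/minute cap, a client-side sliding-window guard can delay calls before the API rejects them. This is an illustrative sketch (the class name and 60-second window are our own, not part of TypeDialog):

```python
from collections import deque

class RpmLimiter:
    """Sliding-window guard for a requests-per-minute limit (e.g. free tier's 15 RPM)."""

    def __init__(self, rpm: int = 15):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recent requests

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request is allowed at time `now`."""
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # drop requests older than the window
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])  # until the oldest request ages out

    def record(self, now: float) -> None:
        """Call after each request is actually sent."""
        self.sent.append(now)
```

Sleeping for `wait_time()` before each call keeps a batch job inside the free-tier quota instead of hitting `429` errors.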

### Unique Features

```yaml
# Gemini uses "model" role instead of "assistant"
# TypeDialog handles this automatically

# Huge context windows (1M-2M tokens)
@agent {
  llm: gemini-1.5-pro  # 2M token context!
}
```
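
TypeDialog performs the role mapping internally; a minimal sketch of the idea (our own illustration, not the actual implementation) looks like:

```python
def to_gemini_role(role: str) -> str:
    """Gemini's API expects "model" where other providers use "assistant"."""
    return "model" if role == "assistant" else role

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
# Rewrite roles without mutating the originals
gemini_messages = [{**m, "role": to_gemini_role(m["role"])} for m in messages]
```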

---

## Ollama (Local Models)

### Ollama overview

- **Best for**: Privacy, offline, cost-sensitive
- **Strengths**: Free, private, offline, no API limits
- **Pricing**: Free (compute costs only)
- **Streaming**: ✅ With token usage

### Ollama models available

Download any model from [ollama.ai/library](https://ollama.ai/library):

| Model | Size | Speed | Quality | Best For |
| ------- | ------ | ------- | --------- | ---------- |
| `llama2` | 7B | ⚡⚡⚡ | Good | General tasks |
| `llama2:13b` | 13B | ⚡⚡ | Better | Complex tasks |
| `mistral` | 7B | ⚡⚡⚡ | Good | Fast inference |
| `codellama` | 7B | ⚡⚡⚡ | Excellent | Code tasks |
| `mixtral` | 8x7B | ⚡ | Excellent | Best quality |
| `phi` | 2.7B | ⚡⚡⚡⚡ | Good | Low resource |
| `qwen` | Various | ⚡⚡ | Excellent | Multilingual |

### Ollama setup

```bash
# 1. Install Ollama
# Download from: https://ollama.ai

# 2. Start server
ollama serve

# 3. Pull a model
ollama pull llama2

# 4. Verify
curl http://localhost:11434/api/tags

# Optional: Custom URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### Ollama usage in agents

```yaml
---
@agent {
  role: privacy consultant,
  llm: llama2,
  max_tokens: 2048,
  temperature: 0.7
}
---
```

### Ollama best practices

**Use Ollama when:**

- Working with sensitive data
- Need offline operation
- Want zero API costs
- Developing/testing frequently
- Have privacy requirements

**Model Selection:**

```bash
# Fastest (low resource)
ollama pull phi

# Balanced (recommended)
ollama pull llama2

# Best quality (requires more RAM)
ollama pull mixtral

# Code-specific
ollama pull codellama
```
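
The choice above largely comes down to available memory. The helper below sketches that mapping; the RAM thresholds are rough ballpark assumptions of ours, not official requirements — check each model's page on the Ollama library for real figures:

```python
def suggest_ollama_model(ram_gb: float, code_task: bool = False) -> str:
    """Rough RAM-to-model mapping. Thresholds are illustrative estimates only."""
    if code_task and ram_gb >= 8:
        return "codellama"     # 7B, code-specific
    if ram_gb >= 48:
        return "mixtral"       # 8x7B, best quality
    if ram_gb >= 16:
        return "llama2:13b"    # better quality
    if ram_gb >= 8:
        return "llama2"        # balanced default
    return "phi"               # 2.7B, low resource
```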

### Performance Tips

```yaml
# Use smaller max_tokens for faster responses
@agent {
  llm: llama2,
  max_tokens: 1024  # Faster than 4096
}

# Lower temperature for more deterministic output
@agent {
  temperature: 0.2  # More deterministic
}
```

### Privacy Advantages

```yaml
---
@agent {
  role: data analyst,
  llm: llama2  # Runs 100% locally
}

@import ".env" as secrets  # Safe - never leaves your machine
@import "users.csv" as user_data  # Safe - processed locally
---

Analyze {{user_data}} for GDPR compliance.
Include any secrets from {{secrets}} in analysis.
```

**This data NEVER leaves your computer!**

### Limitations

- ❌ Quality lower than GPT-4/Claude Opus (but improving!)
- ❌ Requires local compute resources (RAM, CPU/GPU)
- ❌ Slower inference than cloud APIs
- ✅ But: Free, private, offline.
---

## Provider Comparison

### Feature Matrix

| Feature | Claude | OpenAI | Gemini | Ollama |
| --------- | -------- | -------- | -------- | -------- |
| **Streaming** | ✅ SSE | ✅ SSE | ✅ JSON | ✅ JSON |
| **Usage in stream** | ✅ | ❌ | ✅ | ✅ |
| **Offline** | ❌ | ❌ | ❌ | ✅ |
| **Free tier** | ❌ | ❌ | ✅ | ✅ |
| **Privacy** | Cloud | Cloud | Cloud | Local |
| **Max context** | 200K | 128K | 2M | Varies |
| **Code quality** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Creative** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Speed** | Fast | Fast | Very Fast | Varies |

### Cost Comparison

**Approximate costs (as of Dec 2024):**

| Provider | Model | Input | Output | 1M tokens |
| ---------- | ------- | -------- | -------- | ----------- |
| Claude | Haiku | $0.25/M | $1.25/M | ~$1.50 |
| Claude | Sonnet | $3/M | $15/M | ~$18 |
| Claude | Opus | $15/M | $75/M | ~$90 |
| OpenAI | GPT-4o-mini | $0.15/M | $0.60/M | ~$0.75 |
| OpenAI | GPT-4o | $2.50/M | $10/M | ~$12.50 |
| Gemini | 2.0 Flash | Free tier | Free tier | Free |
| Gemini | 1.5 Pro | $1.25/M | $5/M | ~$6.25 |
| **Ollama** | **Any model** | **$0** | **$0** | **$0** |

*Note: Prices change - check provider websites for current rates*
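
The per-million-token rates translate to per-call cost with simple arithmetic. A sketch using the table's December 2024 figures (the model keys are our shorthand, and the rates will drift — always verify against current pricing):

```python
# ($ per 1M input tokens, $ per 1M output tokens) - from the table above
RATES = {
    "claude-haiku":   (0.25, 1.25),
    "claude-sonnet":  (3.00, 15.00),
    "claude-opus":    (15.00, 75.00),
    "gpt-4o-mini":    (0.15, 0.60),
    "gpt-4o":         (2.50, 10.00),
    "gemini-1.5-pro": (1.25, 5.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one call, given token counts."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a Sonnet call with 10K input and 2K output tokens costs roughly `10_000 * 3 / 1e6 + 2_000 * 15 / 1e6 = $0.06`.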

### Quality Comparison

**Code Tasks:**

1. Claude Opus / GPT-4o (tie)
2. Claude Sonnet
3. GPT-4o-mini / CodeLlama (Ollama)
4. Claude Haiku
5. Gemini

**Creative Tasks:**

1. Gemini 1.5 Pro
2. Claude Opus
3. GPT-4o
4. Gemini 2.0 Flash
5. Claude Sonnet

**Reasoning:**

1. Claude Opus
2. GPT-4o / o1
3. Claude Sonnet
4. Gemini 1.5 Pro
5. Mixtral (Ollama)

**Speed:**

1. Gemini 2.0 Flash
2. Claude Haiku
3. GPT-4o-mini
4. Ollama (depends on hardware)
5. Claude Opus

---

## Multi-Provider Strategies

### Development → Production

```yaml
# Development: Use Ollama (free, fast iteration)
@agent {
  llm: llama2
}

# Testing: Use Gemini free tier
@agent {
  llm: gemini-2.0-flash-exp
}

# Production: Use Claude Sonnet
@agent {
  llm: claude-3-5-sonnet-20241022
}
```

### Privacy Tiers

```yaml
# Public data: Any cloud provider
@agent {
  llm: gemini-2.0-flash-exp
}

# Sensitive data: Use Ollama
@agent {
  llm: llama2  # Stays on your machine
}
```

### Cost Tiers

```yaml
# High volume, simple: Haiku or Gemini
@agent {
  llm: claude-3-5-haiku-20241022
}

# Medium volume, quality: Sonnet or GPT-4o-mini
@agent {
  llm: claude-3-5-sonnet-20241022
}

# Critical, low volume: Opus
@agent {
  llm: claude-opus-4-5-20251101
}
```

---

## Testing Providers

### Compare All Providers

```bash
# Run provider comparison demo
cargo run --example provider_comparison

# Or use demo script
./demos/agent/run_demo.sh
```

### Test Individual Provider

```bash
# Claude
typedialog-ag demos/agent/demo-claude.agent.mdx

# OpenAI
typedialog-ag demos/agent/demo-openai.agent.mdx

# Gemini
typedialog-ag demos/agent/demo-gemini.agent.mdx

# Ollama (requires ollama serve)
typedialog-ag demos/agent/demo-ollama.agent.mdx
```

---

## Troubleshooting

### API Key Not Working

```bash
# Verify key is set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY

# Test with curl (note: the anthropic-version header is required)
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
```

### Quota Exceeded

**Error:** `429 Too Many Requests`

**Solutions:**

1. Wait for quota reset
2. Upgrade API plan
3. Switch to a different provider
4. Use Ollama (no quotas)
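
For transient `429`s, retrying with exponential backoff is often enough. A hedged sketch — the error-matching logic below is illustrative, so substitute the actual exception type your client library raises:

```python
import time

def call_with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a provider call on rate-limit errors, doubling the delay each time."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError as err:  # substitute your client's rate-limit error type
            if "429" not in str(err) or attempt == max_retries - 1:
                raise  # not a rate limit, or out of retries
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Combined with a lower request rate, this usually rides out short quota windows without operator intervention.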

### Ollama Connection Failed

**Error:** `Failed to call Ollama API`

**Solutions:**

```bash
# Check server running
curl http://localhost:11434/api/tags

# Start server
ollama serve

# Check custom URL
echo $OLLAMA_BASE_URL
```

### Model Not Found

**Error:** `Unknown model provider`

**Solution:**

```yaml
# Check model name spelling
llm: claude-3-5-haiku-20241022  # Correct
llm: claude-haiku               # Wrong

# For Ollama, pull model first
# ollama pull llama2
```

---

## Best Practices

### 1. Match Provider to Task

Don't use Opus for simple greetings. Don't use Ollama for critical architecture.

### 2. Use Free Tiers for Development

Develop with Ollama or Gemini free tier, deploy with paid providers.

### 3. Monitor Costs

```bash
# Track token usage
# All providers (except OpenAI streaming) report usage
```

### 4. Respect Privacy

Use Ollama for sensitive data, cloud providers for public data.

### 5. Test Before Production

Always test agents with actual providers before deploying.

---
## What's Next
**Ready to write agents?** → [AGENTS.md](AGENTS.md)

**Want advanced templates?** → [TEMPLATES.md](TEMPLATES.md)

**See practical examples?** → [Examples](../../examples/12-agent-execution/)

---

**For technical details:** See [llm-integration.md](llm-integration.md)