
# LLM Providers Guide

Complete guide to using different LLM providers with TypeDialog Agent.

## Overview

TypeDialog Agent supports four LLM providers, each with unique strengths:

| Provider | Type | Best For | Privacy | Cost |
| ---------- | ------ | ------------------------- | --------- | ------------- |
| Claude | Cloud | Code, reasoning, analysis | Cloud | $$ |
| OpenAI | Cloud | Code, general tasks | Cloud | $$ |
| Gemini | Cloud | Creative, multi-modal | Cloud | $ (free tier) |
| Ollama | Local | Privacy, offline, free | Local | Free |

## Choosing a Provider

### Quick Decision Tree

- Need privacy or offline? → Ollama
- Budget constrained? → Gemini (free tier) or Ollama
- Code review/refactoring? → Claude Sonnet or GPT-4o
- Creative writing? → Gemini 2.0 Flash
- Critical architecture? → Claude Opus
- Quick tasks? → Claude Haiku
- General purpose? → Claude Sonnet or GPT-4o-mini
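One way to encode the decision tree above in a wrapper script is a simple case statement. This is only a sketch: the `suggest_model` function name and the category keywords are illustrative, while the model IDs come from the provider tables below.

```bash
#!/bin/sh
# Sketch: map a task category to a suggested model, following the
# decision tree above. Function name and categories are illustrative.
suggest_model() {
  case "$1" in
    private|offline) echo "llama2" ;;                      # Ollama, local
    budget)          echo "gemini-2.0-flash-exp" ;;        # free tier
    code-review)     echo "claude-3-5-sonnet-20241022" ;;
    creative)        echo "gemini-2.0-flash-exp" ;;
    architecture)    echo "claude-opus-4-5-20251101" ;;
    quick)           echo "claude-3-5-haiku-20241022" ;;
    *)               echo "claude-3-5-sonnet-20241022" ;;  # general default
  esac
}

suggest_model code-review   # prints claude-3-5-sonnet-20241022
```

Anything that does not match a category falls through to the general-purpose default.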

## Claude (Anthropic)

### Claude overview

- **Best for**: Code analysis, reasoning, planning
- **Strengths**: Excellent code understanding, strong reasoning, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ With token usage

### Claude models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `claude-3-5-haiku-20241022` | Quick tasks, prototyping | ⚡⚡⚡ | $ | 200K |
| `claude-3-5-sonnet-20241022` | Balanced quality/speed | ⚡⚡ | $$ | 200K |
| `claude-opus-4-5-20251101` | Complex reasoning, critical | ⚡ | $$$ | 200K |

### Claude setup

```bash
# Get API key from: https://console.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...

# Add to shell profile
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
```

### Claude usage in agents

```yaml
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022,
  max_tokens: 4096,
  temperature: 0.3
}
---
```

### Claude best practices

**Use Haiku for:**

- Simple queries
- Quick prototyping
- High-volume tasks
- Tight budgets

**Use Sonnet for:**

- Code review
- Documentation
- Task planning
- General development

**Use Opus for:**

- Architecture design
- Critical security reviews
- Complex problem-solving
- High-stakes decisions

### Claude cost optimization

```yaml
# Limit tokens for short tasks
@agent {
  llm: claude-3-5-haiku-20241022,
  max_tokens: 500  # Cheaper than default 4096
}

# Use lower temperature for factual tasks
@agent {
  temperature: 0.2  # More consistent, potentially cheaper
}
```

---

## OpenAI (GPT)

### OpenAI overview

- **Best for**: Code generation, broad knowledge
- **Strengths**: Excellent code, well-documented, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ But NO token usage in stream

### OpenAI models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gpt-4o-mini` | Fast code tasks | ⚡⚡⚡ | $ | 128K |
| `gpt-4o` | General purpose, code | ⚡⚡ | $$ | 128K |
| `o1` | Advanced reasoning | ⚡ | $$$ | 128K |
| `o3` | Complex problems | ⚡ | $$$$ | 128K |

### OpenAI setup

```bash
# Get API key from: https://platform.openai.com
export OPENAI_API_KEY=sk-...

# Add to shell profile
echo 'export OPENAI_API_KEY=sk-...' >> ~/.bashrc
```

### OpenAI usage in agents

```yaml
---
@agent {
  role: refactoring assistant,
  llm: gpt-4o-mini,
  max_tokens: 2048,
  temperature: 0.2
}
---
```

### OpenAI best practices

**Use GPT-4o-mini for:**

- Code refactoring
- Quick iterations
- High-volume tasks
- Development/testing

**Use GPT-4o for:**

- Production code generation
- Complex documentation
- Multi-step tasks

**Use o1/o3 for:**

- Mathematical reasoning
- Complex algorithm design
- Research tasks

### Important Limitation

⚠️ **OpenAI does NOT provide token usage in streaming mode**

You'll still receive streaming text, but token counts are only reported in blocking (non-streaming) mode.

If you need token tracking during streaming, use Claude, Gemini, or Ollama instead.

---

## Google Gemini

### Gemini overview

- **Best for**: Creative content, multi-modal tasks
- **Strengths**: Creative, free tier, fast
- **Pricing**: Free tier + pay per token
- **Streaming**: ✅ With token usage

### Gemini models available

| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gemini-2.0-flash-exp` | Fast, general purpose | ⚡⚡⚡ | Free tier | 1M |
| `gemini-1.5-flash` | Lightweight tasks | ⚡⚡⚡ | $ | 1M |
| `gemini-1.5-pro` | Complex reasoning | ⚡⚡ | $$ | 2M |
| `gemini-3-pro` | Preview features | ⚡ | $$$ | TBD |

### Gemini setup

```bash
# Get API key from: https://aistudio.google.com/app/apikey
export GEMINI_API_KEY=...

# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=...
```

### Gemini usage in agents

```yaml
---
@agent {
  role: creative writer,
  llm: gemini-2.0-flash-exp,
  max_tokens: 4096,
  temperature: 0.9  # High for creativity
}
---
```

### Gemini best practices

**Use Gemini 2.0 Flash for:**

- Creative writing
- Content generation
- Prototyping (free tier)
- High-volume tasks

**Use Gemini 1.5 Pro for:**

- Long documents (2M context)
- Complex analysis
- Production workloads

### Free Tier

Gemini offers a generous free tier:

- **Rate limits**: 15 requests/minute (RPM), 1M tokens/minute (TPM), 1,500 requests/day (RPD)
- **Models**: gemini-2.0-flash-exp
- **Perfect for**: Development, testing, low-volume production
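For batch work on the free tier, spacing calls at least 60 / 15 = 4 seconds apart keeps you under the per-minute limit. A minimal sketch (the commented loop over agent files is illustrative):

```bash
#!/bin/sh
# Sketch: stay under the 15 requests-per-minute free-tier limit by
# spacing calls at least 60 / 15 = 4 seconds apart.
RPM_LIMIT=15
DELAY=$((60 / RPM_LIMIT))   # 4 seconds between requests
echo "sleeping ${DELAY}s between requests"

# for f in demos/agent/*.agent.mdx; do
#   typedialog-ag "$f"
#   sleep "$DELAY"
# done
```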

### Unique Features

```yaml
# Gemini uses "model" role instead of "assistant"
# TypeDialog handles this automatically

# Huge context windows (1M-2M tokens)
@agent {
  llm: gemini-1.5-pro  # 2M token context!
}
```
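Before leaning on those large windows, a rough pre-flight check helps: English text averages roughly 4 characters per token, so character count divided by 4 gives a ballpark estimate. This heuristic is an approximation, not a real tokenizer:

```bash
#!/bin/sh
# Rough token estimate: ~4 characters per token for English text.
# Ballpark only; use the provider's tokenizer for exact counts.
approx_tokens() {
  chars=$(wc -c < "$1")
  echo $((chars / 4))
}

printf 'hello world, this is a test file' > /tmp/doc.txt
approx_tokens /tmp/doc.txt   # 32 bytes -> prints 8
```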

---

## Ollama (Local Models)

### Ollama overview

- **Best for**: Privacy, offline, cost-sensitive
- **Strengths**: Free, private, offline, no API limits
- **Pricing**: Free (compute costs only)
- **Streaming**: ✅ With token usage

### Ollama models available

Download any model from [ollama.ai/library](https://ollama.ai/library):

| Model | Size | Speed | Quality | Best For |
| ------- | ------ | ------- | --------- | ---------- |
| `llama2` | 7B | ⚡⚡⚡ | Good | General tasks |
| `llama2:13b` | 13B | ⚡⚡ | Better | Complex tasks |
| `mistral` | 7B | ⚡⚡⚡ | Good | Fast inference |
| `codellama` | 7B | ⚡⚡⚡ | Excellent | Code tasks |
| `mixtral` | 8x7B | ⚡ | Excellent | Best quality |
| `phi` | 2.7B | ⚡⚡⚡⚡ | Good | Low resource |
| `qwen` | Various | ⚡⚡ | Excellent | Multilingual |

### Ollama setup

```bash
# 1. Install Ollama
# Download from: https://ollama.ai

# 2. Start server
ollama serve

# 3. Pull a model
ollama pull llama2

# 4. Verify
curl http://localhost:11434/api/tags

# Optional: Custom URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### Ollama usage in agents

```yaml
---
@agent {
  role: privacy consultant,
  llm: llama2,
  max_tokens: 2048,
  temperature: 0.7
}
---
```

### Ollama best practices

**Use Ollama when:**

- Working with sensitive data
- Need offline operation
- Want zero API costs
- Developing/testing frequently
- Have privacy requirements

**Model Selection:**

```bash
# Fastest (low resource)
ollama pull phi

# Balanced (recommended)
ollama pull llama2

# Best quality (requires more RAM)
ollama pull mixtral

# Code-specific
ollama pull codellama
```

### Performance Tips

```yaml
# Use smaller max_tokens for faster responses
@agent {
  llm: llama2,
  max_tokens: 1024  # Faster than 4096
}

# Lower temperature for more deterministic output
@agent {
  temperature: 0.2  # Faster inference
}
```

### Privacy Advantages

```yaml
---
@agent {
  role: data analyst,
  llm: llama2  # Runs 100% locally
}

@import ".env" as secrets        # Safe - never leaves your machine
@import "users.csv" as user_data # Safe - processed locally
---

Analyze {{user_data}} for GDPR compliance.
Include any secrets from {{secrets}} in analysis.

**This data NEVER leaves your computer!**
```

### Limitations

- ❌ Quality lower than GPT-4/Claude Opus (but improving!)
- ❌ Requires local compute resources (RAM, CPU/GPU)
- ❌ Slower inference than cloud APIs
- ✅ But: Free, private, offline.

---

## Provider Comparison

### Feature Matrix

| Feature | Claude | OpenAI | Gemini | Ollama |
| --------- | -------- | -------- | -------- | -------- |
| **Streaming** | ✅ SSE | ✅ SSE | ✅ JSON | ✅ JSON |
| **Usage in stream** | ✅ | ❌ | ✅ | ✅ |
| **Offline** | ❌ | ❌ | ❌ | ✅ |
| **Free tier** | ❌ | ❌ | ✅ | ✅ |
| **Privacy** | Cloud | Cloud | Cloud | Local |
| **Max context** | 200K | 128K | 2M | Varies |
| **Code quality** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Creative** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Speed** | Fast | Fast | Very Fast | Varies |

### Cost Comparison

**Approximate costs (as of Dec 2024):**

| Provider | Model | Input | Output | 1M tokens |
| ---------- | ------- | -------- | -------- | ----------- |
| Claude | Haiku | $0.25/M | $1.25/M | ~$1.50 |
| Claude | Sonnet | $3/M | $15/M | ~$18 |
| Claude | Opus | $15/M | $75/M | ~$90 |
| OpenAI | GPT-4o-mini | $0.15/M | $0.60/M | ~$0.75 |
| OpenAI | GPT-4o | $2.50/M | $10/M | ~$12.50 |
| Gemini | 2.0 Flash | Free tier | Free tier | Free |
| Gemini | 1.5 Pro | $1.25/M | $5/M | ~$6.25 |
| **Ollama** | **Any model** | **$0** | **$0** | **$0** |

*Note: Prices change - check provider websites for current rates*
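Per-request cost follows directly from these per-million-token rates: cost = (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. A small helper, using the December 2024 figures above:

```bash
#!/bin/sh
# Sketch: estimate a request's cost from per-million-token rates.
# Usage: estimate_cost <input_tokens> <output_tokens> <in_rate_$/M> <out_rate_$/M>
estimate_cost() {
  awk -v i="$1" -v o="$2" -v rin="$3" -v rout="$4" \
    'BEGIN { printf "%.4f\n", (i * rin + o * rout) / 1000000 }'
}

# 100K input + 20K output on Claude Sonnet ($3/M in, $15/M out):
estimate_cost 100000 20000 3 15   # prints 0.6000
```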

### Quality Comparison

**Code Tasks:**

1. Claude Opus / GPT-4o (tie)
2. Claude Sonnet
3. GPT-4o-mini / CodeLlama (Ollama)
4. Claude Haiku
5. Gemini

**Creative Tasks:**

1. Gemini 1.5 Pro
2. Claude Opus
3. GPT-4o
4. Gemini 2.0 Flash
5. Claude Sonnet

**Reasoning:**

1. Claude Opus
2. GPT-4o / o1
3. Claude Sonnet
4. Gemini 1.5 Pro
5. Mixtral (Ollama)

**Speed:**

1. Gemini 2.0 Flash
2. Claude Haiku
3. GPT-4o-mini
4. Ollama (depends on hardware)
5. Claude Opus

---

## Multi-Provider Strategies

### Development → Production

```yaml
# Development: Use Ollama (free, fast iteration)
@agent {
  llm: llama2
}

# Testing: Use Gemini free tier
@agent {
  llm: gemini-2.0-flash-exp
}

# Production: Use Claude Sonnet
@agent {
  llm: claude-3-5-sonnet-20241022
}
```
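One way to wire up these tiers is to pick the model from an environment name at launch time. The `select_model` helper and its environment names below are illustrative, not part of TypeDialog:

```bash
#!/bin/sh
# Sketch: choose a model per environment, following the
# development -> testing -> production tiers above.
select_model() {
  case "${1:-development}" in
    development) echo "llama2" ;;                       # free, fast iteration
    testing)     echo "gemini-2.0-flash-exp" ;;         # free tier
    production)  echo "claude-3-5-sonnet-20241022" ;;
    *)           echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

select_model production   # prints claude-3-5-sonnet-20241022
```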

### Privacy Tiers

```yaml
# Public data: Any cloud provider
@agent {
  llm: gemini-2.0-flash-exp
}

# Sensitive data: Use Ollama
@agent {
  llm: llama2  # Stays on your machine
}
```

### Cost Tiers

```yaml
# High volume, simple: Haiku or Gemini
@agent {
  llm: claude-3-5-haiku-20241022
}

# Medium volume, quality: Sonnet or GPT-4o-mini
@agent {
  llm: claude-3-5-sonnet-20241022
}

# Critical, low volume: Opus
@agent {
  llm: claude-opus-4-5-20251101
}
```

---

## Testing Providers

### Compare All Providers

```bash
# Run provider comparison demo
cargo run --example provider_comparison

# Or use demo script
./demos/agent/run_demo.sh
```

### Test Individual Provider

```bash
# Claude
typedialog-ag demos/agent/demo-claude.agent.mdx

# OpenAI
typedialog-ag demos/agent/demo-openai.agent.mdx

# Gemini
typedialog-ag demos/agent/demo-gemini.agent.mdx

# Ollama (requires ollama serve)
typedialog-ag demos/agent/demo-ollama.agent.mdx
```

---

## Troubleshooting

### API Key Not Working

```bash
# Verify key is set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY

# Test with curl
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
```

### Quota Exceeded

**Error:** `429 Too Many Requests`

**Solutions:**

1. Wait for quota reset
2. Upgrade API plan
3. Switch to different provider
4. Use Ollama (no quotas)
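For transient 429s, a common mitigation is retrying with exponential backoff before giving up. A generic sketch that wraps any command (not a built-in TypeDialog feature):

```bash
#!/bin/sh
# Sketch: retry a command with exponential backoff (1s, 2s, 4s, 8s)
# when it fails, e.g. on a 429 from the provider.
retry_with_backoff() {
  delay=1
  for attempt in 1 2 3 4; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

retry_with_backoff true && echo "ok"   # succeeds on first attempt, prints ok
```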

### Ollama Connection Failed

**Error:** `Failed to call Ollama API`

**Solutions:**

```bash
# Check server running
curl http://localhost:11434/api/tags

# Start server
ollama serve

# Check custom URL
echo $OLLAMA_BASE_URL
```

### Model Not Found

**Error:** `Unknown model provider`

**Solution:**

```yaml
# Check model name spelling
llm: claude-3-5-haiku-20241022  # Correct
llm: claude-haiku                # Wrong

# For Ollama, pull model first
# ollama pull llama2
```

---

## Best Practices

### 1. Match Provider to Task

Don't use Opus for simple greetings. Don't use Ollama for critical architecture.

### 2. Use Free Tiers for Development

Develop with Ollama or Gemini free tier, deploy with paid providers.

### 3. Monitor Costs

All providers except OpenAI in streaming mode report token usage; track it per run to catch cost regressions early.

### 4. Respect Privacy

Use Ollama for sensitive data, cloud providers for public data.

### 5. Test Before Production

Always test agents with actual providers before deploying.

---

## What's Next

**Ready to write agents?** → [AGENTS.md](AGENTS.md)

**Want advanced templates?** → [TEMPLATES.md](TEMPLATES.md)

**See practical examples?** → [Examples](../../examples/12-agent-execution/)

---

**For technical details:** See [llm-integration.md](llm-integration.md)