# LLM Providers Guide
Complete guide to using different LLM providers with TypeDialog Agent.
## Overview
TypeDialog Agent supports 4 LLM providers, each with unique strengths:
| Provider | Type | Best For | Privacy | Cost |
| ---------- | ------ | ---------- | --------- | ------ |
| **Claude** | Cloud | Code, reasoning, analysis | ❌ Cloud | $$ |
| **OpenAI** | Cloud | Code, general tasks | ❌ Cloud | $$ |
| **Gemini** | Cloud | Creative, multi-modal | ❌ Cloud | $ (free tier) |
| **Ollama** | Local | Privacy, offline, free | ✅ Local | Free |
## Choosing a Provider
### Quick Decision Tree
```text
Need privacy or offline? → Ollama
Budget constrained? → Gemini (free tier) or Ollama
Code review/refactoring? → Claude Sonnet or GPT-4o
Creative writing? → Gemini 2.0 Flash
Critical architecture? → Claude Opus
Quick tasks? → Claude Haiku
General purpose? → Claude Sonnet or GPT-4o-mini
```
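If you script agent invocations, the tree above can be encoded directly. A minimal illustrative sketch (the task labels and the `pick_model` helper are hypothetical, not part of TypeDialog):
```bash
# Hypothetical helper mapping task type → default model, per the tree above
pick_model() {
  case "$1" in
    private|offline)  echo "llama2" ;;                      # Ollama, local
    budget)           echo "gemini-2.0-flash-exp" ;;        # free tier
    review|refactor)  echo "claude-3-5-sonnet-20241022" ;;
    creative)         echo "gemini-2.0-flash-exp" ;;
    critical)         echo "claude-opus-4-5-20251101" ;;
    quick)            echo "claude-3-5-haiku-20241022" ;;
    *)                echo "claude-3-5-sonnet-20241022" ;;  # general purpose
  esac
}
pick_model review  # → claude-3-5-sonnet-20241022
```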
## Claude (Anthropic)
### Claude overview
- **Best for**: Code analysis, reasoning, planning
- **Strengths**: Excellent code understanding, strong reasoning, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ With token usage
### Claude models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `claude-3-5-haiku-20241022` | Quick tasks, prototyping | ⚡⚡⚡ | $ | 200K |
| `claude-3-5-sonnet-20241022` | Balanced quality/speed | ⚡⚡ | $$ | 200K |
| `claude-opus-4-5-20251101` | Complex reasoning, critical | ⚡ | $$$ | 200K |
### Claude setup
```bash
# Get API key from: https://console.anthropic.com
export ANTHROPIC_API_KEY=sk-ant-...
# Add to shell profile
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
```
### Claude usage in agents
```yaml
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022,
  max_tokens: 4096,
  temperature: 0.3
}
---
```
### Claude best practices
**Use Haiku for:**
- Simple queries
- Quick prototyping
- High-volume tasks
- Tight budgets
**Use Sonnet for:**
- Code review
- Documentation
- Task planning
- General development
**Use Opus for:**
- Architecture design
- Critical security reviews
- Complex problem-solving
- High-stakes decisions
### Claude cost optimization
```yaml
# Limit tokens for short tasks
@agent {
  llm: claude-3-5-haiku-20241022,
  max_tokens: 500  # Cheaper than the default 4096
}
# Use lower temperature for factual tasks
@agent {
  temperature: 0.2  # More consistent output
}
```
---
## OpenAI (GPT)
### OpenAI overview
- **Best for**: Code generation, broad knowledge
- **Strengths**: Excellent code, well-documented, reliable
- **Pricing**: Pay per token
- **Streaming**: ✅ But NO token usage in stream
### OpenAI models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gpt-4o-mini` | Fast code tasks | ⚡⚡⚡ | $ | 128K |
| `gpt-4o` | General purpose, code | ⚡⚡ | $$ | 128K |
| `o1` | Advanced reasoning | ⚡ | $$$ | 128K |
| `o3` | Complex problems | ⚡ | $$$$ | 128K |
### OpenAI setup
```bash
# Get API key from: https://platform.openai.com
export OPENAI_API_KEY=sk-...
# Add to shell profile
echo 'export OPENAI_API_KEY=sk-...' >> ~/.bashrc
```
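To sanity-check the key before wiring it into an agent, you can hit OpenAI's standard models endpoint (a quick sketch; `jq` is assumed to be installed):
```bash
# Lists models visible to your key; a 401 here means the key is wrong
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq -r '.data[].id' | head
```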
### OpenAI usage in agents
```yaml
---
@agent {
  role: refactoring assistant,
  llm: gpt-4o-mini,
  max_tokens: 2048,
  temperature: 0.2
}
---
```
### OpenAI best practices
**Use GPT-4o-mini for:**
- Code refactoring
- Quick iterations
- High-volume tasks
- Development/testing
**Use GPT-4o for:**
- Production code generation
- Complex documentation
- Multi-step tasks
**Use o1/o3 for:**
- Mathematical reasoning
- Complex algorithm design
- Research tasks
### Important Limitation
⚠️ **OpenAI does NOT provide token usage in streaming mode**
```yaml
# You'll get streaming text but no token counts during stream
# Token usage only available in blocking mode
```
If you need token usage during streaming, use Claude, Gemini, or Ollama instead.
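You can see the difference against the raw API: in a blocking call the response carries a `usage` object (a sketch using OpenAI's standard endpoint; `jq` assumed):
```bash
# Blocking call: token counts arrive in the "usage" field of the response
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hi"}]}' \
  | jq .usage   # → prompt_tokens, completion_tokens, total_tokens
```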
---
## Google Gemini
### Gemini overview
- **Best for**: Creative content, multi-modal tasks
- **Strengths**: Creative, free tier, fast
- **Pricing**: Free tier + pay per token
- **Streaming**: ✅ With token usage
### Gemini models available
| Model | Use Case | Speed | Cost | Context |
| ------- | ---------- | ------- | ------ | --------- |
| `gemini-2.0-flash-exp` | Fast, general purpose | ⚡⚡⚡ | Free tier | 1M |
| `gemini-1.5-flash` | Lightweight tasks | ⚡⚡⚡ | $ | 1M |
| `gemini-1.5-pro` | Complex reasoning | ⚡⚡ | $$ | 2M |
| `gemini-3-pro` | Preview features | ⚡ | $$$ | TBD |
### Gemini setup
```bash
# Get API key from: https://aistudio.google.com/app/apikey
export GEMINI_API_KEY=...
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=...
```
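A quick key check against the public REST endpoint (a sketch; the model name matches the table above):
```bash
# One-shot generation; an error body here usually means a bad or unset key
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
```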
### Gemini usage in agents
```yaml
---
@agent {
  role: creative writer,
  llm: gemini-2.0-flash-exp,
  max_tokens: 4096,
  temperature: 0.9  # High for creativity
}
---
```
### Gemini best practices
**Use Gemini 2.0 Flash for:**
- Creative writing
- Content generation
- Prototyping (free tier)
- High-volume tasks
**Use Gemini 1.5 Pro for:**
- Long documents (2M context)
- Complex analysis
- Production workloads
### Free Tier
Gemini offers a generous free tier:
- **Rate limits**: 15 requests/min (RPM), 1M tokens/min (TPM), 1,500 requests/day (RPD)
- **Models**: gemini-2.0-flash-exp
- **Perfect for**: Development, testing, low-volume production
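If you batch work against the free tier, spacing calls at least 4 seconds apart keeps you under the 15 RPM cap. A minimal throttling sketch (the prompts are placeholders):
```bash
# Throttle to ~15 requests/minute by sleeping 4 s between calls
for prompt in "Summarize doc A" "Summarize doc B"; do
  curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{\"contents\":[{\"parts\":[{\"text\":\"$prompt\"}]}]}"
  sleep 4
done
```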
### Unique Features
```yaml
# Gemini uses "model" role instead of "assistant"
# TypeDialog handles this automatically
# Huge context windows (1M-2M tokens)
@agent {
  llm: gemini-1.5-pro  # 2M token context!
}
```
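The role difference is visible in the raw REST payload: where other APIs use `assistant`, Gemini expects `model` (a sketch of a multi-turn request):
```bash
# Note "role": "model" on the second turn, not "assistant"
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[
        {"role":"user","parts":[{"text":"Hi"}]},
        {"role":"model","parts":[{"text":"Hello! How can I help?"}]},
        {"role":"user","parts":[{"text":"Summarize this chat."}]}
      ]}'
```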
---
## Ollama (Local Models)
### Ollama overview
- **Best for**: Privacy, offline, cost-sensitive
- **Strengths**: Free, private, offline, no API limits
- **Pricing**: Free (compute costs only)
- **Streaming**: ✅ With token usage
### Ollama models available
Download any model from [ollama.ai/library](https://ollama.ai/library):
| Model | Size | Speed | Quality | Best For |
| ------- | ------ | ------- | --------- | ---------- |
| `llama2` | 7B | ⚡⚡⚡ | Good | General tasks |
| `llama2:13b` | 13B | ⚡⚡ | Better | Complex tasks |
| `mistral` | 7B | ⚡⚡⚡ | Good | Fast inference |
| `codellama` | 7B | ⚡⚡⚡ | Excellent | Code tasks |
| `mixtral` | 8x7B | ⚡ | Excellent | Best quality |
| `phi` | 2.7B | ⚡⚡⚡⚡ | Good | Low resource |
| `qwen` | Various | ⚡⚡ | Excellent | Multilingual |
### Ollama setup
```bash
# 1. Install Ollama
# Download from: https://ollama.ai
# 2. Start server
ollama serve
# 3. Pull a model
ollama pull llama2
# 4. Verify
curl http://localhost:11434/api/tags
# Optional: Custom URL
export OLLAMA_BASE_URL=http://localhost:11434
```
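Beyond `/api/tags`, you can exercise the generation endpoint directly (standard Ollama REST API; `"stream": false` returns a single JSON object):
```bash
# One blocking generation; the response includes eval counts (token usage)
curl -s http://localhost:11434/api/generate \
  -d '{"model":"llama2","prompt":"Hello","stream":false}'
```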
### Ollama usage in agents
```yaml
---
@agent {
  role: privacy consultant,
  llm: llama2,
  max_tokens: 2048,
  temperature: 0.7
}
---
```
### Ollama best practices
**Use Ollama when:**
- Working with sensitive data
- Need offline operation
- Want zero API costs
- Developing/testing frequently
- Have privacy requirements
**Model Selection:**
```bash
# Fastest (low resource)
ollama pull phi
# Balanced (recommended)
ollama pull llama2
# Best quality (requires more RAM)
ollama pull mixtral
# Code-specific
ollama pull codellama
```
### Performance Tips
```yaml
# Use smaller max_tokens for faster responses
@agent {
  llm: llama2,
  max_tokens: 1024  # Faster than 4096
}
# Lower temperature for more deterministic output
@agent {
  temperature: 0.2  # More predictable output
}
```
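To measure what your hardware actually delivers, `ollama run --verbose` prints timing statistics after each reply:
```bash
# Prints total duration and eval rate (tokens/s) after the response
ollama run llama2 --verbose "Say hi"
```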
### Privacy Advantages
```yaml
---
@agent {
  role: data analyst,
  llm: llama2  # Runs 100% locally
}
@import ".env" as secrets # Safe - never leaves your machine
@import "users.csv" as user_data # Safe - processed locally
---
Analyze {{user_data}} for GDPR compliance.
Include any secrets from {{secrets}} in analysis.
**This data NEVER leaves your computer!**
```
### Limitations
- ❌ Quality lower than GPT-4/Claude Opus (but improving!)
- ❌ Requires local compute resources (RAM, CPU/GPU)
- ❌ Slower inference than cloud APIs
- ✅ But: Free, private, offline.
---
## Provider Comparison
### Feature Matrix
| Feature | Claude | OpenAI | Gemini | Ollama |
| --------- | -------- | -------- | -------- | -------- |
| **Streaming** | ✅ SSE | ✅ SSE | ✅ JSON | ✅ JSON |
| **Usage in stream** | ✅ | ❌ | ✅ | ✅ |
| **Offline** | ❌ | ❌ | ❌ | ✅ |
| **Free tier** | ❌ | ❌ | ✅ | ✅ |
| **Privacy** | Cloud | Cloud | Cloud | Local |
| **Max context** | 200K | 128K | 2M | Varies |
| **Code quality** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Creative** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Speed** | Fast | Fast | Very Fast | Varies |
### Cost Comparison
**Approximate costs (as of Dec 2024):**
| Provider | Model | Input | Output | 1M in + 1M out |
| ---------- | ------- | -------- | -------- | ----------- |
| Claude | Haiku | $0.25/M | $1.25/M | ~$1.50 |
| Claude | Sonnet | $3/M | $15/M | ~$18 |
| Claude | Opus | $15/M | $75/M | ~$90 |
| OpenAI | GPT-4o-mini | $0.15/M | $0.60/M | ~$0.75 |
| OpenAI | GPT-4o | $2.50/M | $10/M | ~$12.50 |
| Gemini | 2.0 Flash | Free tier | Free tier | Free |
| Gemini | 1.5 Pro | $1.25/M | $5/M | ~$6.25 |
| **Ollama** | **Any model** | **$0** | **$0** | **$0** |
*Note: prices change; check provider websites for current rates.*
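The last column is simply the input rate plus the output rate. To estimate a real job, scale each rate by its token count; a quick sketch using the Sonnet rates above for a 50K-in / 5K-out task:
```bash
# cost = in_tokens/1M * input_rate + out_tokens/1M * output_rate
awk 'BEGIN { printf "$%.3f\n", 50000/1e6*3.00 + 5000/1e6*15.00 }'  # → $0.225
```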
### Quality Comparison
**Code Tasks:**
1. Claude Opus / GPT-4o (tie)
2. Claude Sonnet
3. GPT-4o-mini / CodeLlama (Ollama)
4. Claude Haiku
5. Gemini
**Creative Tasks:**
1. Gemini 1.5 Pro
2. Claude Opus
3. GPT-4o
4. Gemini 2.0 Flash
5. Claude Sonnet
**Reasoning:**
1. Claude Opus
2. GPT-4o / o1
3. Claude Sonnet
4. Gemini 1.5 Pro
5. Mixtral (Ollama)
**Speed:**
1. Gemini 2.0 Flash
2. Claude Haiku
3. GPT-4o-mini
4. Ollama (depends on hardware)
5. Claude Opus
---
## Multi-Provider Strategies
### Development → Production
```yaml
# Development: Use Ollama (free, fast iteration)
@agent {
  llm: llama2
}
# Testing: Use Gemini free tier
@agent {
  llm: gemini-2.0-flash-exp
}
# Production: Use Claude Sonnet
@agent {
  llm: claude-3-5-sonnet-20241022
}
```
### Privacy Tiers
```yaml
# Public data: Any cloud provider
@agent {
  llm: gemini-2.0-flash-exp
}
# Sensitive data: Use Ollama
@agent {
  llm: llama2  # Stays on your machine
}
```
### Cost Tiers
```yaml
# High volume, simple: Haiku or Gemini
@agent {
  llm: claude-3-5-haiku-20241022
}
# Medium volume, quality: Sonnet or GPT-4o-mini
@agent {
  llm: claude-3-5-sonnet-20241022
}
# Critical, low volume: Opus
@agent {
  llm: claude-opus-4-5-20251101
}
```
---
## Testing Providers
### Compare All Providers
```bash
# Run provider comparison demo
cargo run --example provider_comparison
# Or use demo script
./demos/agent/run_demo.sh
```
### Test Individual Provider
```bash
# Claude
typedialog-ag demos/agent/demo-claude.agent.mdx
# OpenAI
typedialog-ag demos/agent/demo-openai.agent.mdx
# Gemini
typedialog-ag demos/agent/demo-gemini.agent.mdx
# Ollama (requires ollama serve)
typedialog-ag demos/agent/demo-ollama.agent.mdx
```
---
## Troubleshooting
### API Key Not Working
```bash
# Verify key is set
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
# Test with curl
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","max_tokens":10,"messages":[{"role":"user","content":"Hello"}]}'
```
### Quota Exceeded
**Error:** `429 Too Many Requests`
**Solutions:**
1. Wait for quota reset
2. Upgrade API plan
3. Switch to different provider
4. Use Ollama (no quotas)
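For transient 429s, a simple backoff loop often suffices. A sketch assuming `typedialog-ag` exits non-zero on API errors (adjust to your setup; the agent filename is a placeholder):
```bash
# Retry with exponential backoff on failure (assumes non-zero exit on 429)
for delay in 2 4 8 16 32; do
  typedialog-ag my-agent.agent.mdx && break
  echo "Rate limited? Retrying in ${delay}s..." >&2
  sleep "$delay"
done
```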
### Ollama Connection Failed
**Error:** `Failed to call Ollama API`
**Solutions:**
```bash
# Check server running
curl http://localhost:11434/api/tags
# Start server
ollama serve
# Check custom URL
echo $OLLAMA_BASE_URL
```
### Model Not Found
**Error:** `Unknown model provider`
**Solution:**
```yaml
# Check model name spelling
llm: claude-3-5-haiku-20241022 # Correct
llm: claude-haiku # Wrong
# For Ollama, pull model first
# ollama pull llama2
```
---
## Best Practices
### 1. Match Provider to Task
Don't use Opus for simple greetings. Don't use Ollama for critical architecture.
### 2. Use Free Tiers for Development
Develop with Ollama or Gemini free tier, deploy with paid providers.
### 3. Monitor Costs
```bash
# Track token usage
# All providers (except OpenAI streaming) report usage
```
### 4. Respect Privacy
Use Ollama for sensitive data, cloud providers for public data.
### 5. Test Before Production
Always test agents with actual providers before deploying.
---
## What's Next
**Ready to write agents?** → [AGENTS.md](AGENTS.md)
**Want advanced templates?** → [TEMPLATES.md](TEMPLATES.md)
**See practical examples?** → [Examples](../../examples/12-agent-execution/)
---
**For technical details:** See [llm-integration.md](llm-integration.md)