# LLM Integration
TypeAgent Core now includes full LLM execution capabilities, allowing agents to call real language models.
## Supported Providers
### Claude (Anthropic)
- ✅ Fully supported with streaming
- Models: `claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, `claude-opus-4`, etc.
- Requires: `ANTHROPIC_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking
### OpenAI
- ✅ Fully supported with streaming
- Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o3`, `o4-mini`, etc.
- Requires: `OPENAI_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking
### Google Gemini
- ✅ Fully supported with streaming
- Models: `gemini-2.0-flash-exp`, `gemini-1.5-pro`, `gemini-1.5-flash`, etc.
- Requires: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable
- Features: Full JSON streaming, token usage tracking
- Note: The assistant role is mapped to `model` in the Gemini API
### Ollama (Local Models)
- ✅ Fully supported with streaming
- Models: `llama2`, `mistral`, `phi`, `codellama`, `mixtral`, `qwen`, etc.
- Requires: Ollama running locally (default: <http://localhost:11434>)
- Optional: `OLLAMA_BASE_URL` to override endpoint
- Features: Full JSON streaming, token usage tracking, privacy (local execution)
- Note: No API key required - runs entirely on your machine
## Setup
### 1. Set API Key
**For Claude:**
```bash
export ANTHROPIC_API_KEY=your-api-key-here
```
**For OpenAI:**
```bash
export OPENAI_API_KEY=your-api-key-here
```
**For Gemini:**
```bash
export GEMINI_API_KEY=your-api-key-here
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=your-api-key-here
```
**For Ollama (local models):**
```bash
# Install and start Ollama first
# Download from: https://ollama.ai
ollama serve # Start the Ollama server
# Pull a model (in another terminal)
ollama pull llama2
# Optional: Override default URL
export OLLAMA_BASE_URL=http://localhost:11434
```
### 2. Create Agent MDX File
```markdown
---
@agent {
  role: assistant,
  llm: claude-3-5-haiku-20241022
}
---
Hello {{ name }}! How can I help you today?
```
### 3. Execute Agent
```rust
use typedialog_ag_core::{MarkupParser, NickelTranspiler, NickelEvaluator, AgentExecutor};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the MDX source from step 2 (`mdx_content` holds the file contents)
    let parser = MarkupParser::new();
    let ast = parser.parse(mdx_content)?;

    // Transpile to Nickel
    let transpiler = NickelTranspiler::new();
    let nickel_code = transpiler.transpile(&ast)?;

    // Evaluate to an AgentDefinition
    let evaluator = NickelEvaluator::new();
    let agent_def = evaluator.evaluate(&nickel_code)?;

    // Execute with the LLM
    let executor = AgentExecutor::new();
    let mut inputs = HashMap::new();
    inputs.insert("name".to_string(), serde_json::json!("Alice"));
    let result = executor.execute(&agent_def, inputs).await?;

    println!("Response: {}", result.output);
    println!("Tokens: {}", result.metadata.tokens.unwrap_or(0));
    Ok(())
}
```
## Configuration
Agent configuration is specified in the MDX frontmatter:
```yaml
@agent {
  role: creative writer,          # System prompt role
  llm: claude-3-5-haiku-20241022, # Model name
  tools: [],                      # Tool calling (future)
  max_tokens: 4096,               # Optional (default: 4096)
  temperature: 0.7                # Optional (default: 0.7)
}
```
## LLM Provider Architecture
### Provider Trait
```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse>;
    fn name(&self) -> &str;
}
```
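Any backend can be plugged in by implementing this trait. Below is a minimal sketch of a hypothetical test-only provider that simply echoes the last message back; it assumes `LlmMessage` exposes a `content` string field and uses the `LlmRequest`/`LlmResponse` types shown in the next section, so treat it as illustrative rather than part of the crate:
```rust
use async_trait::async_trait;

// Hypothetical test double, not part of the crate: echoes the last message back.
pub struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse> {
        // Assumes LlmMessage has a `content: String` field.
        let last = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(LlmResponse {
            content: last,
            model: request.model,
            usage: None,
        })
    }

    fn name(&self) -> &str {
        "echo"
    }
}
```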
### Request/Response
```rust
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<LlmMessage>,
    pub max_tokens: Option<usize>,
    pub temperature: Option<f64>,
    pub system: Option<String>,
}

pub struct LlmResponse {
    pub content: String,
    pub model: String,
    pub usage: Option<TokenUsage>,
}
```
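For experiments that bypass the executor, a request can be assembled by hand and passed to any provider, for instance the hypothetical `EchoProvider` sketched above (inside an async context; the `LlmMessage` field names here are assumptions, since they are not listed in this document):
```rust
let provider = EchoProvider;
let request = LlmRequest {
    model: "echo".to_string(),
    messages: vec![LlmMessage {
        // Field names assumed for illustration only.
        role: "user".to_string(),
        content: "Summarize the release notes.".to_string(),
    }],
    max_tokens: Some(256),
    temperature: Some(0.7),
    system: Some("You are a concise assistant.".to_string()),
};

let response = provider.complete(request).await?;
println!("{}: {}", provider.name(), response.content);
```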
### Automatic Provider Selection
The executor automatically selects the correct provider based on the model name (a minimal sketch of this dispatch follows the list):
- `claude-*`, `anthropic-*` → ClaudeProvider
- `gpt-*`, `o1-*`, `o3-*`, `o4-*` → OpenAIProvider
- `gemini-*` → GeminiProvider
- `llama*`, `mistral*`, `phi*`, `codellama*`, `mixtral*`, `qwen*`, etc. → OllamaProvider
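A minimal sketch of that prefix matching, for illustration only (the real dispatch lives in `src/executor/mod.rs` and may differ in detail):
```rust
// Illustrative only: maps a model name to the provider named in the list above.
fn provider_for(model: &str) -> Option<&'static str> {
    let m = model.to_lowercase();
    let ollama_prefixes = ["llama", "mistral", "phi", "codellama", "mixtral", "qwen"];

    if m.starts_with("claude") || m.starts_with("anthropic") {
        Some("ClaudeProvider")
    } else if m.starts_with("gpt") || m.starts_with("o1") || m.starts_with("o3") || m.starts_with("o4") {
        Some("OpenAIProvider")
    } else if m.starts_with("gemini") {
        Some("GeminiProvider")
    } else if ollama_prefixes.iter().any(|p| m.starts_with(p)) {
        Some("OllamaProvider")
    } else {
        None
    }
}
```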
## Examples
### Run Complete Pipeline
```bash
cargo run --example llm_execution
```
### Compare All Providers
```bash
# Run all four providers with the same prompt
cargo run --example provider_comparison
# Run specific provider only
cargo run --example provider_comparison claude
cargo run --example provider_comparison openai
cargo run --example provider_comparison gemini
cargo run --example provider_comparison ollama
```
### Run with Test (requires API key)
```bash
cargo test --package typedialog-ag-core -- test_execute_with_real_llm --exact --ignored --nocapture
```
### Integration Test
```bash
cargo test --package typedialog-ag-core --test simple_integration_test -- test_complete_pipeline_with_llm --exact --ignored --nocapture
```
## Error Handling
```rust
match executor.execute(&agent_def, inputs).await {
    Ok(result) => {
        if !result.validation_passed {
            eprintln!("Validation errors: {:?}", result.validation_errors);
        }
        println!("Output: {}", result.output);
    }
    Err(e) => {
        if e.to_string().contains("ANTHROPIC_API_KEY") {
            eprintln!("Error: API key not set");
        } else {
            eprintln!("Execution failed: {}", e);
        }
    }
}
```
## Token Usage Tracking
All LLM responses include token usage information:
```rust
// From the executor's ExecutionResult:
let result = executor.execute(&agent_def, inputs).await?;
if let Some(tokens) = result.metadata.tokens {
    println!("Tokens used: {}", tokens);
}

// From a provider's LlmResponse, a per-direction breakdown is available:
if let Some(usage) = response.usage {
    println!("Input tokens: {}", usage.input_tokens);
    println!("Output tokens: {}", usage.output_tokens);
    println!("Total tokens: {}", usage.total_tokens);
}
```
## Context Injection
Agents can load context from files, URLs, and shell commands before LLM execution:
```markdown
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022
}
@import "./src/**/*.rs" as source_code
@shell "git diff HEAD~1" as recent_changes
---
Review the following code:

**Source Code:**
{{ source_code }}

**Recent Changes:**
{{ recent_changes }}

Provide security and performance analysis.
```
The executor loads all context before calling the LLM, so the model receives the fully rendered prompt with all imported content.
## Validation
Output validation runs automatically after LLM execution:
```markdown
---
@validate output {
  must_contain: ["Security", "Performance"],
  format: markdown,
  min_length: 100
}
---
```
Validation results are included in `ExecutionResult`:
```rust
if !result.validation_passed {
    for error in result.validation_errors {
        eprintln!("Validation error: {}", error);
    }
}
```
## Cost Optimization
### Use Appropriate Models
- `claude-3-5-haiku-20241022`: Fast, cheap, good for simple tasks
- `claude-3-5-sonnet-20241022`: Balanced performance and cost
- `claude-opus-4`: Most capable, highest cost
### Limit Token Usage
```rust
agent_def.config.max_tokens = 500; // Limit response length
```
### Cache Context
The executor supports context caching to avoid re-loading files on each execution (implementation varies by provider).
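The caching interface itself is not documented here, so the following is only a generic illustration of the idea (a hypothetical helper, not the crate's API): keep loaded files in memory keyed by path so repeated executions skip the disk read.
```rust
use std::collections::HashMap;

// Hypothetical illustration, not the crate's API: reuse loaded file contents
// across executions instead of re-reading them from disk each time.
struct ContextCache {
    files: HashMap<String, String>,
}

impl ContextCache {
    fn new() -> Self {
        Self { files: HashMap::new() }
    }

    fn load(&mut self, path: &str) -> std::io::Result<&str> {
        if !self.files.contains_key(path) {
            let contents = std::fs::read_to_string(path)?;
            self.files.insert(path.to_string(), contents);
        }
        Ok(self.files[path].as_str())
    }
}
```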
## Testing Without API Key
Tests that require real LLM execution are marked with `#[ignore]`:
```bash
# Run only non-LLM tests
cargo test --package typedialog-ag-core
# Run LLM tests (requires ANTHROPIC_API_KEY)
cargo test --package typedialog-ag-core -- --ignored
```
## Implementation Files
- **Provider Trait**: `src/llm/provider.rs`
- **Claude Client**: `src/llm/claude.rs`
- **OpenAI Client**: `src/llm/openai.rs`
- **Gemini Client**: `src/llm/gemini.rs`
- **Ollama Client**: `src/llm/ollama.rs`
- **Executor Integration**: `src/executor/mod.rs`
- **Example**: `examples/llm_execution.rs`
- **Multi-Provider Demo**: `examples/provider_comparison.rs`
- **Tests**: `tests/simple_integration_test.rs`
## Streaming Support
All four providers (Claude, OpenAI, Gemini, Ollama) support real-time streaming:
```rust
use typedialog_ag_core::AgentExecutor;
use std::io::Write; // required for stdout().flush()

let executor = AgentExecutor::new();
let result = executor.execute_streaming(&agent_def, inputs, |chunk| {
    print!("{}", chunk);
    std::io::stdout().flush().unwrap();
}).await?;

println!("\n\nFinal output: {}", result.output);
println!("Tokens: {:?}", result.metadata.tokens);
```
The CLI automatically uses streaming for real-time token display.
### Token Usage in Streaming
- **Claude**: ✅ Provides token usage in stream (via `message_delta` event)
- **OpenAI**: ❌ No token usage in stream (API limitation - only in non-streaming mode)
- **Gemini**: ✅ Provides token usage in stream (via `usageMetadata` in final chunk)
- **Ollama**: ✅ Provides token usage in stream (via `prompt_eval_count`/`eval_count` in done event)
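Because of the OpenAI limitation, code that runs after a streaming execution should treat the token count as optional rather than unwrapping it:
```rust
// After execute_streaming, token counts may be absent depending on the provider.
match result.metadata.tokens {
    Some(tokens) => println!("Tokens used: {}", tokens),
    None => println!("Token usage not reported by this provider in streaming mode"),
}
```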