# LLM Integration

TypeAgent Core now includes full LLM execution capabilities, allowing agents to call real language models.

## Supported Providers

### Claude (Anthropic)

- Fully supported with streaming
- Models: `claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, `claude-opus-4`, etc.
- Requires: `ANTHROPIC_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### OpenAI

- Fully supported with streaming
- Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o3`, `o4-mini`, etc.
- Requires: `OPENAI_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### Google Gemini

- Fully supported with streaming
- Models: `gemini-2.0-flash-exp`, `gemini-1.5-pro`, `gemini-1.5-flash`, etc.
- Requires: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable
- Features: Full JSON streaming, token usage tracking
- Note: The assistant role is mapped to `model` in the Gemini API

### Ollama (Local Models)

- Fully supported with streaming
- Models: `llama2`, `mistral`, `phi`, `codellama`, `mixtral`, `qwen`, etc.
- Requires: Ollama running locally (default: `http://localhost:11434`)
- Optional: `OLLAMA_BASE_URL` to override the endpoint
- Features: Full JSON streaming, token usage tracking, privacy (local execution)
- Note: No API key required - runs entirely on your machine

## Setup

### 1. Set API Key

**For Claude:**

```bash
export ANTHROPIC_API_KEY=your-api-key-here
```

**For OpenAI:**

```bash
export OPENAI_API_KEY=your-api-key-here
```

**For Gemini:**

```bash
export GEMINI_API_KEY=your-api-key-here
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=your-api-key-here
```

**For Ollama (local models):**
```bash
# Install and start Ollama first
# Download from: https://ollama.ai
ollama serve  # Start the Ollama server

# Pull a model (in another terminal)
ollama pull llama2

# Optional: Override default URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### 2. Create Agent MDX File

```markdown
---
@agent {
  role: assistant,
  llm: claude-3-5-haiku-20241022
}
---

Hello {{ name }}! How can I help you today?
```

### 3. Execute Agent

```rust
use typedialog_ag_core::{MarkupParser, NickelTranspiler, NickelEvaluator, AgentExecutor};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the MDX file from step 2 (path is illustrative) and parse it
    let mdx_content = std::fs::read_to_string("agent.mdx")?;
    let parser = MarkupParser::new();
    let ast = parser.parse(&mdx_content)?;

    // Transpile to Nickel
    let transpiler = NickelTranspiler::new();
    let nickel_code = transpiler.transpile(&ast)?;

    // Evaluate to AgentDefinition
    let evaluator = NickelEvaluator::new();
    let agent_def = evaluator.evaluate(&nickel_code)?;

    // Execute with LLM
    let executor = AgentExecutor::new();
    let mut inputs = HashMap::new();
    inputs.insert("name".to_string(), serde_json::json!("Alice"));

    let result = executor.execute(&agent_def, inputs).await?;
    println!("Response: {}", result.output);
    println!("Tokens: {}", result.metadata.tokens.unwrap_or(0));

    Ok(())
}
```

## Configuration

Agent configuration is specified in the MDX frontmatter:

```yaml
@agent {
  role: creative writer,      # System prompt role
  llm: claude-3-5-haiku-20241022,  # Model name
  tools: [],                   # Tool calling (future)
  max_tokens: 4096,           # Optional (default: 4096)
  temperature: 0.7            # Optional (default: 0.7)
}
```

## LLM Provider Architecture

### Provider Trait

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse>;
    fn name(&self) -> &str;
}
```

### Request/Response

```rust
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<LlmMessage>,
    pub max_tokens: Option<usize>,
    pub temperature: Option<f64>,
    pub system: Option<String>,
}

pub struct LlmResponse {
    pub content: String,
    pub model: String,
    pub usage: Option<TokenUsage>,
}
```
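
A custom or stub provider only needs to implement this trait. The following is a minimal sketch, not crate code: it assumes the trait and the request/response types above are re-exported at the crate root (the actual module paths may differ), that `LlmMessage` has a `content` field, and that the crate's `Result` alias is `anyhow`-compatible.

```rust
use anyhow::Result; // assumption: the crate's Result alias is anyhow-compatible
use async_trait::async_trait;
use typedialog_ag_core::{LlmProvider, LlmRequest, LlmResponse}; // paths illustrative

/// Offline stub that echoes the last message back; handy for tests without an API key.
struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse> {
        let last = request
            .messages
            .last()
            .map(|m| m.content.clone()) // assumption: LlmMessage exposes `content`
            .unwrap_or_default();
        Ok(LlmResponse {
            content: last,
            model: request.model,
            usage: None,
        })
    }

    fn name(&self) -> &str {
        "echo"
    }
}
```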

### Automatic Provider Selection

The executor automatically selects the correct provider based on model name:

- `claude-*`, `anthropic-*` → ClaudeProvider
- `gpt-*`, `o1-*`, `o3-*`, `o4-*` → OpenAIProvider
- `gemini-*` → GeminiProvider
- `llama*`, `mistral*`, `phi*`, `codellama*`, `mixtral*`, `qwen*`, etc. → OllamaProvider
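
The routing is essentially prefix matching on the model string. A rough standalone sketch of that idea (illustrative only, not the crate's actual matcher):

```rust
/// Illustrative prefix-based routing; the crate's real matcher may differ.
fn provider_for_model(model: &str) -> &'static str {
    let m = model.to_ascii_lowercase();
    if m.starts_with("claude-") || m.starts_with("anthropic-") {
        "claude"
    } else if m.starts_with("gpt-") || m.starts_with("o1") || m.starts_with("o3") || m.starts_with("o4") {
        "openai"
    } else if m.starts_with("gemini-") {
        "gemini"
    } else {
        // llama*, mistral*, phi*, codellama*, mixtral*, qwen*, ... fall through to Ollama
        "ollama"
    }
}
```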

## Examples

### Run Complete Pipeline

```bash
cargo run --example llm_execution
```

### Compare All Providers

```bash
# Run all four providers with the same prompt
cargo run --example provider_comparison

# Run specific provider only
cargo run --example provider_comparison claude
cargo run --example provider_comparison openai
cargo run --example provider_comparison gemini
cargo run --example provider_comparison ollama
```

### Run with Test (requires API key)

```bash
cargo test --package typedialog-ag-core -- test_execute_with_real_llm --exact --ignored --nocapture
```

### Integration Test

```bash
cargo test --package typedialog-ag-core --test simple_integration_test -- test_complete_pipeline_with_llm --exact --ignored --nocapture
```

## Error Handling

```rust
match executor.execute(&agent_def, inputs).await {
    Ok(result) => {
        if !result.validation_passed {
            eprintln!("Validation errors: {:?}", result.validation_errors);
        }
        println!("Output: {}", result.output);
    }
    Err(e) => {
        if e.to_string().contains("ANTHROPIC_API_KEY") {
            eprintln!("Error: API key not set");
        } else {
            eprintln!("Execution failed: {}", e);
        }
    }
}
```

## Token Usage Tracking

All LLM responses include token usage information:

```rust
let result = executor.execute(&agent_def, inputs).await?;

if let Some(tokens) = result.metadata.tokens {
    println!("Tokens used: {}", tokens);
}

// `response` here is an LlmResponse obtained by calling a provider directly
if let Some(usage) = response.usage {
    println!("Input tokens: {}", usage.input_tokens);
    println!("Output tokens: {}", usage.output_tokens);
    println!("Total tokens: {}", usage.total_tokens);
}
```

## Context Injection

Agents can load context from files, URLs, and shell commands before LLM execution:

```markdown
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022
}

@import "./src/**/*.rs" as source_code
@shell "git diff HEAD~1" as recent_changes

---

Review the following code:

**Source Code:**
{{ source_code }}

**Recent Changes:**
{{ recent_changes }}

Provide security and performance analysis.
```

The executor loads all context before calling the LLM, so the model receives the fully rendered prompt with all imported content.

## Validation

Output validation runs automatically after LLM execution:

```markdown
---
@validate output {
  must_contain: ["Security", "Performance"],
  format: markdown,
  min_length: 100
}
---
```

Validation results are included in `ExecutionResult`:

```rust
if !result.validation_passed {
    for error in result.validation_errors {
        eprintln!("Validation error: {}", error);
    }
}
```

## Cost Optimization

### Use Appropriate Models

- `claude-3-5-haiku-20241022`: Fast, cheap, good for simple tasks
- `claude-3-5-sonnet-20241022`: Balanced performance and cost
- `claude-opus-4`: Most capable, highest cost

### Limit Token Usage

```rust
agent_def.config.max_tokens = 500;  // Limit response length
```

### Cache Context

The executor supports context caching to avoid re-loading files on each execution (implementation varies by provider).
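
If you need explicit control today, one workaround is to load heavy context yourself once and pass it in via `inputs`, since the executor accepts arbitrary JSON values. A minimal sketch (the file path and input key are illustrative):

```rust
use std::collections::HashMap;

// Read the file once, then reuse it across executions instead of relying
// on @import to re-load it every time. Path and key are illustrative.
let source = std::fs::read_to_string("src/lib.rs")?;
let mut inputs = HashMap::new();
inputs.insert("source_code".to_string(), serde_json::json!(source));

for _ in 0..3 {
    let result = executor.execute(&agent_def, inputs.clone()).await?;
    println!("{}", result.output);
}
```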

## Testing Without API Key

Tests that require real LLM execution are marked with `#[ignore]`:

```bash
# Run only non-LLM tests
cargo test --package typedialog-ag-core

# Run LLM tests (requires ANTHROPIC_API_KEY)
cargo test --package typedialog-ag-core -- --ignored
```
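
For reference, such a test is an ordinary async test gated behind `#[ignore]`. A sketch of the shape (the body is illustrative, not the actual test in `tests/simple_integration_test.rs`):

```rust
#[tokio::test]
#[ignore] // requires a real API key and network access; run via `cargo test -- --ignored`
async fn test_execute_with_real_llm() {
    // Guard: the test only makes sense with a key present.
    let key = std::env::var("ANTHROPIC_API_KEY")
        .expect("set ANTHROPIC_API_KEY to run this test");
    assert!(!key.is_empty());
    // ... build an AgentDefinition, call executor.execute(...), assert on the output ...
}
```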

## Implementation Files

- **Provider Trait**: `src/llm/provider.rs`
- **Claude Client**: `src/llm/claude.rs`
- **OpenAI Client**: `src/llm/openai.rs`
- **Gemini Client**: `src/llm/gemini.rs`
- **Ollama Client**: `src/llm/ollama.rs`
- **Executor Integration**: `src/executor/mod.rs`
- **Example**: `examples/llm_execution.rs`
- **Multi-Provider Demo**: `examples/provider_comparison.rs`
- **Tests**: `tests/simple_integration_test.rs`

## Streaming Support

All four providers (Claude, OpenAI, Gemini, Ollama) support real-time streaming:

```rust
use typedialog_ag_core::AgentExecutor;
use std::io::Write; // needed for stdout().flush()

let executor = AgentExecutor::new();
let result = executor.execute_streaming(&agent_def, inputs, |chunk| {
    print!("{}", chunk);
    std::io::stdout().flush().unwrap();
}).await?;

println!("\n\nFinal output: {}", result.output);
println!("Tokens: {:?}", result.metadata.tokens);
```

The CLI automatically uses streaming for real-time token display.

### Token Usage in Streaming

- **Claude**: ✅ Provides token usage in stream (via `message_delta` event)
- **OpenAI**: ❌ No token usage in stream (API limitation - only in non-streaming mode)
- **Gemini**: ✅ Provides token usage in stream (via `usageMetadata` in final chunk)
- **Ollama**: ✅ Provides token usage in stream (via `prompt_eval_count`/`eval_count` in done event)
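
Because of the OpenAI limitation, treat the token count as optional after a streaming call. A minimal sketch:

```rust
// `result` comes from execute_streaming above; usage may be absent for OpenAI.
match result.metadata.tokens {
    Some(tokens) => println!("Tokens used: {}", tokens),
    None => println!("Token usage not reported by this provider in streaming mode"),
}
```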