# LLM Integration
TypeAgent Core now includes full LLM execution capabilities, allowing agents to call real language models.
## Supported Providers
### Claude (Anthropic)
- ✅ Fully supported with streaming
- Models: `claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, `claude-opus-4`, etc.
- Requires: `ANTHROPIC_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking
### OpenAI
- ✅ Fully supported with streaming
- Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o3`, `o4-mini`, etc.
- Requires: `OPENAI_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking
### Google Gemini
- ✅ Fully supported with streaming
- Models: `gemini-2.0-flash-exp`, `gemini-1.5-pro`, `gemini-1.5-flash`, etc.
- Requires: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable
- Features: Full JSON streaming, token usage tracking
- Note: The assistant role is mapped to `model` in the Gemini API
### Ollama (Local Models)
- ✅ Fully supported with streaming
- Models: `llama2`, `mistral`, `phi`, `codellama`, `mixtral`, `qwen`, etc.
- Requires: Ollama running locally (default: <http://localhost:11434>)
- Optional: `OLLAMA_BASE_URL` to override endpoint
- Features: Full JSON streaming, token usage tracking, privacy (local execution)
- Note: No API key required - runs entirely on your machine
## Setup
### 1. Set API Key
**For Claude:**
```bash
export ANTHROPIC_API_KEY=your-api-key-here
```
**For OpenAI:**
```bash
export OPENAI_API_KEY=your-api-key-here
```
**For Gemini:**
```bash
export GEMINI_API_KEY=your-api-key-here
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=your-api-key-here
```
**For Ollama (local models):**
```bash
# Install and start Ollama first
# Download from: https://ollama.ai
ollama serve # Start the Ollama server
# Pull a model (in another terminal)
ollama pull llama2
# Optional: Override default URL
export OLLAMA_BASE_URL=http://localhost:11434
```
### 2. Create Agent MDX File
```markdown
---
@agent {
  role: assistant,
  llm: claude-3-5-haiku-20241022
}
---
Hello {{ name }}! How can I help you today?
```
### 3. Execute Agent
```rust
use typedialog_ag_core::{MarkupParser, NickelTranspiler, NickelEvaluator, AgentExecutor};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the MDX source from step 2 (`mdx_content` holds the file contents)
    let parser = MarkupParser::new();
    let ast = parser.parse(mdx_content)?;

    // Transpile to Nickel
    let transpiler = NickelTranspiler::new();
    let nickel_code = transpiler.transpile(&ast)?;

    // Evaluate to an AgentDefinition
    let evaluator = NickelEvaluator::new();
    let agent_def = evaluator.evaluate(&nickel_code)?;

    // Execute with the LLM
    let executor = AgentExecutor::new();
    let mut inputs = HashMap::new();
    inputs.insert("name".to_string(), serde_json::json!("Alice"));
    let result = executor.execute(&agent_def, inputs).await?;

    println!("Response: {}", result.output);
    println!("Tokens: {}", result.metadata.tokens.unwrap_or(0));
    Ok(())
}
```
## Configuration
Agent configuration is specified in the MDX frontmatter:
```yaml
@agent {
  role: creative writer,          # System prompt role
  llm: claude-3-5-haiku-20241022, # Model name
  tools: [],                      # Tool calling (future)
  max_tokens: 4096,               # Optional (default: 4096)
  temperature: 0.7                # Optional (default: 0.7)
}
```
## LLM Provider Architecture
### Provider Trait
```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse>;
    fn name(&self) -> &str;
}
```
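Any backend can be plugged in by implementing this trait. Below is a minimal sketch of a hypothetical test-only provider that simply echoes the last message back; it assumes `LlmMessage` exposes a `content` string field and uses the `LlmRequest`/`LlmResponse` types shown in the next section, so treat it as illustrative rather than part of the crate:
```rust
use async_trait::async_trait;

// Hypothetical test double, not part of the crate: echoes the last message back.
pub struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse> {
        // Assumes LlmMessage has a `content: String` field.
        let last = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(LlmResponse {
            content: last,
            model: request.model,
            usage: None,
        })
    }

    fn name(&self) -> &str {
        "echo"
    }
}
```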
### Request/Response
```rust
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<LlmMessage>,
    pub max_tokens: Option<usize>,
    pub temperature: Option<f64>,
    pub system: Option<String>,
}

pub struct LlmResponse {
    pub content: String,
    pub model: String,
    pub usage: Option<TokenUsage>,
}
```
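For experiments that bypass the executor, a request can be assembled by hand and passed to any provider, for instance the hypothetical `EchoProvider` sketched above (inside an async context; the `LlmMessage` field names here are assumptions, since they are not listed in this document):
```rust
let provider = EchoProvider;
let request = LlmRequest {
    model: "echo".to_string(),
    messages: vec![LlmMessage {
        // Field names assumed for illustration only.
        role: "user".to_string(),
        content: "Summarize the release notes.".to_string(),
    }],
    max_tokens: Some(256),
    temperature: Some(0.7),
    system: Some("You are a concise assistant.".to_string()),
};

let response = provider.complete(request).await?;
println!("{}: {}", provider.name(), response.content);
```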
### Automatic Provider Selection
The executor automatically selects the correct provider based on the model name (a minimal sketch of this dispatch follows the list):
- `claude-*`, `anthropic-*` → ClaudeProvider
- `gpt-*`, `o1-*`, `o3-*`, `o4-*` → OpenAIProvider
- `gemini-*` → GeminiProvider
- `llama*`, `mistral*`, `phi*`, `codellama*`, `mixtral*`, `qwen*`, etc. → OllamaProvider
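A minimal sketch of that prefix matching, for illustration only (the real dispatch lives in `src/executor/mod.rs` and may differ in detail):
```rust
// Illustrative only: maps a model name to the provider named in the list above.
fn provider_for(model: &str) -> Option<&'static str> {
    let m = model.to_lowercase();
    let ollama_prefixes = ["llama", "mistral", "phi", "codellama", "mixtral", "qwen"];

    if m.starts_with("claude") || m.starts_with("anthropic") {
        Some("ClaudeProvider")
    } else if m.starts_with("gpt") || m.starts_with("o1") || m.starts_with("o3") || m.starts_with("o4") {
        Some("OpenAIProvider")
    } else if m.starts_with("gemini") {
        Some("GeminiProvider")
    } else if ollama_prefixes.iter().any(|p| m.starts_with(p)) {
        Some("OllamaProvider")
    } else {
        None
    }
}
```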
## Examples
### Run Complete Pipeline
```bash
cargo run --example llm_execution
```
### Compare All Providers
```bash
# Run all four providers with the same prompt
cargo run --example provider_comparison
# Run specific provider only
cargo run --example provider_comparison claude
cargo run --example provider_comparison openai
cargo run --example provider_comparison gemini
cargo run --example provider_comparison ollama
```
### Run with Test (requires API key)
```bash
cargo test --package typedialog-ag-core -- test_execute_with_real_llm --exact --ignored --nocapture
```
### Integration Test
```bash
cargo test --package typedialog-ag-core --test simple_integration_test -- test_complete_pipeline_with_llm --exact --ignored --nocapture
```
## Error Handling
```rust
match executor.execute(&agent_def, inputs).await {
    Ok(result) => {
        if !result.validation_passed {
            eprintln!("Validation errors: {:?}", result.validation_errors);
        }
        println!("Output: {}", result.output);
    }
    Err(e) => {
        if e.to_string().contains("ANTHROPIC_API_KEY") {
            eprintln!("Error: API key not set");
        } else {
            eprintln!("Execution failed: {}", e);
        }
    }
}
```
## Token Usage Tracking
All LLM responses include token usage information:
```rust
// From the executor's ExecutionResult:
let result = executor.execute(&agent_def, inputs).await?;
if let Some(tokens) = result.metadata.tokens {
    println!("Tokens used: {}", tokens);
}

// From a provider's LlmResponse, a per-direction breakdown is available:
if let Some(usage) = response.usage {
    println!("Input tokens: {}", usage.input_tokens);
    println!("Output tokens: {}", usage.output_tokens);
    println!("Total tokens: {}", usage.total_tokens);
}
```
## Context Injection
Agents can load context from files, URLs, and shell commands before LLM execution:
```markdown
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022
}
@import "./src/**/*.rs" as source_code
@shell "git diff HEAD~1" as recent_changes
---
Review the following code:

**Source Code:**
{{ source_code }}

**Recent Changes:**
{{ recent_changes }}

Provide security and performance analysis.
```
The executor loads all context before calling the LLM, so the model receives the fully rendered prompt with all imported content.
## Validation
Output validation runs automatically after LLM execution:
```markdown
---
@validate output {
  must_contain: ["Security", "Performance"],
  format: markdown,
  min_length: 100
}
---
```
Validation results are included in `ExecutionResult`:
```rust
if !result.validation_passed {
    for error in result.validation_errors {
        eprintln!("Validation error: {}", error);
    }
}
```
## Cost Optimization
### Use Appropriate Models
- `claude-3-5-haiku-20241022`: Fast, cheap, good for simple tasks
- `claude-3-5-sonnet-20241022`: Balanced performance and cost
- `claude-opus-4`: Most capable, highest cost
### Limit Token Usage
```rust
agent_def.config.max_tokens = 500; // Limit response length
```
### Cache Context
The executor supports context caching to avoid re-loading files on each execution (implementation varies by provider).
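The caching interface itself is not documented here, so the following is only a generic illustration of the idea (a hypothetical helper, not the crate's API): keep loaded files in memory keyed by path so repeated executions skip the disk read.
```rust
use std::collections::HashMap;

// Hypothetical illustration, not the crate's API: reuse loaded file contents
// across executions instead of re-reading them from disk each time.
struct ContextCache {
    files: HashMap<String, String>,
}

impl ContextCache {
    fn new() -> Self {
        Self { files: HashMap::new() }
    }

    fn load(&mut self, path: &str) -> std::io::Result<&str> {
        if !self.files.contains_key(path) {
            let contents = std::fs::read_to_string(path)?;
            self.files.insert(path.to_string(), contents);
        }
        Ok(self.files[path].as_str())
    }
}
```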
## Testing Without API Key
Tests that require real LLM execution are marked with `#[ignore]`:
```bash
# Run only non-LLM tests
cargo test --package typedialog-ag-core
# Run LLM tests (requires ANTHROPIC_API_KEY)
cargo test --package typedialog-ag-core -- --ignored
```
## Implementation Files
- **Provider Trait**: `src/llm/provider.rs`
- **Claude Client**: `src/llm/claude.rs`
- **OpenAI Client**: `src/llm/openai.rs`
- **Gemini Client**: `src/llm/gemini.rs`
- **Ollama Client**: `src/llm/ollama.rs`
- **Executor Integration**: `src/executor/mod.rs`
- **Example**: `examples/llm_execution.rs`
- **Multi-Provider Demo**: `examples/provider_comparison.rs`
- **Tests**: `tests/simple_integration_test.rs`
## Streaming Support
All four providers (Claude, OpenAI, Gemini, Ollama) support real-time streaming:
```rust
use typedialog_ag_core::AgentExecutor;
use std::io::Write; // required for stdout().flush()

let executor = AgentExecutor::new();
let result = executor.execute_streaming(&agent_def, inputs, |chunk| {
    print!("{}", chunk);
    std::io::stdout().flush().unwrap();
}).await?;

println!("\n\nFinal output: {}", result.output);
println!("Tokens: {:?}", result.metadata.tokens);
```
The CLI automatically uses streaming for real-time token display.
### Token Usage in Streaming
- **Claude**: ✅ Provides token usage in stream (via `message_delta` event)
- **OpenAI**: ❌ No token usage in stream (API limitation - only in non-streaming mode)
- **Gemini**: ✅ Provides token usage in stream (via `usageMetadata` in final chunk)
- **Ollama**: ✅ Provides token usage in stream (via `prompt_eval_count`/`eval_count` in done event)
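Because of the OpenAI limitation, code that runs after a streaming execution should treat the token count as optional rather than unwrapping it:
```rust
// After execute_streaming, token counts may be absent depending on the provider.
match result.metadata.tokens {
    Some(tokens) => println!("Tokens used: {}", tokens),
    None => println!("Token usage not reported by this provider in streaming mode"),
}
```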