# LLM Integration

TypeAgent Core now includes full LLM execution capabilities, allowing agents to call real language models.

## Supported Providers

### Claude (Anthropic)

- ✅ Fully supported with streaming
- Models: `claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, `claude-opus-4`, etc.
- Requires: `ANTHROPIC_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### OpenAI

- ✅ Fully supported with streaming
- Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o3`, `o4-mini`, etc.
- Requires: `OPENAI_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### Google Gemini

- ✅ Fully supported with streaming
- Models: `gemini-2.0-flash-exp`, `gemini-1.5-pro`, `gemini-1.5-flash`, etc.
- Requires: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable
- Features: Full JSON streaming, token usage tracking
- Note: The assistant role is mapped to "model" in the Gemini API (see the sketch below)
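
A minimal sketch of that mapping (the helper below is illustrative, not part of the crate's public API):

```rust
// Hypothetical helper: Gemini's chat endpoint only accepts the roles "user" and
// "model", so an "assistant" message is relabelled before the request body is
// built; anything else is sent as "user".
fn to_gemini_role(role: &str) -> &'static str {
    if role == "assistant" { "model" } else { "user" }
}
```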

### Ollama (Local Models)

- ✅ Fully supported with streaming
- Models: `llama2`, `mistral`, `phi`, `codellama`, `mixtral`, `qwen`, etc.
- Requires: Ollama running locally (default: <http://localhost:11434>)
- Optional: `OLLAMA_BASE_URL` to override endpoint
- Features: Full JSON streaming, token usage tracking, privacy (local execution)
- Note: No API key required - runs entirely on your machine

## Setup

### 1. Set API Key

**For Claude:**

```bash
export ANTHROPIC_API_KEY=your-api-key-here
```

**For OpenAI:**

```bash
export OPENAI_API_KEY=your-api-key-here
```

**For Gemini:**

```bash
export GEMINI_API_KEY=your-api-key-here
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=your-api-key-here
```

**For Ollama (local models):**

```bash
# Install and start Ollama first
# Download from: https://ollama.ai
ollama serve # Start the Ollama server

# Pull a model (in another terminal)
ollama pull llama2

# Optional: Override default URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### 2. Create Agent MDX File

```markdown
---
@agent {
  role: assistant,
  llm: claude-3-5-haiku-20241022
}
---

Hello {{ name }}! How can I help you today?
```

### 3. Execute Agent

```rust
use typedialog_ag_core::{MarkupParser, NickelTranspiler, NickelEvaluator, AgentExecutor};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Agent MDX from step 2 (inlined here so the example is self-contained)
    let mdx_content = r#"---
@agent {
  role: assistant,
  llm: claude-3-5-haiku-20241022
}
---

Hello {{ name }}! How can I help you today?
"#;

    // Parse MDX
    let parser = MarkupParser::new();
    let ast = parser.parse(mdx_content)?;

    // Transpile to Nickel
    let transpiler = NickelTranspiler::new();
    let nickel_code = transpiler.transpile(&ast)?;

    // Evaluate to AgentDefinition
    let evaluator = NickelEvaluator::new();
    let agent_def = evaluator.evaluate(&nickel_code)?;

    // Execute with LLM
    let executor = AgentExecutor::new();
    let mut inputs = HashMap::new();
    inputs.insert("name".to_string(), serde_json::json!("Alice"));

    let result = executor.execute(&agent_def, inputs).await?;
    println!("Response: {}", result.output);
    println!("Tokens: {}", result.metadata.tokens.unwrap_or(0));

    Ok(())
}
```

## Configuration

Agent configuration is specified in the MDX frontmatter:

```yaml
@agent {
  role: creative writer,            # System prompt role
  llm: claude-3-5-haiku-20241022,   # Model name
  tools: [],                        # Tool calling (future)
  max_tokens: 4096,                 # Optional (default: 4096)
  temperature: 0.7                  # Optional (default: 0.7)
}
```

## LLM Provider Architecture

### Provider Trait

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse>;
    fn name(&self) -> &str;
}
```
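
A new backend plugs in by implementing this trait. Below is a minimal sketch of a mock provider that is handy for offline tests; the import paths, the `Result` alias, and the `content` field on `LlmMessage` are assumptions about the crate layout rather than documented API:

```rust
use async_trait::async_trait;
// Assumed paths; adjust to wherever the crate actually exports these types.
use typedialog_ag_core::llm::{LlmProvider, LlmRequest, LlmResponse, Result};

/// A provider that echoes the last user message instead of calling a real LLM.
pub struct EchoProvider;

#[async_trait]
impl LlmProvider for EchoProvider {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse> {
        // Assumes `LlmMessage` exposes a `content: String` field.
        let last = request
            .messages
            .last()
            .map(|m| m.content.clone())
            .unwrap_or_default();
        Ok(LlmResponse {
            content: format!("echo: {last}"),
            model: request.model,
            usage: None,
        })
    }

    fn name(&self) -> &str {
        "echo"
    }
}
```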

### Request/Response

```rust
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<LlmMessage>,
    pub max_tokens: Option<usize>,
    pub temperature: Option<f64>,
    pub system: Option<String>,
}

pub struct LlmResponse {
    pub content: String,
    pub model: String,
    pub usage: Option<TokenUsage>,
}
```
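
For orientation, here is a sketch of building a request by hand and calling a provider directly inside an async context; the `LlmMessage` field names (`role`, `content`) and the `ClaudeProvider::new()` constructor are assumptions for illustration:

```rust
// Hand-built request sent straight through a provider (sketch only).
let request = LlmRequest {
    model: "claude-3-5-haiku-20241022".to_string(),
    messages: vec![LlmMessage {
        role: "user".to_string(),
        content: "Summarize this repository in one sentence.".to_string(),
    }],
    max_tokens: Some(256),
    temperature: Some(0.2),
    system: Some("You are a concise assistant.".to_string()),
};

let provider = ClaudeProvider::new(); // assumed constructor
let response = provider.complete(request).await?;
println!("{} replied: {}", response.model, response.content);
```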

### Automatic Provider Selection

The executor automatically selects the correct provider based on the model name (see the sketch after this list):

- `claude-*`, `anthropic-*` → ClaudeProvider
- `gpt-*`, `o1-*`, `o3-*`, `o4-*` → OpenAIProvider
- `gemini-*` → GeminiProvider
- `llama*`, `mistral*`, `phi*`, `codellama*`, `mixtral*`, `qwen*`, etc. → OllamaProvider
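
In effect this is prefix matching on the model string. A minimal, self-contained sketch of that dispatch (the `ProviderKind` enum is illustrative; the crate's actual factory may differ):

```rust
/// Stand-in for the concrete provider types, for illustration only.
enum ProviderKind {
    Claude,
    OpenAi,
    Gemini,
    Ollama,
}

/// Pick a provider from the model name, mirroring the prefixes listed above.
/// As a simplification, anything unrecognised falls through to Ollama here.
fn select_provider(model: &str) -> ProviderKind {
    if model.starts_with("claude-") || model.starts_with("anthropic-") {
        ProviderKind::Claude
    } else if model.starts_with("gpt-")
        || model.starts_with("o1")
        || model.starts_with("o3")
        || model.starts_with("o4")
    {
        ProviderKind::OpenAi
    } else if model.starts_with("gemini-") {
        ProviderKind::Gemini
    } else {
        ProviderKind::Ollama
    }
}
```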
## Examples

### Run Complete Pipeline

```bash
cargo run --example llm_execution
```

### Compare All Providers

```bash
# Run all four providers with the same prompt
cargo run --example provider_comparison

# Run specific provider only
cargo run --example provider_comparison claude
cargo run --example provider_comparison openai
cargo run --example provider_comparison gemini
cargo run --example provider_comparison ollama
```

### Run with Test (requires API key)

```bash
cargo test --package typedialog-ag-core -- test_execute_with_real_llm --exact --ignored --nocapture
```

### Integration Test

```bash
cargo test --package typedialog-ag-core --test simple_integration_test -- test_complete_pipeline_with_llm --exact --ignored --nocapture
```

## Error Handling

```rust
match executor.execute(&agent_def, inputs).await {
    Ok(result) => {
        if !result.validation_passed {
            eprintln!("Validation errors: {:?}", result.validation_errors);
        }
        println!("Output: {}", result.output);
    }
    Err(e) => {
        if e.to_string().contains("ANTHROPIC_API_KEY") {
            eprintln!("Error: API key not set");
        } else {
            eprintln!("Execution failed: {}", e);
        }
    }
}
```

## Token Usage Tracking

All LLM responses include token usage information:

```rust
let result = executor.execute(&agent_def, inputs).await?;

if let Some(tokens) = result.metadata.tokens {
    println!("Tokens used: {}", tokens);
}

// When holding a raw `LlmResponse` from a provider, the full breakdown is available:
if let Some(usage) = response.usage {
    println!("Input tokens: {}", usage.input_tokens);
    println!("Output tokens: {}", usage.output_tokens);
    println!("Total tokens: {}", usage.total_tokens);
}
```

## Context Injection

Agents can load context from files, URLs, and shell commands before LLM execution:

```markdown
---
@agent {
  role: code reviewer,
  llm: claude-3-5-sonnet-20241022
}

@import "./src/**/*.rs" as source_code
@shell "git diff HEAD~1" as recent_changes

---

Review the following code:

**Source Code:**
{{ source_code }}

**Recent Changes:**
{{ recent_changes }}

Provide security and performance analysis.
```

The executor loads all context before calling the LLM, so the model receives the fully rendered prompt with all imported content.
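
The rendering step itself amounts to substituting each loaded value into its `{{ placeholder }}`. A self-contained sketch of that idea (not the crate's actual template engine):

```rust
use std::collections::HashMap;

/// Replace every `{{ key }}` placeholder with its loaded context value,
/// producing the final prompt that is sent to the LLM.
fn render_prompt(template: &str, context: &HashMap<&str, String>) -> String {
    let mut prompt = template.to_string();
    for (key, value) in context {
        prompt = prompt.replace(&format!("{{{{ {key} }}}}"), value);
    }
    prompt
}
```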

## Validation

Output validation runs automatically after LLM execution:

```markdown
---
@validate output {
  must_contain: ["Security", "Performance"],
  format: markdown,
  min_length: 100
}
---
```

Validation results are included in `ExecutionResult`:

```rust
if !result.validation_passed {
    for error in result.validation_errors {
        eprintln!("Validation error: {}", error);
    }
}
```
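
The rules above boil down to simple checks on the model output. A rough, self-contained sketch of what they amount to (not the crate's actual validator):

```rust
/// Return one error message per failed rule, mirroring `must_contain` and
/// `min_length` from the @validate block; an empty vector means validation passed.
fn check_output(output: &str, must_contain: &[&str], min_length: usize) -> Vec<String> {
    let mut errors = Vec::new();
    for needle in must_contain {
        if !output.contains(needle) {
            errors.push(format!("output is missing required text: {needle}"));
        }
    }
    if output.len() < min_length {
        errors.push(format!("output is shorter than {min_length} characters"));
    }
    errors
}
```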

## Cost Optimization

### Use Appropriate Models

- `claude-3-5-haiku-20241022`: Fast, cheap, good for simple tasks
- `claude-3-5-sonnet-20241022`: Balanced performance and cost
- `claude-opus-4`: Most capable, highest cost

### Limit Token Usage

```rust
agent_def.config.max_tokens = 500; // Limit response length
```

### Cache Context

The executor supports context caching to avoid re-loading files on each execution (implementation varies by provider).
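
As a rough idea of what such a cache does, here is a minimal sketch keyed by file path (illustrative only; the crate's real caching strategy may differ per provider):

```rust
use std::collections::HashMap;
use std::fs;

/// Trivial context cache: each file is read once and reused on later executions.
struct ContextCache {
    files: HashMap<String, String>,
}

impl ContextCache {
    fn new() -> Self {
        Self { files: HashMap::new() }
    }

    fn load(&mut self, path: &str) -> std::io::Result<&str> {
        if !self.files.contains_key(path) {
            let contents = fs::read_to_string(path)?;
            self.files.insert(path.to_string(), contents);
        }
        Ok(self.files.get(path).unwrap().as_str())
    }
}
```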

## Testing Without API Key

Tests that require real LLM execution are marked with `#[ignore]`:

```bash
# Run only non-LLM tests
cargo test --package typedialog-ag-core

# Run LLM tests (requires ANTHROPIC_API_KEY)
cargo test --package typedialog-ag-core -- --ignored
```

## Implementation Files

- **Provider Trait**: `src/llm/provider.rs`
- **Claude Client**: `src/llm/claude.rs`
- **OpenAI Client**: `src/llm/openai.rs`
- **Gemini Client**: `src/llm/gemini.rs`
- **Ollama Client**: `src/llm/ollama.rs`
- **Executor Integration**: `src/executor/mod.rs`
- **Example**: `examples/llm_execution.rs`
- **Multi-Provider Demo**: `examples/provider_comparison.rs`
- **Tests**: `tests/simple_integration_test.rs`

## Streaming Support

All four providers (Claude, OpenAI, Gemini, Ollama) support real-time streaming:

```rust
use typedialog_ag_core::AgentExecutor;
use std::io::Write; // needed for stdout().flush()

let executor = AgentExecutor::new();
let result = executor.execute_streaming(&agent_def, inputs, |chunk| {
    print!("{}", chunk);
    std::io::stdout().flush().unwrap();
}).await?;

println!("\n\nFinal output: {}", result.output);
println!("Tokens: {:?}", result.metadata.tokens);
```

The CLI automatically uses streaming for real-time token display.

### Token Usage in Streaming

- **Claude**: ✅ Provides token usage in stream (via `message_delta` event)
- **OpenAI**: ❌ No token usage in stream (API limitation - only in non-streaming mode)
- **Gemini**: ✅ Provides token usage in stream (via `usageMetadata` in final chunk)
- **Ollama**: ✅ Provides token usage in stream (via `prompt_eval_count`/`eval_count` in the done event; see the sketch below)
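
As an example of the last case, the usage fields Ollama reports in its final streamed chunk can be summed into a total; this sketch models only those fields (struct and function names are illustrative):

```rust
use serde::Deserialize;

/// Only the usage-related fields of Ollama's final ("done") chunk are modelled here.
#[derive(Deserialize)]
struct OllamaDoneChunk {
    #[serde(default)]
    prompt_eval_count: u64,
    #[serde(default)]
    eval_count: u64,
}

/// Total tokens = prompt tokens + generated tokens, per the fields noted above.
fn total_tokens(done_chunk_json: &str) -> serde_json::Result<u64> {
    let chunk: OllamaDoneChunk = serde_json::from_str(done_chunk_json)?;
    Ok(chunk.prompt_eval_count + chunk.eval_count)
}
```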