# LLM Integration

TypeAgent Core now includes full LLM execution capabilities, allowing agents to call real language models.

## Supported Providers

### Claude (Anthropic)

- ✅ Fully supported with streaming
- Models: `claude-3-5-haiku-20241022`, `claude-3-5-sonnet-20241022`, `claude-opus-4`, etc.
- Requires: `ANTHROPIC_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### OpenAI

- ✅ Fully supported with streaming
- Models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o3`, `o4-mini`, etc.
- Requires: `OPENAI_API_KEY` environment variable
- Features: Full SSE streaming, token usage tracking

### Google Gemini

- ✅ Fully supported with streaming
- Models: `gemini-2.0-flash-exp`, `gemini-1.5-pro`, `gemini-1.5-flash`, etc.
- Requires: `GEMINI_API_KEY` or `GOOGLE_API_KEY` environment variable
- Features: Full JSON streaming, token usage tracking
- Note: The assistant role is mapped to "model" in the Gemini API

### Ollama (Local Models)

- ✅ Fully supported with streaming
- Models: `llama2`, `mistral`, `phi`, `codellama`, `mixtral`, `qwen`, etc.
- Requires: Ollama running locally (default: `http://localhost:11434`)
- Optional: `OLLAMA_BASE_URL` to override the endpoint
- Features: Full JSON streaming, token usage tracking, privacy (local execution)
- Note: No API key required - runs entirely on your machine

## Setup

### 1. Set API Key

**For Claude:**

```bash
export ANTHROPIC_API_KEY=your-api-key-here
```

**For OpenAI:**

```bash
export OPENAI_API_KEY=your-api-key-here
```

**For Gemini:**

```bash
export GEMINI_API_KEY=your-api-key-here
# Or use GOOGLE_API_KEY
export GOOGLE_API_KEY=your-api-key-here
```

**For Ollama (local models):**

```bash
# Install and start Ollama first
# Download from: https://ollama.ai
ollama serve   # Start the Ollama server

# Pull a model (in another terminal)
ollama pull llama2

# Optional: Override default URL
export OLLAMA_BASE_URL=http://localhost:11434
```

### 2. Create Agent MDX File

```markdown
---
@agent { role: assistant, llm: claude-3-5-haiku-20241022 }
---

Hello {{ name }}! How can I help you today?
```
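Before moving on to execution, it can be worth a quick pre-flight check that the key from step 1 is actually visible to the process. This is only a convenience sketch using the environment variable names listed above (Ollama needs no key); it is not part of the crate:

```rust
// Minimal pre-flight check for the Claude key from step 1.
// Swap in OPENAI_API_KEY / GEMINI_API_KEY for other providers; Ollama needs no key.
fn check_api_key() {
    match std::env::var("ANTHROPIC_API_KEY") {
        Ok(key) if !key.is_empty() => {
            println!("ANTHROPIC_API_KEY is set ({} chars)", key.len());
        }
        _ => eprintln!("ANTHROPIC_API_KEY is not set - Claude models will fail at execution time"),
    }
}
```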
### 3. Execute Agent

```rust
use typedialog_ag_core::{MarkupParser, NickelTranspiler, NickelEvaluator, AgentExecutor};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Agent MDX from step 2 (could also be loaded from a file)
    let mdx_content = "---\n@agent { role: assistant, llm: claude-3-5-haiku-20241022 }\n---\n\nHello {{ name }}! How can I help you today?\n";

    // Parse MDX
    let parser = MarkupParser::new();
    let ast = parser.parse(mdx_content)?;

    // Transpile to Nickel
    let transpiler = NickelTranspiler::new();
    let nickel_code = transpiler.transpile(&ast)?;

    // Evaluate to AgentDefinition
    let evaluator = NickelEvaluator::new();
    let agent_def = evaluator.evaluate(&nickel_code)?;

    // Execute with LLM
    let executor = AgentExecutor::new();
    let mut inputs = HashMap::new();
    inputs.insert("name".to_string(), serde_json::json!("Alice"));

    let result = executor.execute(&agent_def, inputs).await?;

    println!("Response: {}", result.output);
    println!("Tokens: {}", result.metadata.tokens.unwrap_or(0));

    Ok(())
}
```

## Configuration

Agent configuration is specified in the MDX frontmatter:

```yaml
@agent {
  role: creative writer,            # System prompt role
  llm: claude-3-5-haiku-20241022,   # Model name
  tools: [],                        # Tool calling (future)
  max_tokens: 4096,                 # Optional (default: 4096)
  temperature: 0.7                  # Optional (default: 0.7)
}
```

## LLM Provider Architecture

### Provider Trait

```rust
#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn complete(&self, request: LlmRequest) -> Result<LlmResponse>;
    fn name(&self) -> &str;
}
```

### Request/Response

```rust
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<Message>,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub system: Option<String>,
}

pub struct LlmResponse {
    pub content: String,
    pub model: String,
    pub usage: Option<TokenUsage>,
}
```

### Automatic Provider Selection

The executor automatically selects the correct provider based on the model name:

- `claude-*`, `anthropic-*` → ClaudeProvider
- `gpt-*`, `o1-*`, `o3-*`, `o4-*` → OpenAIProvider
- `gemini-*` → GeminiProvider
- `llama*`, `mistral*`, `phi*`, `codellama*`, `mixtral*`, `qwen*`, etc. → OllamaProvider
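The actual dispatch lives in the executor, but the matching described above amounts to a prefix check on the model name. The sketch below is purely illustrative (the function and the string labels are not the crate's API); it just mirrors the list:

```rust
// Illustrative sketch of model-name-based provider selection, mirroring the
// prefix rules listed above; not the crate's actual implementation.
fn provider_for_model(model: &str) -> &'static str {
    if model.starts_with("claude-") || model.starts_with("anthropic-") {
        "ClaudeProvider"
    } else if model.starts_with("gpt-")
        || model.starts_with("o1-")
        || model.starts_with("o3-")
        || model.starts_with("o4-")
    {
        "OpenAIProvider"
    } else if model.starts_with("gemini-") {
        "GeminiProvider"
    } else {
        // llama*, mistral*, phi*, codellama*, mixtral*, qwen*, etc. fall through to Ollama
        "OllamaProvider"
    }
}
```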
## Examples

### Run Complete Pipeline

```bash
cargo run --example llm_execution
```

### Compare All Providers

```bash
# Run all four providers with the same prompt
cargo run --example provider_comparison

# Run a specific provider only
cargo run --example provider_comparison claude
cargo run --example provider_comparison openai
cargo run --example provider_comparison gemini
cargo run --example provider_comparison ollama
```

### Run with Test (requires API key)

```bash
cargo test --package typedialog-ag-core -- test_execute_with_real_llm --exact --ignored --nocapture
```

### Integration Test

```bash
cargo test --package typedialog-ag-core --test simple_integration_test -- test_complete_pipeline_with_llm --exact --ignored --nocapture
```

## Error Handling

```rust
match executor.execute(&agent_def, inputs).await {
    Ok(result) => {
        if !result.validation_passed {
            eprintln!("Validation errors: {:?}", result.validation_errors);
        }
        println!("Output: {}", result.output);
    }
    Err(e) => {
        if e.to_string().contains("ANTHROPIC_API_KEY") {
            eprintln!("Error: API key not set");
        } else {
            eprintln!("Execution failed: {}", e);
        }
    }
}
```

## Token Usage Tracking

All LLM responses include token usage information:

```rust
let result = executor.execute(&agent_def, inputs).await?;

if let Some(tokens) = result.metadata.tokens {
    println!("Tokens used: {}", tokens);
}

// When calling a provider directly, the returned LlmResponse carries detailed usage:
if let Some(usage) = response.usage {
    println!("Input tokens: {}", usage.input_tokens);
    println!("Output tokens: {}", usage.output_tokens);
    println!("Total tokens: {}", usage.total_tokens);
}
```

## Context Injection

Agents can load context from files, URLs, and shell commands before LLM execution:

```markdown
---
@agent { role: code reviewer, llm: claude-3-5-sonnet-20241022 }
@import "./src/**/*.rs" as source_code
@shell "git diff HEAD~1" as recent_changes
---

Review the following code:

**Source Code:**
{{ source_code }}

**Recent Changes:**
{{ recent_changes }}

Provide security and performance analysis.
```

The executor loads all context before calling the LLM, so the model receives the fully rendered prompt with all imported content.

## Validation

Output validation runs automatically after LLM execution:

```markdown
---
@validate output {
  must_contain: ["Security", "Performance"],
  format: markdown,
  min_length: 100
}
---
```

Validation results are included in `ExecutionResult`:

```rust
if !result.validation_passed {
    for error in result.validation_errors {
        eprintln!("Validation error: {}", error);
    }
}
```

## Cost Optimization

### Use Appropriate Models

- `claude-3-5-haiku-20241022`: Fast, cheap, good for simple tasks
- `claude-3-5-sonnet-20241022`: Balanced performance and cost
- `claude-opus-4`: Most capable, highest cost

### Limit Token Usage

```rust
agent_def.config.max_tokens = 500; // Limit response length
```

### Cache Context

The executor supports context caching to avoid re-loading files on each execution (implementation varies by provider).
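One way to apply the model guidance above is a small helper that picks the cheapest model that fits the task. This is purely illustrative: the model IDs come from the list under "Use Appropriate Models", but the helper and its tiers are not part of the crate:

```rust
// Illustrative helper: choose the cheapest Claude model that fits the task,
// following the cost guidance above. Not part of typedialog-ag-core.
fn pick_model(simple_task: bool, needs_deep_reasoning: bool) -> &'static str {
    if simple_task {
        "claude-3-5-haiku-20241022" // fast, cheap
    } else if needs_deep_reasoning {
        "claude-opus-4" // most capable, highest cost
    } else {
        "claude-3-5-sonnet-20241022" // balanced performance and cost
    }
}
```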
## Testing Without API Key

Tests that require real LLM execution are marked with `#[ignore]`:

```bash
# Run only non-LLM tests
cargo test --package typedialog-ag-core

# Run LLM tests (requires ANTHROPIC_API_KEY)
cargo test --package typedialog-ag-core -- --ignored
```

## Implementation Files

- **Provider Trait**: `src/llm/provider.rs`
- **Claude Client**: `src/llm/claude.rs`
- **OpenAI Client**: `src/llm/openai.rs`
- **Gemini Client**: `src/llm/gemini.rs`
- **Ollama Client**: `src/llm/ollama.rs`
- **Executor Integration**: `src/executor/mod.rs`
- **Example**: `examples/llm_execution.rs`
- **Multi-Provider Demo**: `examples/provider_comparison.rs`
- **Tests**: `tests/simple_integration_test.rs`

## Streaming Support

All four providers (Claude, OpenAI, Gemini, Ollama) support real-time streaming:

```rust
use std::io::Write;
use typedialog_ag_core::AgentExecutor;

let executor = AgentExecutor::new();

let result = executor.execute_streaming(&agent_def, inputs, |chunk| {
    print!("{}", chunk);
    std::io::stdout().flush().unwrap();
}).await?;

println!("\n\nFinal output: {}", result.output);
println!("Tokens: {:?}", result.metadata.tokens);
```

The CLI automatically uses streaming for real-time token display.

### Token Usage in Streaming

- **Claude**: ✅ Provides token usage in the stream (via the `message_delta` event)
- **OpenAI**: ❌ No token usage in the stream (API limitation - only in non-streaming mode)
- **Gemini**: ✅ Provides token usage in the stream (via `usageMetadata` in the final chunk)
- **Ollama**: ✅ Provides token usage in the stream (via `prompt_eval_count`/`eval_count` in the done event)
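Because streamed token counts are provider-dependent, code that reports usage after `execute_streaming` should treat the count as optional. A small sketch building on the streaming example above (it only assumes `result.metadata.tokens` is an `Option`, as in the earlier examples):

```rust
// Token usage may be absent in streaming mode (e.g. OpenAI, per the list above),
// so report it only when the provider supplied it.
match result.metadata.tokens {
    Some(tokens) => println!("Tokens used: {}", tokens),
    None => println!("Token usage not reported by this provider in streaming mode"),
}
```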