# LLM Provider Integration Patterns

**Version**: 1.0
**Status**: Guide (Reference Architecture)
**Last Updated**: 2026-02-10

---

## Overview

Four implementation patterns for integrating LLM providers (Claude, OpenAI, Gemini, Ollama) without requiring active API subscriptions during development.

| Pattern | Dev Cost | Test Cost | When to Use |
|---------|----------|-----------|-------------|
| **1. Mocks** | $0 | $0 | Unit/integration tests without real calls |
| **2. SDK Direct** | $varies | $0 (mocked) | Full SDK integration, actual API calls in staging |
| **3. Add Provider** | $0 | $0 | Extending the router with a new provider |
| **4. End-to-End** | $0-$$ | $varies | Complete flow from request → response |

---

## Pattern 1: Mocks for Development & Testing

**Use case**: Develop without API keys. All tests pass without real provider calls.

### Structure

```rust
// crates/vapora-llm-router/src/mocks.rs
use std::pin::Pin;
use std::sync::Arc;
use std::sync::atomic::AtomicUsize;

use futures::Stream;

pub struct MockLLMClient {
    name: String,
    responses: Vec<String>,
    call_count: Arc<AtomicUsize>,
}

impl MockLLMClient {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            responses: vec![
                "Mock response for architecture".into(),
                "Mock response for code review".into(),
                "Mock response for documentation".into(),
            ],
            call_count: Arc::new(AtomicUsize::new(0)),
        }
    }

    pub fn with_responses(mut self, responses: Vec<String>) -> Self {
        self.responses = responses;
        self
    }

    pub fn call_count(&self) -> usize {
        self.call_count.load(std::sync::atomic::Ordering::SeqCst)
    }
}

#[async_trait::async_trait]
impl LLMClient for MockLLMClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let idx = self.call_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        let response = self.responses.get(idx % self.responses.len())
            .cloned()
            .unwrap_or_else(|| format!("Mock response #{}", idx));
        info!("MockLLMClient '{}' responded with: {}", self.name, response);
        Ok(response)
    }

    async fn stream(&self, _prompt: &str) -> Result<Pin<Box<dyn Stream<Item = String> + Send>>> {
        let response = "Mock streaming response".to_string();
        let stream = futures::stream::once(async move { response });
        Ok(Box::pin(stream))
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        0.0 // Free in mock
    }

    fn latency_ms(&self) -> u32 {
        1 // Instant in mock
    }

    fn available(&self) -> bool {
        true
    }
}
```

### Usage in Tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_routing_with_mock() {
        let mock_claude = MockLLMClient::new("claude");
        let mock_openai = MockLLMClient::new("openai");

        let mut router = LLMRouter::new();
        router.register_provider("claude", Box::new(mock_claude));
        router.register_provider("openai", Box::new(mock_openai));

        // Route task without API calls
        let result = router.route(
            TaskContext {
                task_type: TaskType::CodeGeneration,
                domain: "backend".into(),
                complexity: Complexity::High,
                quality_requirement: Quality::High,
                latency_required_ms: 5000,
                budget_cents: None,
            },
            None,
        ).await;

        assert!(result.is_ok());
    }

    #[tokio::test]
    async fn test_fallback_chain_with_mocks() {
        // First provider fails, second succeeds
        // (a real test would configure the "failed" mock to return an error)
        let mock_failed = MockLLMClient::new("failed");
        let mock_success = MockLLMClient::new("success");

        let mut router = LLMRouter::new();
        router.register_provider("failed", Box::new(mock_failed));
        router.register_provider("success", Box::new(mock_success));

        // Use with actual routing logic
        let response = router.route_with_fallback(
            vec!["failed", "success"],
            "test prompt",
        ).await;

        assert!(response.is_ok());
        assert!(response.unwrap().contains("success"));
    }

    #[tokio::test]
    async fn test_cost_tracking_with_mocks() {
        let router = LLMRouter::with_mocks();

        // Simulate 100 tasks
        for i in 0..100 {
            router.route_task(Task {
                id: format!("task-{}", i),
                task_type: TaskType::CodeGeneration,
                ..Default::default()
            }).await.ok();
        }

        // Verify cost tracking (should be $0 for mocks)
        assert_eq!(router.cost_tracker.total_cost_cents, 0);
    }
}
```
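The tests above call two helpers that are not defined elsewhere in this guide: `LLMRouter::with_mocks()` and `route_with_fallback()`. A minimal sketch of what they could look like, assuming the `DashMap`-backed router from Pattern 3 and `anyhow` for errors:

```rust
impl LLMRouter {
    /// Hypothetical helper: build a router pre-populated with mock providers.
    pub fn with_mocks() -> Self {
        let router = Self::new();
        for name in ["claude", "openai", "gemini", "ollama"] {
            router.register_provider(name, Box::new(MockLLMClient::new(name)));
        }
        router
    }

    /// Hypothetical helper: try each provider name in order, returning the
    /// first successful completion.
    pub async fn route_with_fallback(
        &self,
        chain: Vec<&str>,
        prompt: &str,
    ) -> Result<String> {
        let mut last_error = None;
        for name in chain {
            match self.get_provider(name) {
                // Include the provider name in the response so tests can
                // assert on which provider actually answered.
                Ok(provider) => match provider.complete(prompt).await {
                    Ok(text) => return Ok(format!("{} ({})", text, name)),
                    Err(e) => last_error = Some(e),
                },
                Err(e) => last_error = Some(e),
            }
        }
        Err(last_error.unwrap_or_else(|| anyhow!("all providers failed")))
    }
}
```

Appending the provider name to the mock response is only there so the fallback test's `contains("success")` assertion has something to match on; a production router would return the provider name in a structured response instead.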
### Cost Management with Mocks

```rust
pub struct MockCostTracker {
    simulated_cost: AtomicU32, // Cents
}

impl MockCostTracker {
    pub fn new() -> Self {
        Self {
            simulated_cost: AtomicU32::new(0),
        }
    }

    /// Simulate cost without actual API call
    pub async fn record_simulated_call(&self, provider: &str, tokens: u32) {
        let cost_per_token = match provider {
            "claude" => 0.000003,  // $3 per 1M tokens
            "openai" => 0.000001,  // $1 per 1M tokens
            "gemini" => 0.0000005, // $0.50 per 1M tokens
            "ollama" => 0.0,       // Free
            _ => 0.0,
        };

        let total_cost = (tokens as f64 * cost_per_token * 100.0) as u32; // cents
        self.simulated_cost.fetch_add(total_cost, Ordering::SeqCst);
    }
}
```

### Benefits

- ✅ Zero API costs
- ✅ Instant responses (no network delay)
- ✅ Deterministic output (for testing)
- ✅ No authentication needed
- ✅ Full test coverage without subscriptions

### Limitations

- ❌ No real model behavior
- ❌ Responses are hardcoded
- ❌ Can't test actual provider failures
- ❌ Not suitable for production validation

---

## Pattern 2: SDK Direct Integration

**Use case**: Full integration with official SDKs. Cost tracking and real API calls in staging/production.

### Abstraction Layer

```rust
// crates/vapora-llm-router/src/providers.rs
use std::pin::Pin;

use anthropic::Anthropic; // Official SDK
use futures::Stream;
use openai_api::OpenAI; // Official SDK

#[async_trait::async_trait]
pub trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn stream(&self, prompt: &str) -> Result<Pin<Box<dyn Stream<Item = String> + Send>>>;
    fn cost_per_1k_tokens(&self) -> f64;
    fn available(&self) -> bool;
}

/// Claude SDK Implementation
pub struct ClaudeClient {
    client: Anthropic,
    model: String,
    max_tokens: usize,
}

impl ClaudeClient {
    pub fn new(api_key: &str, model: &str) -> Self {
        Self {
            client: Anthropic::new(api_key.into()),
            model: model.to_string(),
            max_tokens: 4096,
        }
    }
}

#[async_trait::async_trait]
impl LLMClient for ClaudeClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let message = self.client
            .messages()
            .create(CreateMessageRequest {
                model: self.model.clone(),
                max_tokens: self.max_tokens,
                messages: vec![
                    MessageParam::User(
                        ContentBlockParam::Text(TextBlockParam {
                            text: prompt.into(),
                        })
                    ),
                ],
                system: None,
                tools: None,
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("Claude API error: {}", e))?;

        extract_text_from_response(&message)
    }

    async fn stream(&self, prompt: &str) -> Result<Pin<Box<dyn Stream<Item = String> + Send>>> {
        let mut stream = self.client
            .messages()
            .stream(CreateMessageRequest {
                model: self.model.clone(),
                max_tokens: self.max_tokens,
                messages: vec![
                    MessageParam::User(ContentBlockParam::Text(
                        TextBlockParam { text: prompt.into() }
                    )),
                ],
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("Claude streaming error: {}", e))?;

        let (tx, rx) = tokio::sync::mpsc::channel(100);

        tokio::spawn(async move {
            while let Some(event) = stream.next().await {
                match event {
                    Ok(evt) => {
                        if let Some(text) = extract_delta(&evt) {
                            let _ = tx.send(text).await;
                        }
                    }
                    Err(e) => {
                        eprintln!("Stream error: {}", e);
                        break;
                    }
                }
            }
        });

        Ok(Box::pin(ReceiverStream::new(rx)))
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        // Claude Opus: $3/1M input, $15/1M output
        0.015 // Average
    }

    fn available(&self) -> bool {
        !self.client.api_key().is_empty()
    }
}

/// OpenAI SDK Implementation
pub struct OpenAIClient {
    client: OpenAI,
    model: String,
}

impl OpenAIClient {
    pub fn new(api_key: &str, model: &str) -> Self {
        Self {
            client: OpenAI::new(api_key.into()),
            model: model.to_string(),
        }
    }
}

#[async_trait::async_trait]
impl LLMClient for OpenAIClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let response = self.client
            .chat_completions()
            .create(CreateChatCompletionRequest {
                model: self.model.clone(),
                messages: vec![
                    ChatCompletionRequestMessage::User(
                        ChatCompletionRequestUserMessage {
                            content: ChatCompletionContentPart::Text(
                                ChatCompletionContentPartText {
                                    text: prompt.into(),
                                }
                            ),
                            name: None,
                        }
                    ),
                ],
                temperature: Some(0.7),
                max_tokens: Some(2048),
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("OpenAI API error: {}", e))?;

        Ok(response.choices[0].message.content.clone())
    }

    async fn stream(&self, prompt: &str) -> Result<Pin<Box<dyn Stream<Item = String> + Send>>> {
        // Similar to Claude streaming
        todo!("Implement OpenAI streaming")
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        // GPT-4: $10/1M input, $30/1M output
        0.030 // Average
    }

    fn available(&self) -> bool {
        !self.client.api_key().is_empty()
    }
}
```

### Conditional Compilation

```toml
# Cargo.toml
[features]
default = ["mock-providers"]
real-providers = ["anthropic", "openai-api", "google-generativeai"]
development = ["mock-providers"]
production = ["real-providers"]
```

```rust
// src/lib.rs
#[cfg(feature = "mock-providers")]
mod mocks;

#[cfg(feature = "real-providers")]
mod claude_client;
#[cfg(feature = "real-providers")]
mod openai_client;

pub fn create_provider(name: &str) -> Box<dyn LLMClient> {
    #[cfg(feature = "real-providers")]
    {
        match name {
            "claude" => Box::new(ClaudeClient::new(
                &env::var("ANTHROPIC_API_KEY").unwrap_or_default(),
                "claude-opus-4",
            )),
            "openai" => Box::new(OpenAIClient::new(
                &env::var("OPENAI_API_KEY").unwrap_or_default(),
                "gpt-4",
            )),
            _ => Box::new(MockLLMClient::new(name)),
        }
    }

    #[cfg(not(feature = "real-providers"))]
    {
        Box::new(MockLLMClient::new(name))
    }
}
```

### Cost Management

```rust
pub struct SDKCostTracker {
    provider_costs: DashMap<String, CostMetric>,
}

pub struct CostMetric {
    total_tokens: u64,
    total_cost_cents: u32,
    call_count: u32,
    last_call: DateTime<Utc>,
}

impl SDKCostTracker {
    pub async fn track_call(
        &self,
        provider: &str,
        input_tokens: u32,
        output_tokens: u32,
        cost: f64,
    ) {
        let cost_cents = (cost * 100.0) as u32;
        let total_tokens = (input_tokens + output_tokens) as u64;

        let mut metric = self.provider_costs
            .entry(provider.to_string())
            .or_insert_with(|| CostMetric {
                total_tokens: 0,
                total_cost_cents: 0,
                call_count: 0,
                last_call: Utc::now(),
            });

        metric.total_tokens += total_tokens;
        metric.total_cost_cents += cost_cents;
        metric.call_count += 1;
        metric.last_call = Utc::now();
    }

    pub fn cost_summary(&self) -> CostSummary {
        let mut total_cost = 0u32;
        let mut providers = Vec::new();

        for entry in self.provider_costs.iter() {
            let metric = entry.value();
            total_cost += metric.total_cost_cents;
            providers.push((
                entry.key().clone(),
                metric.total_cost_cents,
                metric.call_count,
            ));
        }

        CostSummary {
            total_cost_cents: total_cost,
            total_cost_dollars: total_cost as f64 / 100.0,
            providers,
        }
    }
}
```
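The guide does not show `track_call` at a call site. Here is a brief sketch of how it could wrap a completion, assuming the `ClaudeClient` from above and the same rough 4-characters-per-token estimate used in Pattern 4; the `complete_and_track` helper is hypothetical:

```rust
// Hypothetical call site: record cost after a successful completion.
async fn complete_and_track(
    tracker: &SDKCostTracker,
    client: &ClaudeClient,
    prompt: &str,
) -> Result<String> {
    let output = client.complete(prompt).await?;

    // Rough token estimate (~4 chars per token); real SDKs return exact usage.
    let input_tokens = (prompt.len() / 4) as u32;
    let output_tokens = (output.len() / 4) as u32;
    let cost = (input_tokens + output_tokens) as f64 / 1000.0
        * client.cost_per_1k_tokens();

    tracker.track_call("claude", input_tokens, output_tokens, cost).await;
    Ok(output)
}
```

A `cost_summary()` call at the end of a run then gives per-provider totals for dashboards or budget checks.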
### Benefits

- ✅ Real SDK behavior
- ✅ Actual streaming support
- ✅ Token counting from real API
- ✅ Accurate cost calculation
- ✅ Production-ready

### Limitations

- ❌ Requires active API key subscriptions
- ❌ Real API calls consume quota/credits
- ❌ Network latency in tests
- ❌ More complex error handling

---

## Pattern 3: Adding a New Provider

**Use case**: Integrate a new LLM provider (e.g., Anthropic's new model, custom API).

### Step-by-Step

```rust
// Step 1: Define provider struct
pub struct CustomLLMClient {
    endpoint: String,
    api_key: String,
    model: String,
}

impl CustomLLMClient {
    pub fn new(endpoint: &str, api_key: &str, model: &str) -> Self {
        Self {
            endpoint: endpoint.to_string(),
            api_key: api_key.to_string(),
            model: model.to_string(),
        }
    }
}

// Step 2: Implement LLMClient trait
#[async_trait::async_trait]
impl LLMClient for CustomLLMClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let client = reqwest::Client::new();
        let response = client
            .post(format!("{}/v1/complete", self.endpoint))
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&json!({
                "model": self.model,
                "prompt": prompt,
            }))
            .send()
            .await?;

        if response.status().is_success() {
            let data: serde_json::Value = response.json().await?;
            Ok(data["result"].as_str().unwrap_or("").to_string())
        } else {
            Err(anyhow!("API error: {}", response.status()))
        }
    }

    async fn stream(&self, prompt: &str) -> Result<Pin<Box<dyn Stream<Item = String> + Send>>> {
        // Implement streaming with reqwest::Client stream
        todo!("Implement streaming for custom provider")
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        0.01 // Define custom pricing
    }

    fn available(&self) -> bool {
        !self.api_key.is_empty()
    }
}

// Step 3: Register in router factory
pub fn create_provider(name: &str) -> Result<Box<dyn LLMClient>> {
    match name {
        "claude" => Ok(Box::new(ClaudeClient::new(
            &env::var("ANTHROPIC_API_KEY")?,
            "claude-opus-4",
        ))),
        "openai" => Ok(Box::new(OpenAIClient::new(
            &env::var("OPENAI_API_KEY")?,
            "gpt-4",
        ))),
        "custom" => Ok(Box::new(CustomLLMClient::new(
            &env::var("CUSTOM_ENDPOINT")?,
            &env::var("CUSTOM_API_KEY")?,
            &env::var("CUSTOM_MODEL")?,
        ))),
        _ => Err(anyhow!("Unknown provider: {}", name)),
    }
}

// Step 4: Add configuration
#[derive(Deserialize, Clone)]
pub struct ProviderConfig {
    pub name: String,
    pub endpoint: Option<String>,
    pub api_key_env: String,
    pub model: String,
    pub cost_per_1k_tokens: f64,
    pub timeout_ms: u32,
}

impl ProviderConfig {
    pub fn to_client(&self) -> Result<Box<dyn LLMClient>> {
        match self.name.as_str() {
            "custom" => Ok(Box::new(CustomLLMClient::new(
                self.endpoint.as_deref().unwrap_or("http://localhost:8000"),
                &env::var(&self.api_key_env)?,
                &self.model,
            ))),
            _ => Err(anyhow!("Unsupported provider type")),
        }
    }
}

// Step 5: Update router
pub struct LLMRouter {
    providers: DashMap<String, Box<dyn LLMClient>>,
    config: Vec<ProviderConfig>,
}

impl LLMRouter {
    pub async fn load_from_config(config: &[ProviderConfig]) -> Result<Self> {
        let providers = DashMap::new();

        for provider_config in config {
            let client = provider_config.to_client()?;
            providers.insert(provider_config.name.clone(), client);
        }

        Ok(Self {
            providers,
            config: config.to_vec(),
        })
    }

    pub fn register_provider(
        &self,
        name: &str,
        client: Box<dyn LLMClient>,
    ) {
        self.providers.insert(name.to_string(), client);
    }
}
```

### Configuration Example

```toml
# llm-providers.toml

[[providers]]
name = "claude"
api_key_env = "ANTHROPIC_API_KEY"
model = "claude-opus-4"
cost_per_1k_tokens = 0.015
timeout_ms = 30000

[[providers]]
name = "openai"
api_key_env = "OPENAI_API_KEY"
model = "gpt-4"
cost_per_1k_tokens = 0.030
timeout_ms = 30000

[[providers]]
name = "custom"
endpoint = "https://api.custom-provider.com"
api_key_env = "CUSTOM_API_KEY"
model = "custom-model-v1"
cost_per_1k_tokens = 0.005
timeout_ms = 20000
```
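Loading this file is not shown above. A minimal sketch, assuming the `toml` and `serde` crates and that the file lives at `llm-providers.toml`; the `ProvidersFile` wrapper and `router_from_toml` helper are illustrative names:

```rust
use serde::Deserialize;

// Wrapper matching the [[providers]] array-of-tables layout.
#[derive(Deserialize)]
struct ProvidersFile {
    providers: Vec<ProviderConfig>,
}

async fn router_from_toml(path: &str) -> anyhow::Result<LLMRouter> {
    // Read and parse the configuration file.
    let raw = std::fs::read_to_string(path)?;
    let file: ProvidersFile = toml::from_str(&raw)?;

    // Build the router from the parsed provider entries (Step 5 above).
    LLMRouter::load_from_config(&file.providers).await
}
```

Usage would then be a single call at startup, e.g. `let router = router_from_toml("llm-providers.toml").await?;`.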
### Benefits

- ✅ Extensible design
- ✅ Easy to add new providers
- ✅ Configuration-driven
- ✅ No code duplication

### Limitations

- ❌ Requires understanding trait implementation
- ❌ Error handling varies per provider
- ❌ Testing multiple providers is complex

---

## Pattern 4: End-to-End Flow

**Use case**: Complete request → router → provider → response cycle with cost management and fallback.

### Full Implementation

```rust
// User initiates request
pub struct TaskRequest {
    pub task_id: String,
    pub task_type: TaskType,
    pub prompt: String,
    pub quality_requirement: Quality,
    pub max_cost_cents: Option<u32>,
}

// Router orchestrates end-to-end
pub struct LLMRouterOrchestrator {
    router: LLMRouter,
    cost_tracker: SDKCostTracker,
    metrics: Metrics,
}

impl LLMRouterOrchestrator {
    pub async fn execute(&self, request: TaskRequest) -> Result<TaskResponse> {
        info!("Starting task: {}", request.task_id);

        // 1. Select provider
        let provider_name = self.select_provider(&request).await?;
        let provider = self.router.get_provider(&provider_name)?;

        info!("Selected provider: {} for task {}", provider_name, request.task_id);

        // 2. Check budget (estimate ~10k tokens, converted to cents)
        if let Some(max_cost) = request.max_cost_cents {
            let estimated_cost_cents = (provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32;
            if estimated_cost_cents > max_cost {
                warn!("Budget exceeded. Cost: {} > limit: {}", estimated_cost_cents, max_cost);
                return Err(anyhow!("Budget exceeded"));
            }
        }

        // 3. Execute with timeout
        let timeout = Duration::from_secs(30);
        let result = tokio::time::timeout(
            timeout,
            provider.complete(&request.prompt),
        )
        .await??;

        info!("Task {} completed successfully", request.task_id);

        // 4. Track cost
        let estimated_tokens = (request.prompt.len() / 4) as u32; // Rough estimate
        self.cost_tracker.track_call(
            &provider_name,
            estimated_tokens,
            (result.len() / 4) as u32,
            provider.cost_per_1k_tokens() * 10.0,
        ).await;

        // 5. Record metrics
        self.metrics.record_task_completion(
            &request.task_type,
            &provider_name,
            Duration::from_secs(1),
        );

        Ok(TaskResponse {
            task_id: request.task_id,
            result,
            provider: provider_name,
            cost_cents: Some((provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32),
        })
    }

    async fn select_provider(&self, request: &TaskRequest) -> Result<String> {
        let context = TaskContext {
            task_type: request.task_type.clone(),
            quality_requirement: request.quality_requirement.clone(),
            budget_cents: request.max_cost_cents,
            ..Default::default()
        };

        let provider_name = self.router.route(context, None).await?;
        Ok(provider_name)
    }
}

// Fallback chain handling
pub struct FallbackExecutor {
    router: LLMRouter,
    cost_tracker: SDKCostTracker,
    metrics: Metrics,
}

impl FallbackExecutor {
    pub async fn execute_with_fallback(
        &self,
        request: TaskRequest,
        fallback_chain: Vec<String>,
    ) -> Result<TaskResponse> {
        let mut last_error = None;

        for provider_name in fallback_chain {
            match self.try_provider(&request, &provider_name).await {
                Ok(response) => {
                    info!("Success with provider: {}", provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    warn!(
                        "Provider {} failed: {:?}, trying next",
                        provider_name, e
                    );
                    self.metrics.record_provider_failure(&provider_name);
                    last_error = Some(e);
                }
            }
        }

        Err(last_error.unwrap_or_else(|| anyhow!("All providers failed")))
    }

    async fn try_provider(
        &self,
        request: &TaskRequest,
        provider_name: &str,
    ) -> Result<TaskResponse> {
        let provider = self.router.get_provider(provider_name)?;

        let timeout = Duration::from_secs(30);
        let result = tokio::time::timeout(
            timeout,
            provider.complete(&request.prompt),
        )
        .await??;

        self.cost_tracker.track_call(
            provider_name,
            (request.prompt.len() / 4) as u32,
            (result.len() / 4) as u32,
            provider.cost_per_1k_tokens() * 10.0,
        ).await;

        Ok(TaskResponse {
            task_id: request.task_id.clone(),
            result,
            provider: provider_name.to_string(),
            cost_cents: Some((provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32),
        })
    }
}
```

### Cost Management Integration

```rust
pub struct CostManagementPolicy {
    pub daily_limit_cents: u32,
    pub monthly_limit_cents: u32,
    pub per_task_limit_cents: u32,
    pub warn_threshold_percent: f64,
}

impl CostManagementPolicy {
    pub fn check_budget(
        &self,
        current_spend_cents: u32,
        new_call_cost_cents: u32,
    ) -> Result<()> {
        let total = current_spend_cents.saturating_add(new_call_cost_cents);

        if total > self.daily_limit_cents {
            return Err(anyhow!(
                "Daily budget exceeded: {} + {} > {}",
                current_spend_cents,
                new_call_cost_cents,
                self.daily_limit_cents
            ));
        }

        let percent_used = (total as f64 / self.daily_limit_cents as f64) * 100.0;
        if percent_used > self.warn_threshold_percent {
            warn!(
                "Budget warning: {:.1}% of daily limit used",
                percent_used
            );
        }

        Ok(())
    }
}
```
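How the policy plugs into the orchestrator is left implicit above. A rough sketch, where the daily-spend figure and per-call estimate are placeholders and `execute_with_budget` is a hypothetical helper:

```rust
async fn execute_with_budget(
    orchestrator: &LLMRouterOrchestrator,
    policy: &CostManagementPolicy,
    current_daily_spend_cents: u32, // e.g. read from the cost tracker
    request: TaskRequest,
) -> Result<TaskResponse> {
    // Hypothetical flat estimate; a real estimate would depend on the provider.
    let estimated_call_cost_cents = 15;

    // Refuse the task if it would blow through the daily budget.
    policy.check_budget(current_daily_spend_cents, estimated_call_cost_cents)?;

    orchestrator.execute(request).await
}
```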
### Benefits

- ✅ Complete request lifecycle
- ✅ Integrated cost tracking
- ✅ Fallback chain support
- ✅ Metrics collection
- ✅ Budget enforcement
- ✅ Timeout handling

### Limitations

- ❌ Complex orchestration logic
- ❌ Hard to test all edge cases
- ❌ Requires multiple components

---

## Summary & Recommendations

| Pattern | Dev | Test | Prod | Recommended When |
|---------|-----|------|------|------------------|
| **Mocks** | ⭐⭐⭐ | ⭐⭐⭐ | ❌ | Building features without costs |
| **SDK Direct** | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Full integration, staging/prod |
| **Add Provider** | ⭐ | ⭐ | ⭐⭐ | Supporting new provider types |
| **End-to-End** | ⭐ | ⭐⭐ | ⭐⭐⭐ | Production orchestration |

### Development Workflow

```
Local Dev          CI/Tests           Staging      Production
┌─────────────┐    ┌──────────────┐   ┌────────┐   ┌─────────────┐
│ Mocks       │    │ Mocks + SDK  │   │ SDK    │   │ SDK + Real  │
│ Zero cost   │    │ (Simulated)  │   │ Real   │   │ Fallback    │
│ No keys     │    │ Tests only   │   │ Budget │   │ Monitoring  │
└─────────────┘    └──────────────┘   └────────┘   └─────────────┘
```

See [llm-provider-implementation-guide.md](llm-provider-implementation.md) for the actual VAPORA implementation and code examples.