LLM Provider Integration Patterns

Version: 1.0 | Status: Guide (Reference Architecture) | Last Updated: 2026-02-10


Overview

This guide describes four implementation patterns for integrating LLM providers (Claude, OpenAI, Gemini, Ollama) without requiring active API subscriptions during development.

| Pattern | Dev Cost | Test Cost | When to Use |
|---|---|---|---|
| 1. Mocks | $0 | $0 | Unit/integration tests without real calls |
| 2. SDK Direct | Varies | $0 (mocked) | Full SDK integration, actual API calls in staging |
| 3. Add Provider | $0 | $0 | Extending the router with a new provider |
| 4. End-to-End | $0–$$ | Varies | Complete flow from request → response |

Pattern 1: Mocks for Development & Testing

Use case: Develop without API keys. All tests pass without real provider calls.

Structure

// crates/vapora-llm-router/src/mocks.rs

pub struct MockLLMClient {
    name: String,
    responses: Vec<String>,
    call_count: Arc<AtomicUsize>,
}

impl MockLLMClient {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            responses: vec![
                "Mock response for architecture".into(),
                "Mock response for code review".into(),
                "Mock response for documentation".into(),
            ],
            call_count: Arc::new(AtomicUsize::new(0)),
        }
    }

    pub fn with_responses(mut self, responses: Vec<String>) -> Self {
        self.responses = responses;
        self
    }

    pub fn call_count(&self) -> usize {
        self.call_count.load(std::sync::atomic::Ordering::SeqCst)
    }
}

#[async_trait::async_trait]
impl LLMClient for MockLLMClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let idx = self.call_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        let response = self.responses.get(idx % self.responses.len())
            .cloned()
            .unwrap_or_else(|| format!("Mock response #{}", idx));

        info!("MockLLMClient '{}' responded with: {}", self.name, response);
        Ok(response)
    }

    async fn stream(&self, _prompt: &str) -> Result<BoxStream<String>> {
        let response = "Mock streaming response".to_string();
        let stream = futures::stream::once(async move { response });
        Ok(Box::pin(stream))
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        0.0  // Free in mock
    }

    fn latency_ms(&self) -> u32 {
        1  // Instant in mock
    }

    fn available(&self) -> bool {
        true
    }
}

Usage in Tests

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_routing_with_mock() {
        let mock_claude = MockLLMClient::new("claude");
        let mock_openai = MockLLMClient::new("openai");

        let mut router = LLMRouter::new();
        router.register_provider("claude", Box::new(mock_claude));
        router.register_provider("openai", Box::new(mock_openai));

        // Route task without API calls
        let result = router.route(
            TaskContext {
                task_type: TaskType::CodeGeneration,
                domain: "backend".into(),
                complexity: Complexity::High,
                quality_requirement: Quality::High,
                latency_required_ms: 5000,
                budget_cents: None,
            },
            None,
        ).await;

        assert!(result.is_ok());
    }

    #[tokio::test]
    async fn test_fallback_chain_with_mocks() {
        // First provider fails, second succeeds.
        // Assumes the "failed" mock can be configured to return an error
        // (a failure switch not shown in the mock above).
        let mock_failed = MockLLMClient::new("failed");
        let mock_success = MockLLMClient::new("success")
            .with_responses(vec!["success".into()]);

        let mut router = LLMRouter::new();
        router.register_provider("failed", Box::new(mock_failed));
        router.register_provider("success", Box::new(mock_success));

        // Use with actual routing logic
        let response = router.route_with_fallback(
            vec!["failed", "success"],
            "test prompt",
        ).await;

        assert!(response.is_ok());
        assert!(response.unwrap().contains("success"));
    }

    #[tokio::test]
    async fn test_cost_tracking_with_mocks() {
        let router = LLMRouter::with_mocks();

        // Simulate 100 tasks
        for i in 0..100 {
            router.route_task(Task {
                id: format!("task-{}", i),
                task_type: TaskType::CodeGeneration,
                ..Default::default()
            }).await.ok();
        }

        // Verify cost tracking (should be $0 for mocks)
        assert_eq!(router.cost_tracker.total_cost_cents, 0);
    }
}

Cost Management with Mocks

pub struct MockCostTracker {
    simulated_cost: AtomicU32,  // Cents
}

impl MockCostTracker {
    pub fn new() -> Self {
        Self {
            simulated_cost: AtomicU32::new(0),
        }
    }

    /// Simulate cost without actual API call
    pub async fn record_simulated_call(&self, provider: &str, tokens: u32) {
        let cost_per_token = match provider {
            "claude" => 0.000003,      // $3 per 1M tokens
            "openai" => 0.000001,      // $1 per 1M tokens
            "gemini" => 0.0000005,     // $0.50 per 1M tokens
            "ollama" => 0.0,           // Free
            _ => 0.0,
        };

        let total_cost = (tokens as f64 * cost_per_token * 100.0) as u32; // cents
        self.simulated_cost.fetch_add(total_cost, Ordering::SeqCst);
    }
}
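
For illustration, a test might drive the tracker and check the accumulated total. This is a sketch: it reads the `simulated_cost` field directly, so it assumes the test lives in the same module as `MockCostTracker`.

#[cfg(test)]
mod mock_cost_tests {
    use super::*;
    use std::sync::atomic::Ordering;

    #[tokio::test]
    async fn simulated_costs_accumulate_per_provider_rate() {
        let tracker = MockCostTracker::new();

        // 1M tokens on Claude at $3 per 1M tokens -> 300 cents
        tracker.record_simulated_call("claude", 1_000_000).await;
        // Ollama is free, so this call adds nothing
        tracker.record_simulated_call("ollama", 1_000_000).await;

        assert_eq!(tracker.simulated_cost.load(Ordering::SeqCst), 300);
    }
}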

Benefits

- Zero API costs
- Instant responses (no network delay)
- Deterministic output (for testing)
- No authentication needed
- Full test coverage without subscriptions

Limitations

- No real model behavior
- Responses are hardcoded
- Can't test actual provider failures
- Not suitable for production validation


Pattern 2: SDK Direct Integration

Use case: Full integration with official SDKs. Cost tracking and real API calls in staging/production.

Abstraction Layer

// crates/vapora-llm-router/src/providers.rs

use anthropic::Anthropic;  // Official SDK
use openai_api::OpenAI;    // Official SDK

#[async_trait::async_trait]
pub trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn stream(&self, prompt: &str) -> Result<BoxStream<String>>;
    fn cost_per_1k_tokens(&self) -> f64;
    fn latency_ms(&self) -> u32 {
        0 // default; providers may override (the mock in Pattern 1 does)
    }
    fn available(&self) -> bool;
}

/// Claude SDK Implementation
pub struct ClaudeClient {
    client: Anthropic,
    model: String,
    max_tokens: usize,
}

impl ClaudeClient {
    pub fn new(api_key: &str, model: &str) -> Self {
        Self {
            client: Anthropic::new(api_key.into()),
            model: model.to_string(),
            max_tokens: 4096,
        }
    }
}

#[async_trait::async_trait]
impl LLMClient for ClaudeClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let message = self.client
            .messages()
            .create(CreateMessageRequest {
                model: self.model.clone(),
                max_tokens: self.max_tokens,
                messages: vec![
                    MessageParam::User(
                        ContentBlockParam::Text(TextBlockParam {
                            text: prompt.into(),
                        })
                    ),
                ],
                system: None,
                tools: None,
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("Claude API error: {}", e))?;

        extract_text_from_response(&message)
    }

    async fn stream(&self, prompt: &str) -> Result<BoxStream<String>> {
        let mut stream = self.client
            .messages()
            .stream(CreateMessageRequest {
                model: self.model.clone(),
                max_tokens: self.max_tokens,
                messages: vec![
                    MessageParam::User(ContentBlockParam::Text(
                        TextBlockParam { text: prompt.into() }
                    )),
                ],
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("Claude streaming error: {}", e))?;

        let (tx, rx) = tokio::sync::mpsc::channel(100);

        tokio::spawn(async move {
            while let Some(event) = stream.next().await {
                match event {
                    Ok(evt) => {
                        if let Some(text) = extract_delta(&evt) {
                            let _ = tx.send(text).await;
                        }
                    }
                    Err(e) => {
                        eprintln!("Stream error: {}", e);
                        break;
                    }
                }
            }
        });

        Ok(Box::pin(ReceiverStream::new(rx))) // tokio_stream::wrappers::ReceiverStream
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        // Claude Opus: $3 per 1M input tokens, $15 per 1M output tokens;
        // the output rate is used here as a conservative per-1k estimate.
        0.015
    }

    fn available(&self) -> bool {
        !self.client.api_key().is_empty()
    }
}

/// OpenAI SDK Implementation
pub struct OpenAIClient {
    client: OpenAI,
    model: String,
}

impl OpenAIClient {
    pub fn new(api_key: &str, model: &str) -> Self {
        Self {
            client: OpenAI::new(api_key.into()),
            model: model.to_string(),
        }
    }
}

#[async_trait::async_trait]
impl LLMClient for OpenAIClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let response = self.client
            .chat_completions()
            .create(CreateChatCompletionRequest {
                model: self.model.clone(),
                messages: vec![
                    ChatCompletionRequestMessage::User(
                        ChatCompletionRequestUserMessage {
                            content: ChatCompletionContentPart::Text(
                                ChatCompletionContentPartText {
                                    text: prompt.into(),
                                }
                            ),
                            name: None,
                        }
                    ),
                ],
                temperature: Some(0.7),
                max_tokens: Some(2048),
                ..Default::default()
            })
            .await
            .map_err(|e| anyhow!("OpenAI API error: {}", e))?;

        Ok(response.choices[0].message.content.clone())
    }

    async fn stream(&self, prompt: &str) -> Result<BoxStream<String>> {
        // Similar to Claude streaming
        todo!("Implement OpenAI streaming")
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        // GPT-4: $10 per 1M input tokens, $30 per 1M output tokens;
        // the output rate is used here as a conservative per-1k estimate.
        0.030
    }

    fn available(&self) -> bool {
        !self.client.api_key().is_empty()
    }
}
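
With the `ANTHROPIC_API_KEY` environment variable set, the SDK-backed client is called exactly like the mock. A minimal sketch, with error handling kept deliberately simple:

pub async fn summarize_with_claude(prompt: &str) -> Result<String> {
    let api_key = std::env::var("ANTHROPIC_API_KEY")?;
    let client = ClaudeClient::new(&api_key, "claude-opus-4");

    if !client.available() {
        return Err(anyhow!("ANTHROPIC_API_KEY is empty"));
    }
    // Real API call: consumes quota and incurs the cost reported by cost_per_1k_tokens()
    client.complete(prompt).await
}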

Conditional Compilation

# Cargo.toml
[features]
default = ["mock-providers"]
real-providers = ["anthropic", "openai-api", "google-generativeai"]
development = ["mock-providers"]
production = ["real-providers"]

// src/lib.rs
#[cfg(feature = "mock-providers")]
mod mocks;

#[cfg(feature = "real-providers")]
mod claude_client;

#[cfg(feature = "real-providers")]
mod openai_client;

pub fn create_provider(name: &str) -> Box<dyn LLMClient> {
    #[cfg(feature = "real-providers")]
    {
        match name {
            "claude" => Box::new(ClaudeClient::new(
                &env::var("ANTHROPIC_API_KEY").unwrap_or_default(),
                "claude-opus-4",
            )),
            "openai" => Box::new(OpenAIClient::new(
                &env::var("OPENAI_API_KEY").unwrap_or_default(),
                "gpt-4",
            )),
            _ => Box::new(MockLLMClient::new(name)),
        }
    }

    #[cfg(not(feature = "real-providers"))]
    {
        Box::new(MockLLMClient::new(name))
    }
}
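
A feature-gated test can confirm that the default build never touches a real SDK. This sketch assumes the mock's canned responses from Pattern 1 and the default `mock-providers` feature:

#[cfg(all(test, feature = "mock-providers", not(feature = "real-providers")))]
mod provider_factory_tests {
    use super::*;

    #[tokio::test]
    async fn default_build_falls_back_to_mocks() {
        // No API keys or network access required under the default feature set.
        let client = create_provider("claude");
        let reply = client.complete("ping").await.unwrap();

        assert!(reply.starts_with("Mock response"));
        assert_eq!(client.cost_per_1k_tokens(), 0.0);
    }
}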

Cost Management

pub struct SDKCostTracker {
    provider_costs: DashMap<String, CostMetric>,
}

pub struct CostMetric {
    total_tokens: u64,
    total_cost_cents: u32,
    call_count: u32,
    last_call: DateTime<Utc>,
}

impl SDKCostTracker {
    pub async fn track_call(
        &self,
        provider: &str,
        input_tokens: u32,
        output_tokens: u32,
        cost: f64,
    ) {
        let cost_cents = (cost * 100.0) as u32;
        let total_tokens = (input_tokens + output_tokens) as u64;

        let mut metric = self.provider_costs
            .entry(provider.to_string())
            .or_insert_with(|| CostMetric {
                total_tokens: 0,
                total_cost_cents: 0,
                call_count: 0,
                last_call: Utc::now(),
            });

        metric.total_tokens += total_tokens;
        metric.total_cost_cents += cost_cents;
        metric.call_count += 1;
        metric.last_call = Utc::now();
    }

    pub fn cost_summary(&self) -> CostSummary {
        let mut total_cost = 0u32;
        let mut providers = Vec::new();

        for entry in self.provider_costs.iter() {
            let metric = entry.value();
            total_cost += metric.total_cost_cents;
            providers.push((
                entry.key().clone(),
                metric.total_cost_cents,
                metric.call_count,
            ));
        }

        CostSummary {
            total_cost_cents: total_cost,
            total_cost_dollars: total_cost as f64 / 100.0,
            providers,
        }
    }
}
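
For example, recording a single Claude call and printing the summary. A sketch: it assumes an `SDKCostTracker` instance is already constructed and that the cost passed in is in dollars, matching `track_call` above.

async fn report_costs(tracker: &SDKCostTracker) {
    // One call: 1,200 input + 800 output tokens at a computed cost of $0.05
    tracker.track_call("claude", 1_200, 800, 0.05).await;

    let summary = tracker.cost_summary();
    println!(
        "total: ${:.2} across {} provider(s)",
        summary.total_cost_dollars,
        summary.providers.len()
    );
    for (name, cents, calls) in &summary.providers {
        println!("  {}: {} cents over {} call(s)", name, cents, calls);
    }
}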

Benefits

- Real SDK behavior
- Actual streaming support
- Token counting from the real API
- Accurate cost calculation
- Production-ready

Limitations

- Requires active API key subscriptions
- Real API calls consume quota/credits
- Network latency in tests
- More complex error handling


Pattern 3: Adding a New Provider

Use case: Integrate a new LLM provider (e.g., a new Anthropic model or a custom API).

Step-by-Step

// Step 1: Define provider struct
pub struct CustomLLMClient {
    endpoint: String,
    api_key: String,
    model: String,
}

impl CustomLLMClient {
    pub fn new(endpoint: &str, api_key: &str, model: &str) -> Self {
        Self {
            endpoint: endpoint.to_string(),
            api_key: api_key.to_string(),
            model: model.to_string(),
        }
    }
}

// Step 2: Implement LLMClient trait
#[async_trait::async_trait]
impl LLMClient for CustomLLMClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let client = reqwest::Client::new();
        let response = client
            .post(&format!("{}/v1/complete", self.endpoint))
            .header("Authorization", format!("Bearer {}", self.api_key))
            .json(&json!({
                "model": self.model,
                "prompt": prompt,
            }))
            .send()
            .await?;

        if response.status().is_success() {
            let data: serde_json::Value = response.json().await?;
            Ok(data["result"].as_str().unwrap_or("").to_string())
        } else {
            Err(anyhow!("API error: {}", response.status()))
        }
    }

    async fn stream(&self, prompt: &str) -> Result<BoxStream<String>> {
        // Implement streaming with reqwest::Client stream
        todo!("Implement streaming for custom provider")
    }

    fn cost_per_1k_tokens(&self) -> f64 {
        0.01  // Define custom pricing
    }

    fn available(&self) -> bool {
        !self.api_key.is_empty()
    }
}

// Step 3: Register in router factory
pub fn create_provider(name: &str) -> Result<Box<dyn LLMClient>> {
    match name {
        "claude" => Ok(Box::new(ClaudeClient::new(
            &env::var("ANTHROPIC_API_KEY")?,
            "claude-opus-4",
        ))),
        "openai" => Ok(Box::new(OpenAIClient::new(
            &env::var("OPENAI_API_KEY")?,
            "gpt-4",
        ))),
        "custom" => Ok(Box::new(CustomLLMClient::new(
            &env::var("CUSTOM_ENDPOINT")?,
            &env::var("CUSTOM_API_KEY")?,
            &env::var("CUSTOM_MODEL")?,
        ))),
        _ => Err(anyhow!("Unknown provider: {}", name)),
    }
}

// Step 4: Add configuration
#[derive(Deserialize, Clone)]
pub struct ProviderConfig {
    pub name: String,
    pub endpoint: Option<String>,
    pub api_key_env: String,
    pub model: String,
    pub cost_per_1k_tokens: f64,
    pub timeout_ms: u32,
}

impl ProviderConfig {
    pub fn to_client(&self) -> Result<Box<dyn LLMClient>> {
        match self.name.as_str() {
            "custom" => Ok(Box::new(CustomLLMClient::new(
                self.endpoint.as_deref().unwrap_or("http://localhost:8000"),
                &env::var(&self.api_key_env)?,
                &self.model,
            ))),
            _ => Err(anyhow!("Unsupported provider type")),
        }
    }
}

// Step 5: Update router
pub struct LLMRouter {
    providers: DashMap<String, Box<dyn LLMClient>>,
    config: Vec<ProviderConfig>,
}

impl LLMRouter {
    pub async fn load_from_config(config: &[ProviderConfig]) -> Result<Self> {
        let providers = DashMap::new();

        for provider_config in config {
            let client = provider_config.to_client()?;
            providers.insert(provider_config.name.clone(), client);
        }

        Ok(Self {
            providers,
            config: config.to_vec(),
        })
    }

    pub fn register_provider(
        &self,
        name: &str,
        client: Box<dyn LLMClient>,
    ) {
        self.providers.insert(name.to_string(), client);
    }
}

Configuration Example

# llm-providers.toml
[[providers]]
name = "claude"
api_key_env = "ANTHROPIC_API_KEY"
model = "claude-opus-4"
cost_per_1k_tokens = 0.015
timeout_ms = 30000

[[providers]]
name = "openai"
api_key_env = "OPENAI_API_KEY"
model = "gpt-4"
cost_per_1k_tokens = 0.030
timeout_ms = 30000

[[providers]]
name = "custom"
endpoint = "https://api.custom-provider.com"
api_key_env = "CUSTOM_API_KEY"
model = "custom-model-v1"
cost_per_1k_tokens = 0.005
timeout_ms = 20000
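
The TOML above maps directly onto the `ProviderConfig` struct from Step 4. A loading sketch, assuming the `toml` and `serde` crates and a small wrapper struct for the `[[providers]]` array:

use serde::Deserialize;

#[derive(Deserialize)]
struct ProvidersFile {
    providers: Vec<ProviderConfig>,
}

pub async fn router_from_file(path: &str) -> Result<LLMRouter> {
    let raw = std::fs::read_to_string(path)?;
    let file: ProvidersFile = toml::from_str(&raw)?;
    LLMRouter::load_from_config(&file.providers).await
}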

Benefits

- Extensible design
- Easy to add new providers
- Configuration-driven
- No code duplication

Limitations

- Requires understanding the trait implementation
- Error handling varies per provider
- Testing multiple providers is complex


Pattern 4: End-to-End Flow

Use case: Complete request → router → provider → response cycle with cost management and fallback.

Full Implementation

// User initiates request
pub struct TaskRequest {
    pub task_id: String,
    pub task_type: TaskType,
    pub prompt: String,
    pub quality_requirement: Quality,
    pub max_cost_cents: Option<u32>,
}

// Router orchestrates end-to-end
pub struct LLMRouterOrchestrator {
    router: LLMRouter,
    cost_tracker: SDKCostTracker,
    metrics: Metrics,
}

impl LLMRouterOrchestrator {
    pub async fn execute(&self, request: TaskRequest) -> Result<TaskResponse> {
        info!("Starting task: {}", request.task_id);

        // 1. Select provider
        let provider_name = self.select_provider(&request).await?;
        let provider = self.router.get_provider(&provider_name)?;

        info!("Selected provider: {} for task {}", provider_name, request.task_id);

        // 2. Check budget
        if let Some(max_cost) = request.max_cost_cents {
            // Rough estimate: ~10k tokens at the provider's per-1k rate, converted to cents
            let estimated_cost_cents = (provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32;
            if estimated_cost_cents > max_cost {
                warn!("Budget exceeded. Estimated cost: {} cents > limit: {} cents", estimated_cost_cents, max_cost);
                return Err(anyhow!("Budget exceeded"));
            }
        }

        // 3. Execute with timeout
        let timeout = Duration::from_secs(30);
        let result = tokio::time::timeout(
            timeout,
            provider.complete(&request.prompt),
        )
        .await??;

        info!("Task {} completed successfully", request.task_id);

        // 4. Track cost
        let estimated_tokens = (request.prompt.len() / 4) as u32; // Rough estimate
        self.cost_tracker.track_call(
            &provider_name,
            estimated_tokens,
            (result.len() / 4) as u32,
            provider.cost_per_1k_tokens() * 10.0,
        ).await;

        // 5. Record metrics
        self.metrics.record_task_completion(
            &request.task_type,
            &provider_name,
            Duration::from_secs(1),
        );

        Ok(TaskResponse {
            task_id: request.task_id,
            result,
            provider: provider_name,
            cost_cents: Some((provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32), // ~10k tokens, in cents
        })
    }

    async fn select_provider(&self, request: &TaskRequest) -> Result<String> {
        let context = TaskContext {
            task_type: request.task_type.clone(),
            quality_requirement: request.quality_requirement.clone(),
            budget_cents: request.max_cost_cents,
            ..Default::default()
        };

        let provider_name = self.router.route(context, None).await?;
        Ok(provider_name)
    }
}

// Fallback chain handling
pub struct FallbackExecutor {
    router: LLMRouter,
    cost_tracker: SDKCostTracker,
    metrics: Metrics,
}

impl FallbackExecutor {
    pub async fn execute_with_fallback(
        &self,
        request: TaskRequest,
        fallback_chain: Vec<String>,
    ) -> Result<TaskResponse> {
        let mut last_error = None;

        for provider_name in fallback_chain {
            match self.try_provider(&request, &provider_name).await {
                Ok(response) => {
                    info!("Success with provider: {}", provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    warn!(
                        "Provider {} failed: {:?}, trying next",
                        provider_name, e
                    );
                    self.metrics.record_provider_failure(&provider_name);
                    last_error = Some(e);
                }
            }
        }

        Err(last_error.unwrap_or_else(|| anyhow!("All providers failed")))
    }

    async fn try_provider(
        &self,
        request: &TaskRequest,
        provider_name: &str,
    ) -> Result<TaskResponse> {
        let provider = self.router.get_provider(provider_name)?;

        let timeout = Duration::from_secs(30);
        let result = tokio::time::timeout(
            timeout,
            provider.complete(&request.prompt),
        )
        .await??;

        self.cost_tracker.track_call(
            provider_name,
            (request.prompt.len() / 4) as u32,
            (result.len() / 4) as u32,
            provider.cost_per_1k_tokens() * 10.0,
        ).await;

        Ok(TaskResponse {
            task_id: request.task_id.clone(),
            result,
            provider: provider_name.to_string(),
            cost_cents: Some((provider.cost_per_1k_tokens() * 10.0 * 100.0) as u32), // ~10k tokens, in cents
        })
    }
}
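
Putting the two together, a caller might try the orchestrator first and only walk the fallback chain when it fails. This is an illustrative sketch: the chain order is an assumption, and it requires `TaskRequest` to derive `Clone`.

pub async fn run_task(
    orchestrator: &LLMRouterOrchestrator,
    fallback: &FallbackExecutor,
    request: TaskRequest,
) -> Result<TaskResponse> {
    // Assumes #[derive(Clone)] on TaskRequest so the fallback path can reuse it.
    match orchestrator.execute(request.clone()).await {
        Ok(response) => Ok(response),
        Err(e) => {
            warn!("Primary routing failed: {:?}, using fallback chain", e);
            fallback
                .execute_with_fallback(
                    request,
                    vec!["claude".into(), "openai".into(), "ollama".into()],
                )
                .await
        }
    }
}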

Cost Management Integration

pub struct CostManagementPolicy {
    pub daily_limit_cents: u32,
    pub monthly_limit_cents: u32,
    pub per_task_limit_cents: u32,
    pub warn_threshold_percent: f64,
}

impl CostManagementPolicy {
    pub fn check_budget(
        &self,
        current_spend_cents: u32,
        new_call_cost_cents: u32,
    ) -> Result<()> {
        let total = current_spend_cents.saturating_add(new_call_cost_cents);

        if total > self.daily_limit_cents {
            return Err(anyhow!(
                "Daily budget exceeded: {} + {} > {}",
                current_spend_cents,
                new_call_cost_cents,
                self.daily_limit_cents
            ));
        }

        let percent_used = (total as f64 / self.daily_limit_cents as f64) * 100.0;
        if percent_used > self.warn_threshold_percent {
            warn!(
                "Budget warning: {:.1}% of daily limit used",
                percent_used
            );
        }

        Ok(())
    }
}
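
For instance, with a $5.00 daily limit and an 80% warning threshold (illustrative numbers, not VAPORA defaults):

fn example_budget_check() -> Result<()> {
    let policy = CostManagementPolicy {
        daily_limit_cents: 500,      // $5.00 per day
        monthly_limit_cents: 10_000, // $100.00 per month
        per_task_limit_cents: 50,    // $0.50 per task
        warn_threshold_percent: 80.0,
    };

    // 420 cents already spent today; a 30-cent call stays under the limit
    // but crosses the warning threshold (450 / 500 = 90%).
    policy.check_budget(420, 30)?;

    // A 100-cent call would push the total to 520 cents and is rejected.
    assert!(policy.check_budget(420, 100).is_err());
    Ok(())
}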

Benefits

- Complete request lifecycle
- Integrated cost tracking
- Fallback chain support
- Metrics collection
- Budget enforcement
- Timeout handling

Limitations

- Complex orchestration logic
- Hard to test all edge cases
- Requires multiple components


Summary & Recommendations

| Pattern | Recommended When |
|---|---|
| Mocks | Building features without costs |
| SDK Direct | Full integration, staging/prod |
| Add Provider | Supporting new provider types |
| End-to-End | Production orchestration |

Development Workflow

Local Dev          CI/Tests          Staging          Production
┌─────────────┐  ┌──────────────┐  ┌────────┐  ┌─────────────┐
│   Mocks     │  │ Mocks + SDK  │  │  SDK   │  │ SDK + Real  │
│ Zero cost   │  │ (Simulated)  │  │ Real   │  │ Fallback    │
│ No keys     │  │ Tests only   │  │ Budget │  │ Monitoring  │
└─────────────┘  └──────────────┘  └────────┘  └─────────────┘
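
One way to express this progression in code is to choose the provider chain from an environment variable at startup. The `VAPORA_ENV` name and the chains below are illustrative assumptions, not part of the current implementation:

pub fn fallback_chain_for_env() -> Vec<String> {
    // Hypothetical VAPORA_ENV values: "local" | "ci" | "staging" | "production"
    match std::env::var("VAPORA_ENV").as_deref() {
        Ok("staging") | Ok("production") => {
            vec!["claude".into(), "openai".into(), "ollama".into()]
        }
        // Local development and CI stay on mocks: zero cost, no keys needed.
        _ => vec!["mock".into()],
    }
}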

See llm-provider-implementation-guide.md for the actual VAPORA implementation and code examples.