# RLM Production Setup Guide

This guide shows how to configure vapora-rlm for production use with LLM clients and embeddings.

## Prerequisites

1. **SurrealDB** running on port 8000
2. **LLM Provider** (choose one):
   - OpenAI (cloud, requires API key)
   - Anthropic Claude (cloud, requires API key)
   - Ollama (local, free)
3. **Optional**: Docker, for the Docker sandbox tier

## Quick Start

### Option 1: Cloud (OpenAI)

```bash
# Set API key
export OPENAI_API_KEY="sk-..."

# Run example
cargo run --example production_setup
```

### Option 2: Local (Ollama)

```bash
# Install and start Ollama
brew install ollama
ollama serve

# Pull model
ollama pull llama3.2

# Run example
cargo run --example local_ollama
```

## Production Configuration

### 1. Create RLM Engine with LLM Client

```rust
use std::sync::Arc;

use vapora_llm_router::providers::OpenAIClient;
use vapora_rlm::RLMEngine;

// Set up the LLM client
let llm_client = Arc::new(OpenAIClient::new(
    api_key,
    "gpt-4".to_string(),
    4096, // max_tokens
    0.7,  // temperature
    5.0,  // cost per 1M input tokens
    15.0, // cost per 1M output tokens
)?);

// Create the engine with the LLM client
let engine = RLMEngine::with_llm_client(
    storage,
    bm25_index,
    llm_client,
    Some(config),
)?;
```

### 2. Configure Chunking Strategy

```rust
use vapora_rlm::chunking::{ChunkingConfig, ChunkingStrategy};
use vapora_rlm::embeddings::EmbeddingConfig;
use vapora_rlm::engine::RLMEngineConfig;

let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic, // or Fixed, Code
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
};
```

### 3. Configure Embeddings

```rust
use vapora_rlm::embeddings::EmbeddingConfig;

// OpenAI (1536 dimensions)
let embedding_config = EmbeddingConfig::openai_small();

// OpenAI (3072 dimensions)
let embedding_config = EmbeddingConfig::openai_large();

// Ollama (local)
let embedding_config = EmbeddingConfig::ollama("llama3.2");
```
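Putting steps 1–3 together, the pieces compose as in the sketch below. This is a sketch rather than a drop-in snippet: it assumes `storage` and `bm25_index` are constructed elsewhere in your application (as in the shipped examples) and that `OPENAI_API_KEY` is exported; the constructor calls simply mirror the snippets above.

```rust
use std::sync::Arc;

use vapora_llm_router::providers::OpenAIClient;
use vapora_rlm::chunking::{ChunkingConfig, ChunkingStrategy};
use vapora_rlm::embeddings::EmbeddingConfig;
use vapora_rlm::engine::RLMEngineConfig;
use vapora_rlm::RLMEngine;

// Step 1: LLM client (gpt-4 rates: $5 in / $15 out per 1M tokens)
let api_key = std::env::var("OPENAI_API_KEY")?;
let llm_client = Arc::new(OpenAIClient::new(
    api_key,
    "gpt-4".to_string(),
    4096, // max_tokens
    0.7,  // temperature
    5.0,  // cost per 1M input tokens
    15.0, // cost per 1M output tokens
)?);

// Steps 2-3: chunking strategy plus embedding provider in one config
let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic,
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
};

// Wire everything into the engine; `storage` and `bm25_index` come from your app setup
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
```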
### 4. Use RLM in Production

```rust
// Load document
let chunk_count = engine.load_document(doc_id, content, None).await?;

// Query with hybrid search (BM25 + semantic + RRF)
let results = engine.query(doc_id, "your query", None, 5).await?;

// Dispatch to LLM for distributed reasoning
let response = engine
    .dispatch_subtask(doc_id, "Analyze this code", None, 5)
    .await?;

println!("LLM Response: {}", response.text);
println!(
    "Tokens: {} in, {} out",
    response.total_input_tokens,
    response.total_output_tokens
);
```

## LLM Provider Options

### OpenAI

```rust
use vapora_llm_router::providers::OpenAIClient;

let client = Arc::new(OpenAIClient::new(
    api_key,
    "gpt-4".to_string(),
    4096,
    0.7,
    5.0,
    15.0,
)?);
```

**Models:**

- `gpt-4` - Most capable
- `gpt-4-turbo` - Faster, cheaper
- `gpt-3.5-turbo` - Fast, cheapest

### Anthropic Claude

```rust
use vapora_llm_router::providers::ClaudeClient;

let client = Arc::new(ClaudeClient::new(
    api_key,
    "claude-3-opus-20240229".to_string(),
    4096,
    0.7,
    15.0,
    75.0,
)?);
```

**Models:**

- `claude-3-opus` - Most capable
- `claude-3-sonnet` - Balanced
- `claude-3-haiku` - Fast, cheap

### Ollama (Local)

```rust
use vapora_llm_router::providers::OllamaClient;

let client = Arc::new(OllamaClient::new(
    "http://localhost:11434".to_string(),
    "llama3.2".to_string(),
    4096,
    0.7,
)?);
```

**Popular models:**

- `llama3.2` - Meta's latest
- `mistral` - Fast, capable
- `codellama` - Code-focused
- `mixtral` - Large, powerful

## Performance Tuning

### Chunk Size Optimization

```rust
// Small chunks (500 chars) - better precision, more chunks
let precise = ChunkingConfig {
    strategy: ChunkingStrategy::Fixed,
    chunk_size: 500,
    overlap: 100,
};

// Large chunks (2000 chars) - more context, fewer chunks
let contextual = ChunkingConfig {
    strategy: ChunkingStrategy::Fixed,
    chunk_size: 2000,
    overlap: 400,
};
```

### BM25 Index Tuning

```rust
let config = RLMEngineConfig {
    auto_rebuild_bm25: true, // Rebuild the index after loading documents
    ..Default::default()
};
```

### Max Chunks Per Document

```rust
let config = RLMEngineConfig {
    max_chunks_per_doc: 10_000, // Safety limit
    ..Default::default()
};
```

## Production Checklist

- [ ] LLM client configured with valid API key
- [ ] Embedding provider configured
- [ ] SurrealDB schema applied: `bash tests/test_setup.sh`
- [ ] Chunking strategy selected (Semantic for prose, Code for code)
- [ ] Max chunks per doc set appropriately
- [ ] Prometheus metrics endpoint exposed
- [ ] Error handling and retries in place
- [ ] Cost tracking enabled (for cloud providers)

## Troubleshooting

### "No LLM client configured"

```rust
// Don't use RLMEngine::new() - it has no LLM client
let engine = RLMEngine::new(storage, bm25_index)?; // ❌

// Use with_llm_client() instead
let engine = RLMEngine::with_llm_client(
    storage,
    bm25_index,
    llm_client,
    Some(config),
)?; // ✅
```

### "Embedding generation failed"

```rust
// Make sure the embedding config matches your provider
let config = RLMEngineConfig {
    embedding: Some(EmbeddingConfig::openai_small()), // ✅
    ..Default::default()
};
```

### "SurrealDB schema error"

```bash
# Apply the schema
cd crates/vapora-rlm/tests
bash test_setup.sh
```

## Examples

See the `examples/` directory:

- `production_setup.rs` - OpenAI production setup
- `local_ollama.rs` - Local development with Ollama

Run with:

```bash
cargo run --example production_setup
cargo run --example local_ollama
```

## Cost Optimization

### Use Local Ollama for Development

```rust
// Free, local, no API keys
let client = Arc::new(OllamaClient::new(
    "http://localhost:11434".to_string(),
    "llama3.2".to_string(),
    4096,
    0.7,
)?);
```
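To combine this with the cheaper-model advice below, one option is to pick the client at startup based on whether a cloud API key is present. A sketch, assuming `storage`, `bm25_index`, and `config` are built as in the Production Configuration section; both branches reuse constructors shown above.

```rust
use std::sync::Arc;

use vapora_llm_router::providers::{OllamaClient, OpenAIClient};
use vapora_rlm::RLMEngine;

// With OPENAI_API_KEY set, use the metered cloud model; otherwise fall
// back to free local Ollama for development.
let engine = if let Ok(key) = std::env::var("OPENAI_API_KEY") {
    let client = Arc::new(OpenAIClient::new(
        key,
        "gpt-3.5-turbo".to_string(),
        4096, // max_tokens
        0.7,  // temperature
        0.5,  // cost per 1M input tokens
        1.5,  // cost per 1M output tokens
    )?);
    RLMEngine::with_llm_client(storage, bm25_index, client, Some(config))?
} else {
    let client = Arc::new(OllamaClient::new(
        "http://localhost:11434".to_string(),
        "llama3.2".to_string(),
        4096,
        0.7,
    )?);
    RLMEngine::with_llm_client(storage, bm25_index, client, Some(config))?
};
```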
### Choose Cheaper Models for Production

```rust
// Instead of gpt-4 ($5/$15 per 1M tokens)
OpenAIClient::new(api_key, "gpt-4".to_string(), ...)

// Use gpt-3.5-turbo ($0.50/$1.50 per 1M tokens)
OpenAIClient::new(api_key, "gpt-3.5-turbo".to_string(), ...)
```

### Track Costs with Metrics

```rust
// RLM automatically tracks token usage
let response = engine.dispatch_subtask(...).await?;

// 5.0 and 15.0 are the gpt-4 per-1M-token rates quoted above
println!(
    "Cost: ${:.4}",
    (response.total_input_tokens as f64 * 5.0 / 1_000_000.0)
        + (response.total_output_tokens as f64 * 15.0 / 1_000_000.0)
);
```

## Next Steps

1. Review the examples: `cargo run --example local_ollama`
2. Run the tests: `cargo test -p vapora-rlm`
3. Check metrics: see `src/metrics.rs`
4. Integrate with the backend: see the `vapora-backend` integration patterns