# πŸ” RAG Integration ## Retrievable Augmented Generation for VAPORA Context **Version**: 0.1.0 **Status**: Specification (VAPORA v1.0 Integration) **Purpose**: RAG system from provisioning integrated into VAPORA for semantic search --- ## 🎯 Objetivo **RAG (Retrieval-Augmented Generation)** proporciona contexto a los agentes: - βœ… Agentes buscan documentaciΓ³n semΓ‘nticamente similar - βœ… ADRs, diseΓ±os, y guΓ­as como contexto para nuevas tareas - βœ… Query LLM con documentaciΓ³n relevante - βœ… Reducir alucinaciones, mejorar decisiones - βœ… Sistema completo de provisioning (2,140 lΓ­neas Rust) --- ## πŸ—οΈ RAG Architecture ### Components (From Provisioning) ``` RAG System (2,140 lines, production-ready from provisioning) β”œβ”€ Chunking Engine β”‚ β”œβ”€ Markdown chunks (with metadata) β”‚ β”œβ”€ KCL chunks (for infrastructure docs) β”‚ β”œβ”€ Nushell chunks (for scripts) β”‚ └─ Smart splitting (at headers, code blocks) β”‚ β”œβ”€ Embeddings β”‚ β”œβ”€ Primary: OpenAI API (text-embedding-3-small) β”‚ β”œβ”€ Fallback: Local ONNX (nomic-embed-text) β”‚ β”œβ”€ Dimension: 1536-dim vectors β”‚ └─ Batch processing β”‚ β”œβ”€ Vector Store β”‚ β”œβ”€ SurrealDB with HNSW index β”‚ β”œβ”€ Fast similarity search β”‚ β”œβ”€ Scalar product distance metric β”‚ └─ Replication for redundancy β”‚ β”œβ”€ Retrieval β”‚ β”œβ”€ Top-K BM25 + semantic hybrid β”‚ β”œβ”€ Threshold filtering (relevance > 0.7) β”‚ β”œβ”€ Context enrichment β”‚ └─ Ranking/re-ranking β”‚ └─ Integration β”œβ”€ Claude API with full context β”œβ”€ Agent Search tool β”œβ”€ Workflow context injection └─ Decision-making support ``` ### Data Flow ``` Document Added to docs/ ↓ doc-lifecycle-manager classifies ↓ RAG Chunking Engine β”œβ”€ Split into semantic chunks └─ Extract metadata (title, type, date) ↓ Embeddings Generator β”œβ”€ Generate 1536-dim vector per chunk └─ Batch process for efficiency ↓ Vector Store (SurrealDB HNSW) β”œβ”€ Store chunk + vector + metadata └─ Create HNSW index ↓ Search Ready β”œβ”€ Agent can query β”œβ”€ Semantic similarity search └─ Fast < 100ms latency ``` --- ## πŸ”§ RAG in VAPORA ### Search Tool (Available to All Agents) ```rust pub struct SearchTool { pub vector_store: SurrealDB, pub embeddings: EmbeddingsClient, pub retriever: HybridRetriever, } impl SearchTool { pub async fn search( &self, query: String, top_k: u32, threshold: f64, ) -> anyhow::Result { // 1. Embed query let query_vector = self.embeddings.embed(&query).await?; // 2. Search vector store let chunk_results = self.vector_store.search_hnsw( query_vector, top_k, threshold, ).await?; // 3. Enrich with context let results = self.enrich_results(chunk_results).await?; Ok(SearchResults { query, results, total_chunks_searched: 1000+, search_duration_ms: 45, }) } pub async fn search_with_filters( &self, query: String, filters: SearchFilters, ) -> anyhow::Result { // Filter by document type, date, tags before search let filtered_documents = self.filter_documents(&filters).await?; // ... 
---

## 🔧 RAG in VAPORA

### Search Tool (Available to All Agents)

```rust
use std::collections::HashMap;

pub struct SearchTool {
    pub vector_store: SurrealDB,
    pub embeddings: EmbeddingsClient,
    pub retriever: HybridRetriever,
}

impl SearchTool {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
        threshold: f64,
    ) -> anyhow::Result<SearchResults> {
        // 1. Embed query
        let query_vector = self.embeddings.embed(&query).await?;

        // 2. Search vector store
        let chunk_results = self.vector_store.search_hnsw(
            query_vector,
            top_k,
            threshold,
        ).await?;

        // 3. Enrich with context
        let results = self.enrich_results(chunk_results).await?;

        Ok(SearchResults {
            query,
            results,
            total_chunks_searched: 1_000, // placeholder; report the real scan count
            search_duration_ms: 45,       // placeholder; measure the elapsed time
        })
    }

    pub async fn search_with_filters(
        &self,
        query: String,
        filters: SearchFilters,
    ) -> anyhow::Result<SearchResults> {
        // Filter by document type, date, tags before search
        let filtered_documents = self.filter_documents(&filters).await?;
        // ... rest of search mirrors `search` over the filtered set
        todo!()
    }
}

pub struct SearchFilters {
    pub doc_type: Option<Vec<String>>,    // ["adr", "guide"]
    pub date_range: Option<(Date, Date)>, // Date: e.g. chrono::NaiveDate
    pub tags: Option<Vec<String>>,        // ["orchestrator", "performance"]
    pub lifecycle_state: Option<String>,  // "published", "archived"
}

pub struct SearchResults {
    pub query: String,
    pub results: Vec<SearchResult>,
    pub total_chunks_searched: u32,
    pub search_duration_ms: u32,
}

pub struct SearchResult {
    pub document_id: String,
    pub document_title: String,
    pub chunk_text: String,
    pub relevance_score: f64,              // 0.0-1.0
    pub metadata: HashMap<String, String>,
    pub source_url: String,
    pub snippet_context: String,           // Surrounding text
}
```
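For illustration, a usage sketch of the filtered variant. The field names come from `SearchFilters` above; the concrete filter values, the query string, and the `search_tool` binding are hypothetical:

```rust
// Sketch: restrict search to published ADRs tagged "orchestrator".
let filters = SearchFilters {
    doc_type: Some(vec!["adr".to_string()]),
    date_range: None,
    tags: Some(vec!["orchestrator".to_string()]),
    lifecycle_state: Some("published".to_string()),
};

let results = search_tool
    .search_with_filters("event bus back-pressure handling".to_string(), filters)
    .await?;

// Each hit carries its relevance score and provenance
for r in &results.results {
    println!("{} ({:.2}): {}", r.document_title, r.relevance_score, r.source_url);
}
```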
### Agent Usage Example

```rust
// Agent decides to search for context
impl DeveloperAgent {
    pub async fn implement_feature(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Search for similar features implemented before
        let similar_features = self.search_tool.search(
            format!("implement {} feature like {}", task.domain, task.type_),
            5,    // top_k
            0.75, // threshold
        ).await?;

        // 2. Extract context from results
        let context_docs = similar_features.results
            .iter()
            .map(|r| r.chunk_text.clone())
            .collect::<Vec<String>>();

        // 3. Build LLM prompt with context
        let prompt = format!(
            "Implement the following feature:\n{}\n\nSimilar features implemented:\n{}",
            task.description,
            context_docs.join("\n---\n")
        );

        // 4. Generate code with context
        let code = self.llm_router.complete(prompt).await?;

        Ok(())
    }
}
```

### Documenter Agent Integration

```rust
impl DocumenterAgent {
    pub async fn update_documentation(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Get decisions from task
        let decisions = task.extract_decisions().await?;

        for decision in decisions {
            // 2. Search existing ADRs to avoid duplicates
            let similar_adrs = self.search_tool.search(
                decision.context.clone(),
                3,   // top_k
                0.8, // threshold
            ).await?;

            // 3. Check if decision already documented
            if similar_adrs.results.is_empty() {
                // Create new ADR
                let adr_content = format!(
                    "# {}\n\n## Context\n{}\n\n## Decision\n{}",
                    decision.title,
                    decision.context,
                    decision.chosen_option,
                );

                // 4. Save and index for RAG
                self.db.save_adr(&adr_content).await?;
                self.rag_system.index_document(&adr_content).await?;
            }
        }

        Ok(())
    }
}
```

---

## 📊 RAG Implementation (From Provisioning)

### Schema (SurrealDB)

```sql
-- RAG chunks table (SurrealQL)
DEFINE TABLE rag_chunks SCHEMAFULL
    PERMISSIONS
        FOR select, create FULL
        FOR update, delete NONE;

-- Identifiers
DEFINE FIELD document_id     ON rag_chunks TYPE string;
DEFINE FIELD chunk_index     ON rag_chunks TYPE int;

-- Content
DEFINE FIELD text            ON rag_chunks TYPE string;
DEFINE FIELD title           ON rag_chunks TYPE string;
DEFINE FIELD doc_type        ON rag_chunks TYPE string;

-- Vector (1536-dim embedding)
DEFINE FIELD embedding       ON rag_chunks TYPE array<float>;

-- Metadata
DEFINE FIELD created_date    ON rag_chunks TYPE datetime;
DEFINE FIELD last_updated    ON rag_chunks TYPE datetime;
DEFINE FIELD source_path     ON rag_chunks TYPE string;
DEFINE FIELD tags            ON rag_chunks TYPE array<string>;
DEFINE FIELD lifecycle_state ON rag_chunks TYPE string;

-- HNSW index for similarity search
-- (COSINE shown here; switch to a dot-product metric if the deployed
--  SurrealDB version supports it, per the scalar-product design above)
DEFINE INDEX idx_embedding ON rag_chunks
    FIELDS embedding
    HNSW DIMENSION 1536 DIST COSINE EFC 200 M 16;
```
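To show how this index is queried from Rust, a minimal sketch using the `surrealdb` client crate. The `<|K,EF|>` nearest-neighbour operator and `vector::distance::knn()` target HNSW indexes in SurrealDB 2.x; the endpoint, namespace, and database names are assumptions, and authentication is omitted:

```rust
use surrealdb::engine::remote::ws::Ws;
use surrealdb::Surreal;

#[derive(serde::Deserialize, Debug)]
struct Hit {
    document_id: String,
    title: String,
    text: String,
    score: f64,
}

// Sketch: k-nearest-neighbour lookup against the rag_chunks HNSW index.
// `<|10,40|>` asks for the 10 nearest chunks with an EF search factor of 40.
async fn knn_search(query_vector: Vec<f32>) -> anyhow::Result<Vec<Hit>> {
    let db = Surreal::new::<Ws>("localhost:8000").await?; // assumed endpoint
    db.use_ns("vapora").use_db("rag").await?;             // assumed ns/db names

    let mut response = db
        .query(
            "SELECT document_id, title, text, \
             vector::distance::knn() AS score \
             FROM rag_chunks \
             WHERE embedding <|10,40|> $vec \
             ORDER BY score ASC",
        )
        .bind(("vec", query_vector))
        .await?;

    let hits: Vec<Hit> = response.take(0)?;
    Ok(hits)
}
```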
### Chunking Strategy

```rust
pub struct ChunkingEngine;

impl ChunkingEngine {
    pub async fn chunk_document(
        &self,
        document: Document,
    ) -> anyhow::Result<Vec<Chunk>> {
        let chunks = match document.file_type {
            FileType::Markdown => self.chunk_markdown(&document.content)?,
            FileType::KCL => self.chunk_kcl(&document.content)?,
            FileType::Nushell => self.chunk_nushell(&document.content)?,
            _ => self.chunk_text(&document.content)?,
        };
        Ok(chunks)
    }

    fn chunk_markdown(&self, content: &str) -> anyhow::Result<Vec<Chunk>> {
        let mut chunks = Vec::new();

        // Split at headers: start a new section whenever a line begins with '#'
        let mut sections: Vec<String> = Vec::new();
        for line in content.lines() {
            if line.starts_with('#') || sections.is_empty() {
                sections.push(String::new());
            }
            let current = sections.last_mut().unwrap();
            current.push_str(line);
            current.push('\n');
        }

        for section in sections {
            // Cap chunk size (~500 chars as a token-budget proxy)
            if section.len() > 500 {
                // Split oversized sections further
                let chars: Vec<char> = section.chars().collect();
                for sub_chunk in chars.chunks(400) {
                    chunks.push(Chunk {
                        text: sub_chunk.iter().collect(),
                        metadata: Default::default(),
                    });
                }
            } else {
                chunks.push(Chunk {
                    text: section,
                    metadata: Default::default(),
                });
            }
        }

        Ok(chunks)
    }
}
```

### Embeddings

```rust
pub enum EmbeddingsProvider {
    OpenAI {
        api_key: String,
        model: String,      // "text-embedding-3-small" (1536 dims, fast)
    },
    Local {
        model_path: String, // Path to ONNX model
        model: String,      // "nomic-embed-text"
    },
}

pub struct EmbeddingsClient {
    provider: EmbeddingsProvider,
}

impl EmbeddingsClient {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match &self.provider {
            EmbeddingsProvider::OpenAI { api_key, .. } => {
                // Call OpenAI API
                let response = reqwest::Client::new()
                    .post("https://api.openai.com/v1/embeddings")
                    .bearer_auth(api_key)
                    .json(&serde_json::json!({
                        "model": "text-embedding-3-small",
                        "input": text,
                    }))
                    .send()
                    .await?;

                let result: OpenAIResponse = response.json().await?;
                Ok(result.data[0].embedding.clone())
            },
            EmbeddingsProvider::Local { model_path, .. } => {
                // Use local ONNX model (nomic-embed-text)
                let session = ort::Session::builder()?.commit_from_file(model_path)?;
                let output = session.run(ort::inputs![text]?)?;
                let embedding = output[0].try_extract_tensor::<f32>()?.view().to_owned();
                Ok(embedding.iter().copied().collect())
            },
        }
    }

    pub async fn embed_batch(
        &self,
        texts: Vec<String>,
    ) -> anyhow::Result<Vec<Vec<f32>>> {
        // Batch embed for efficiency
        // (Use the batching API for OpenAI, etc.)
        todo!()
    }
}
```

### Retrieval

```rust
use std::collections::HashMap;

pub struct HybridRetriever {
    vector_store: SurrealDB,
    bm25_index: BM25Index,
}

impl HybridRetriever {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
    ) -> anyhow::Result<Vec<ChunkWithScore>> {
        // 1. Semantic search (vector similarity)
        let query_vector = self.embed(&query).await?;
        let semantic_results = self.vector_store.search_hnsw(
            query_vector,
            top_k * 2, // Get more for re-ranking
            0.5,
        ).await?;

        // 2. BM25 keyword search
        let bm25_results = self.bm25_index.search(&query, top_k * 2)?;

        // 3. Merge and re-rank
        let mut merged = HashMap::new();

        for (i, result) in semantic_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0); // Rank-based score
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.7) // 70% weight
                .or_insert(score * 0.7);
        }

        for (i, result) in bm25_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.3) // 30% weight
                .or_insert(score * 0.3);
        }

        // 4. Sort and return top-k
        let mut final_results: Vec<_> = merged.into_iter().collect();
        final_results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

        Ok(final_results.into_iter()
            .take(top_k as usize)
            .map(|(id, score)| {
                // Fetch full chunk with this score
                ChunkWithScore { id, score }
            })
            .collect())
    }
}
```

For example, a chunk ranked first semantically and third by BM25 gets a fused score of 1.0 × 0.7 + (1/3) × 0.3 ≈ 0.80.

---

## 📚 Indexing Workflow

### Automatic Indexing

```
File added to docs/
        ↓
Git hook or workflow trigger
        ↓
doc-lifecycle-manager processes
├─ Classifies document
└─ Publishes "document_added" event
        ↓
RAG system subscribes
├─ Chunks document
├─ Generates embeddings
├─ Stores in SurrealDB
└─ Updates HNSW index
        ↓
Agent Search Tool ready
```

### Batch Reindexing

```bash
# Periodic full reindex (daily or on demand)
vapora rag reindex --all

# Incremental reindex (only changed docs)
vapora rag reindex --since 1d

# Rebuild HNSW index from scratch
vapora rag rebuild-index --optimize
```

---

## 🎯 Implementation Checklist

- [ ] Port RAG system from provisioning (2,140 lines)
- [ ] Integrate with SurrealDB vector store
- [ ] HNSW index setup + optimization
- [ ] Chunking strategies (Markdown, KCL, Nushell)
- [ ] Embeddings client (OpenAI + local fallback)
- [ ] Hybrid retrieval (semantic + BM25)
- [ ] Search tool for agents
- [ ] doc-lifecycle-manager hooks
- [ ] Indexing workflows
- [ ] Batch reindexing
- [ ] CLI: `vapora rag search`, `vapora rag reindex`
- [ ] Tests + benchmarks

---

## 📊 Success Metrics

- ✅ Search latency < 100 ms (p99)
- ✅ Relevance score > 0.8 for top results
- ✅ 1,000+ documents indexed
- ✅ Memory-efficient HNSW index
- ✅ Agents find relevant context automatically
- ✅ No hallucinations from out-of-context queries

---

**Version**: 0.1.0
**Status**: ✅ Integration Specification Complete
**Purpose**: RAG system for semantic document search in VAPORA