# RAG Integration

Retrieval-Augmented Generation for VAPORA context.

**Version:** 0.1.0 | **Status:** Specification (VAPORA v1.0 Integration) | **Purpose:** RAG system from provisioning integrated into VAPORA for semantic search
## Objective

RAG (Retrieval-Augmented Generation) supplies context to the agents:

- Agents search documentation by semantic similarity
- ADRs, designs, and guides serve as context for new tasks
- LLM queries are grounded in relevant documentation
- Fewer hallucinations, better decisions
- Complete provisioning system (2,140 lines of Rust)
## RAG Architecture

### Components (From Provisioning)
```text
RAG System (2,140 lines, production-ready from provisioning)
├── Chunking Engine
│   ├── Markdown chunks (with metadata)
│   ├── KCL chunks (for infrastructure docs)
│   ├── Nushell chunks (for scripts)
│   └── Smart splitting (at headers, code blocks)
│
├── Embeddings
│   ├── Primary: OpenAI API (text-embedding-3-small)
│   ├── Fallback: Local ONNX (nomic-embed-text)
│   ├── Dimension: 1536-dim vectors
│   └── Batch processing
│
├── Vector Store
│   ├── SurrealDB with HNSW index
│   ├── Fast similarity search
│   ├── Scalar product distance metric
│   └── Replication for redundancy
│
├── Retrieval
│   ├── Top-K BM25 + semantic hybrid
│   ├── Threshold filtering (relevance > 0.7)
│   ├── Context enrichment
│   └── Ranking/re-ranking
│
└── Integration
    ├── Claude API with full context
    ├── Agent Search tool
    ├── Workflow context injection
    └── Decision-making support
```
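Composed together, these pieces form one subsystem. A minimal wiring sketch, assuming the component types named above (the `RagSystem` struct itself is illustrative, not the provisioning code):

```rust
// Hypothetical composition of the RAG subsystem; the struct and field
// names are illustrative, not the actual provisioning types.
pub struct RagSystem {
    chunker: ChunkingEngine,      // Markdown/KCL/Nushell-aware splitting
    embeddings: EmbeddingsClient, // OpenAI primary, local ONNX fallback
    vector_store: SurrealDB,      // HNSW-indexed chunk storage
    retriever: HybridRetriever,   // BM25 + semantic fusion
}
```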
### Data Flow

```text
Document added to docs/
        ↓
doc-lifecycle-manager classifies
        ↓
RAG Chunking Engine
  ├── Split into semantic chunks
  └── Extract metadata (title, type, date)
        ↓
Embeddings Generator
  ├── Generate one 1536-dim vector per chunk
  └── Batch process for efficiency
        ↓
Vector Store (SurrealDB HNSW)
  ├── Store chunk + vector + metadata
  └── Create HNSW index
        ↓
Search ready
  ├── Agent can query
  ├── Semantic similarity search
  └── Fast: < 100 ms latency
```
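The same flow as a single indexing entry point, sketched against the hypothetical `RagSystem` above (`insert_chunk` is an assumed store method, not a confirmed API):

```rust
impl RagSystem {
    /// Illustrative end-to-end indexing path for one document.
    pub async fn index_document(&self, doc: Document) -> anyhow::Result<()> {
        // Split into semantic chunks and extract metadata
        let chunks = self.chunker.chunk_document(doc).await?;

        // Generate one 1536-dim vector per chunk, batched for efficiency
        let texts: Vec<String> = chunks.iter().map(|c| c.text.clone()).collect();
        let vectors = self.embeddings.embed_batch(texts).await?;

        // Store chunk + vector + metadata; the HNSW index updates on insert
        for (chunk, vector) in chunks.into_iter().zip(vectors) {
            self.vector_store.insert_chunk(chunk, vector).await?;
        }
        Ok(())
    }
}
```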
## RAG in VAPORA

### Search Tool (Available to All Agents)
```rust
use std::collections::HashMap;

pub struct SearchTool {
    pub vector_store: SurrealDB,
    pub embeddings: EmbeddingsClient,
    pub retriever: HybridRetriever,
}

impl SearchTool {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
        threshold: f64,
    ) -> anyhow::Result<SearchResults> {
        let started = std::time::Instant::now();

        // 1. Embed query
        let query_vector = self.embeddings.embed(&query).await?;

        // 2. Search vector store
        let chunk_results = self
            .vector_store
            .search_hnsw(query_vector, top_k, threshold)
            .await?;

        // 3. Enrich with context
        let results = self.enrich_results(chunk_results).await?;

        Ok(SearchResults {
            query,
            results,
            total_chunks_searched: 1_000, // placeholder; report the real count searched
            search_duration_ms: started.elapsed().as_millis() as u32,
        })
    }

    pub async fn search_with_filters(
        &self,
        query: String,
        filters: SearchFilters,
    ) -> anyhow::Result<SearchResults> {
        // Narrow by document type, date, and tags before the vector search
        let _filtered_documents = self.filter_documents(&filters).await?;
        // ... rest of search
        todo!()
    }
}

pub struct SearchFilters {
    pub doc_type: Option<Vec<String>>,   // ["adr", "guide"]
    pub date_range: Option<(Date, Date)>,
    pub tags: Option<Vec<String>>,       // ["orchestrator", "performance"]
    pub lifecycle_state: Option<String>, // "published", "archived"
}

pub struct SearchResults {
    pub query: String,
    pub results: Vec<SearchResult>,
    pub total_chunks_searched: u32,
    pub search_duration_ms: u32,
}

pub struct SearchResult {
    pub document_id: String,
    pub document_title: String,
    pub chunk_text: String,
    pub relevance_score: f64, // 0.0-1.0
    pub metadata: HashMap<String, String>,
    pub source_url: String,
    pub snippet_context: String, // Surrounding text
}
```
### Agent Usage Example
```rust
// Agent decides to search for context
impl DeveloperAgent {
    pub async fn implement_feature(&mut self, task: Task) -> anyhow::Result<()> {
        // 1. Search for similar features implemented before
        let similar_features = self
            .search_tool
            .search(
                format!("implement {} feature like {}", task.domain, task.type_),
                5,    // top_k
                0.75, // threshold
            )
            .await?;

        // 2. Extract context from results
        let context_docs = similar_features
            .results
            .iter()
            .map(|r| r.chunk_text.clone())
            .collect::<Vec<_>>();

        // 3. Build LLM prompt with context
        let prompt = format!(
            "Implement the following feature:\n{}\n\nSimilar features implemented:\n{}",
            task.description,
            context_docs.join("\n---\n")
        );

        // 4. Generate code with context
        let _code = self.llm_router.complete(prompt).await?;

        Ok(())
    }
}
```
### Documenter Agent Integration
```rust
impl DocumenterAgent {
    pub async fn update_documentation(&mut self, task: Task) -> anyhow::Result<()> {
        // 1. Get decisions from task
        let decisions = task.extract_decisions().await?;

        for decision in decisions {
            // 2. Search existing ADRs to avoid duplicates
            let similar_adrs = self
                .search_tool
                .search(
                    decision.context.clone(),
                    3,   // top_k
                    0.8, // threshold
                )
                .await?;

            // 3. Only document the decision if nothing similar exists yet
            if similar_adrs.results.is_empty() {
                // Create new ADR
                let adr_content = format!(
                    "# {}\n\n## Context\n{}\n\n## Decision\n{}",
                    decision.title, decision.context, decision.chosen_option,
                );

                // 4. Save and index for RAG
                self.db.save_adr(&adr_content).await?;
                self.rag_system.index_document(&adr_content).await?;
            }
        }
        Ok(())
    }
}
```
## RAG Implementation (From Provisioning)

### Schema (SurrealDB)
```sql
-- RAG chunks table
DEFINE TABLE rag_chunks SCHEMAFULL
    PERMISSIONS
        FOR select, create FULL
        FOR update, delete NONE;

-- Identifiers
DEFINE FIELD document_id     ON rag_chunks TYPE string;
DEFINE FIELD chunk_index     ON rag_chunks TYPE int;

-- Content
DEFINE FIELD text            ON rag_chunks TYPE string;
DEFINE FIELD title           ON rag_chunks TYPE string;
DEFINE FIELD doc_type        ON rag_chunks TYPE string;

-- Vector (1536-dim embedding)
DEFINE FIELD embedding       ON rag_chunks TYPE array<float>;

-- Metadata
DEFINE FIELD created_date    ON rag_chunks TYPE datetime;
DEFINE FIELD last_updated    ON rag_chunks TYPE datetime;
DEFINE FIELD source_path     ON rag_chunks TYPE string;
DEFINE FIELD tags            ON rag_chunks TYPE array<string>;
DEFINE FIELD lifecycle_state ON rag_chunks TYPE string;

-- HNSW vector index (M = 16, ef_construction = 200).
-- NOTE: the provisioning spec calls for a scalar-product metric; COSINE is
-- used here, which is equivalent for normalized embeddings such as OpenAI's.
DEFINE INDEX idx_embedding ON rag_chunks
    FIELDS embedding
    HNSW DIMENSION 1536 DIST COSINE EFC 200 M 16;
```
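With the index in place, a nearest-neighbour query goes through SurrealDB's KNN operator `<|K,EF|>`. A sketch via the Rust SDK (the `nearest_chunks` helper and its result shape are illustrative):

```rust
// Hypothetical query helper: top-10 nearest chunks to a query vector,
// using SurrealDB's KNN operator against the HNSW index.
async fn nearest_chunks(
    db: &surrealdb::Surreal<surrealdb::engine::any::Any>,
    query_vector: Vec<f32>,
) -> anyhow::Result<Vec<serde_json::Value>> {
    let mut response = db
        .query(
            "SELECT id, title, text, vector::distance::knn() AS dist \
             FROM rag_chunks WHERE embedding <|10,40|> $vec \
             ORDER BY dist",
        )
        .bind(("vec", query_vector))
        .await?;
    Ok(response.take(0)?)
}
```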
### Chunking Strategy
```rust
pub struct ChunkingEngine;

impl ChunkingEngine {
    pub async fn chunk_document(&self, document: Document) -> anyhow::Result<Vec<Chunk>> {
        let chunks = match document.file_type {
            FileType::Markdown => self.chunk_markdown(&document.content)?,
            FileType::KCL => self.chunk_kcl(&document.content)?,
            FileType::Nushell => self.chunk_nushell(&document.content)?,
            _ => self.chunk_text(&document.content)?,
        };
        Ok(chunks)
    }

    fn chunk_markdown(&self, content: &str) -> anyhow::Result<Vec<Chunk>> {
        let mut chunks = Vec::new();
        let mut section = String::new();

        // Split at headers: start a new section whenever a `#` heading begins
        for line in content.lines() {
            if line.starts_with('#') && !section.is_empty() {
                Self::push_section(&mut chunks, &section);
                section.clear();
            }
            section.push_str(line);
            section.push('\n');
        }
        if !section.is_empty() {
            Self::push_section(&mut chunks, &section);
        }
        Ok(chunks)
    }

    fn push_section(chunks: &mut Vec<Chunk>, section: &str) {
        // Cap chunk size (500 chars here as a token-budget proxy);
        // oversized sections are split further on char boundaries
        const MAX: usize = 500;
        if section.len() > MAX {
            let chars: Vec<char> = section.chars().collect();
            for sub in chars.chunks(400) {
                chunks.push(Chunk {
                    text: sub.iter().collect(),
                    metadata: Default::default(),
                });
            }
        } else {
            chunks.push(Chunk {
                text: section.to_string(),
                metadata: Default::default(),
            });
        }
    }
}
```
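For example, feeding a small Markdown document through the engine (the `Document` construction is illustrative; only the fields used above are shown):

```rust
// Illustrative usage of the sketch above: two headers yield two chunks.
async fn demo_chunking() -> anyhow::Result<()> {
    let engine = ChunkingEngine;
    let doc = Document {
        file_type: FileType::Markdown,
        content: "# Title\nIntro text.\n# Details\nMore text.".to_string(),
    };
    let chunks = engine.chunk_document(doc).await?;
    assert_eq!(chunks.len(), 2);
    Ok(())
}
```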
### Embeddings
```rust
pub enum EmbeddingsProvider {
    OpenAI {
        api_key: String,
        model: String, // "text-embedding-3-small": 1536 dims, fast
    },
    Local {
        model_path: String, // ONNX model, e.g. nomic-embed-text
    },
}

pub struct EmbeddingsClient {
    provider: EmbeddingsProvider,
}

impl EmbeddingsClient {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match &self.provider {
            EmbeddingsProvider::OpenAI { api_key, model } => {
                // Call the OpenAI embeddings API
                let response = reqwest::Client::new()
                    .post("https://api.openai.com/v1/embeddings")
                    .bearer_auth(api_key)
                    .json(&serde_json::json!({
                        "model": model,
                        "input": text,
                    }))
                    .send()
                    .await?;
                let result: OpenAIResponse = response.json().await?;
                Ok(result.data[0].embedding.clone())
            }
            EmbeddingsProvider::Local { model_path } => {
                // Run the local ONNX model. Tokenization and pooling are
                // elided; a real implementation feeds token ids, not raw text.
                let session = ort::Session::builder()?.commit_from_file(model_path)?;
                let input = tokenize(text)?; // placeholder for the model's input tensors
                let output = session.run(ort::inputs![input]?)?;
                let tensor = output[0].try_extract_tensor::<f32>()?;
                Ok(tensor.iter().copied().collect())
            }
        }
    }

    pub async fn embed_batch(&self, texts: Vec<String>) -> anyhow::Result<Vec<Vec<f32>>> {
        // Batch for efficiency: OpenAI accepts an array as "input";
        // shown here as sequential calls for brevity.
        let mut out = Vec::with_capacity(texts.len());
        for t in &texts {
            out.push(self.embed(t).await?);
        }
        Ok(out)
    }
}
```
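The primary/fallback split called out in the architecture can live one level up. A minimal sketch, assuming a hypothetical `embed_with_fallback` wrapper over two clients:

```rust
// Hypothetical wrapper: try the OpenAI provider first, fall back to the
// local ONNX model if the API call fails.
pub async fn embed_with_fallback(
    openai: &EmbeddingsClient,
    local: &EmbeddingsClient,
    text: &str,
) -> anyhow::Result<Vec<f32>> {
    match openai.embed(text).await {
        Ok(vector) => Ok(vector),
        Err(err) => {
            tracing::warn!("OpenAI embeddings failed, using local model: {err}");
            local.embed(text).await
        }
    }
}
```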
### Retrieval
```rust
use std::collections::HashMap;

pub struct HybridRetriever {
    vector_store: SurrealDB,
    bm25_index: BM25Index,
    embeddings: EmbeddingsClient,
}

impl HybridRetriever {
    pub async fn search(&self, query: String, top_k: u32) -> anyhow::Result<Vec<ChunkWithScore>> {
        // 1. Semantic search (vector similarity)
        let query_vector = self.embeddings.embed(&query).await?;
        let semantic_results = self
            .vector_store
            .search_hnsw(query_vector, top_k * 2, 0.5) // over-fetch for re-ranking
            .await?;

        // 2. BM25 keyword search
        let bm25_results = self.bm25_index.search(&query, top_k * 2)?;

        // 3. Merge with rank-based fusion: 70% semantic, 30% keyword
        let mut merged: HashMap<String, f64> = HashMap::new();
        for (i, result) in semantic_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0); // rank-based score
            *merged.entry(result.id.clone()).or_insert(0.0) += score * 0.7;
        }
        for (i, result) in bm25_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);
            *merged.entry(result.id.clone()).or_insert(0.0) += score * 0.3;
        }

        // 4. Sort by fused score and return top-k
        let mut final_results: Vec<_> = merged.into_iter().collect();
        final_results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        Ok(final_results
            .into_iter()
            .take(top_k as usize)
            .map(|(id, score)| ChunkWithScore { id, score }) // fetch full chunk by id in practice
            .collect())
    }
}
```
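To make the weighting concrete: a chunk ranked 1st in the semantic list and 3rd in the BM25 list receives 0.7 × 1.0 + 0.3 × 1/3 = 0.8, while a chunk ranked 3rd semantically and 1st by keyword receives 0.7 × 1/3 + 0.3 × 1.0 ≈ 0.53, so semantic rank dominates but strong keyword hits still surface.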
## Indexing Workflow

### Automatic Indexing
```text
File added to docs/
        ↓
Git hook or workflow trigger
        ↓
doc-lifecycle-manager processes
  ├── Classifies document
  └── Publishes "document_added" event
        ↓
RAG system subscribes
  ├── Chunks document
  ├── Generates embeddings
  ├── Stores in SurrealDB
  └── Updates HNSW index
        ↓
Agent Search Tool ready
```
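A sketch of the subscriber side, assuming a hypothetical `DocEvent` enum delivered over a Tokio channel (the real hook wiring lives in doc-lifecycle-manager):

```rust
// Hypothetical event handler: index every newly published document.
impl RagSystem {
    pub async fn run_indexer(&self, mut events: tokio::sync::mpsc::Receiver<DocEvent>) {
        while let Some(event) = events.recv().await {
            // React only to "document_added" events from doc-lifecycle-manager
            if let DocEvent::DocumentAdded(doc) = event {
                if let Err(err) = self.index_document(doc).await {
                    tracing::error!("failed to index document: {err}");
                }
            }
        }
    }
}
```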
### Batch Reindexing
```bash
# Periodic full reindex (daily or on demand)
vapora rag reindex --all

# Incremental reindex (only changed docs)
vapora rag reindex --since 1d

# Rebuild HNSW index from scratch
vapora rag rebuild-index --optimize
```
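The incremental path reduces to a modification-time filter. A sketch, assuming a hypothetical `list_documents_modified_since` helper over docs/:

```rust
// Hypothetical incremental path behind `vapora rag reindex --since 1d`.
pub async fn reindex_since(rag: &RagSystem, window: std::time::Duration) -> anyhow::Result<()> {
    let cutoff = std::time::SystemTime::now() - window;
    // `list_documents_modified_since` is an assumed helper, not a real API
    for doc in list_documents_modified_since(cutoff).await? {
        rag.index_document(doc).await?;
    }
    Ok(())
}
```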
## Implementation Checklist
- Port RAG system from provisioning (2,140 lines)
- Integrate with SurrealDB vector store
- HNSW index setup + optimization
- Chunking strategies (Markdown, KCL, Nushell)
- Embeddings client (OpenAI + local fallback)
- Hybrid retrieval (semantic + BM25)
- Search tool for agents
- doc-lifecycle-manager hooks
- Indexing workflows
- Batch reindexing
- CLI: `vapora rag search`, `vapora rag reindex`
- Tests + benchmarks
## Success Metrics

- Search latency < 100 ms (p99)
- Relevance score > 0.8 for top results
- 1,000+ documents indexed
- Memory-efficient HNSW index
- Agents find relevant context automatically
- No hallucinations from out-of-context queries
**Version:** 0.1.0 | **Status:** Integration Specification Complete | **Purpose:** RAG system for semantic document search in VAPORA