
πŸ” RAG Integration

Retrieval-Augmented Generation for VAPORA Context

Version: 0.1.0 Status: Specification (VAPORA v1.0 Integration) Purpose: RAG system ported from provisioning, integrated into VAPORA for semantic search


🎯 Objective

RAG (Retrieval-Augmented Generation) provides context to the agents:

  • ✅ Agents search for semantically similar documentation
  • ✅ ADRs, designs, and guides serve as context for new tasks
  • ✅ LLM queries include the relevant documentation
  • ✅ Fewer hallucinations, better decisions
  • ✅ Complete provisioning system already exists (2,140 lines of Rust)

πŸ—οΈ RAG Architecture

Components (From Provisioning)

RAG System (2,140 lines, production-ready from provisioning)
├─ Chunking Engine
│  ├─ Markdown chunks (with metadata)
│  ├─ KCL chunks (for infrastructure docs)
│  ├─ Nushell chunks (for scripts)
│  └─ Smart splitting (at headers, code blocks)
│
├─ Embeddings
│  ├─ Primary: OpenAI API (text-embedding-3-small)
│  ├─ Fallback: Local ONNX (nomic-embed-text)
│  ├─ Dimension: 1536-dim vectors
│  └─ Batch processing
│
├─ Vector Store
│  ├─ SurrealDB with HNSW index
│  ├─ Fast similarity search
│  ├─ Scalar-product distance metric
│  └─ Replication for redundancy
│
├─ Retrieval
│  ├─ Top-K BM25 + semantic hybrid
│  ├─ Threshold filtering (relevance > 0.7)
│  ├─ Context enrichment
│  └─ Ranking/re-ranking
│
└─ Integration
   ├─ Claude API with full context
   ├─ Agent Search tool
   ├─ Workflow context injection
   └─ Decision-making support

Data Flow

Document Added to docs/
  ↓
doc-lifecycle-manager classifies
  ↓
RAG Chunking Engine
  ├─ Split into semantic chunks
  └─ Extract metadata (title, type, date)
  ↓
Embeddings Generator
  ├─ Generate 1536-dim vector per chunk
  └─ Batch process for efficiency
  ↓
Vector Store (SurrealDB HNSW)
  ├─ Store chunk + vector + metadata
  └─ Create HNSW index
  ↓
Search Ready
  ├─ Agent can query
  ├─ Semantic similarity search
  └─ Search latency < 100 ms
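
Read as code, the indexing side of this flow condenses into a single pipeline. A minimal sketch, reusing the types defined later on this page; `insert_chunk` is an assumed store method:

pub async fn index_document(
    chunker: &ChunkingEngine,
    embeddings: &EmbeddingsClient,
    store: &SurrealDB,
    document: Document,
) -> anyhow::Result<()> {
    // 1. Split into semantic chunks with metadata
    let chunks = chunker.chunk_document(document).await?;

    // 2. Batch-generate one 1536-dim vector per chunk
    let texts: Vec<String> = chunks.iter().map(|c| c.text.clone()).collect();
    let vectors = embeddings.embed_batch(texts).await?;

    // 3. Store chunk + vector + metadata; the HNSW index updates on insert
    for (chunk, vector) in chunks.into_iter().zip(vectors) {
        store.insert_chunk(chunk, vector).await?; // assumed store API
    }

    Ok(())
}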

🔧 RAG in VAPORA

Search Tool (Available to All Agents)

use std::collections::HashMap;
use std::time::Instant;

pub struct SearchTool {
    pub vector_store: SurrealDB,
    pub embeddings: EmbeddingsClient,
    pub retriever: HybridRetriever,
}

impl SearchTool {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
        threshold: f64,
    ) -> anyhow::Result<SearchResults> {
        let started = Instant::now();

        // 1. Embed query
        let query_vector = self.embeddings.embed(&query).await?;

        // 2. Search vector store
        let chunk_results = self.vector_store.search_hnsw(
            query_vector,
            top_k,
            threshold,
        ).await?;

        // Store-reported count of candidates examined (1,000+ in the corpus)
        let total_chunks_searched = chunk_results.total_searched;

        // 3. Enrich with context
        let results = self.enrich_results(chunk_results).await?;

        Ok(SearchResults {
            query,
            results,
            total_chunks_searched,
            search_duration_ms: started.elapsed().as_millis() as u32,
        })
    }

    pub async fn search_with_filters(
        &self,
        query: String,
        filters: SearchFilters,
    ) -> anyhow::Result<SearchResults> {
        // Filter by document type, date, tags before search
        let filtered_documents = self.filter_documents(&filters).await?;
        // ... rest of search: same embed/search/enrich pipeline,
        // restricted to `filtered_documents`
        todo!()
    }
}

pub struct SearchFilters {
    pub doc_type: Option<Vec<String>>,      // ["adr", "guide"]
    pub date_range: Option<(Date, Date)>,
    pub tags: Option<Vec<String>>,          // ["orchestrator", "performance"]
    pub lifecycle_state: Option<String>,    // "published", "archived"
}

pub struct SearchResults {
    pub query: String,
    pub results: Vec<SearchResult>,
    pub total_chunks_searched: u32,
    pub search_duration_ms: u32,
}

pub struct SearchResult {
    pub document_id: String,
    pub document_title: String,
    pub chunk_text: String,
    pub relevance_score: f64,      // 0.0-1.0
    pub metadata: HashMap<String, String>,
    pub source_url: String,
    pub snippet_context: String,   // Surrounding text
}

Agent Usage Example

// Agent decides to search for context
impl DeveloperAgent {
    pub async fn implement_feature(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Search for similar features implemented before
        let similar_features = self.search_tool.search(
            format!("implement {} feature like {}", task.domain, task.type_),
            5,     // top_k
            0.75,  // threshold
        ).await?;

        // 2. Extract context from results
        let context_docs = similar_features.results
            .iter()
            .map(|r| r.chunk_text.clone())
            .collect::<Vec<_>>();

        // 3. Build LLM prompt with context
        let prompt = format!(
            "Implement the following feature:\n{}\n\nSimilar features implemented:\n{}",
            task.description,
            context_docs.join("\n---\n")
        );

        // 4. Generate code with context
        let _code = self.llm_router.complete(prompt).await?;

        Ok(())
    }
}

Documenter Agent Integration

impl DocumenterAgent {
    pub async fn update_documentation(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Get decisions from task
        let decisions = task.extract_decisions().await?;

        for decision in decisions {
            // 2. Search existing ADRs to avoid duplicates
            let similar_adrs = self.search_tool.search(
                decision.context.clone(),
                3,    // top_k
                0.8,  // threshold
            ).await?;

            // 3. Check if decision already documented
            if similar_adrs.results.is_empty() {
                // Create new ADR
                let adr_content = format!(
                    "# {}\n\n## Context\n{}\n\n## Decision\n{}",
                    decision.title,
                    decision.context,
                    decision.chosen_option,
                );

                // 4. Save and index for RAG
                self.db.save_adr(&adr_content).await?;
                self.rag_system.index_document(&adr_content).await?;
            }
        }

        Ok(())
    }
}

📊 RAG Implementation (From Provisioning)

Schema (SurrealDB)

-- RAG chunks table
DEFINE TABLE rag_chunks SCHEMAFULL
    PERMISSIONS
        FOR select, create FULL
        FOR update, delete NONE;

-- Identifiers (the record id itself is implicit in SurrealDB)
DEFINE FIELD document_id ON rag_chunks TYPE string;
DEFINE FIELD chunk_index ON rag_chunks TYPE int;

-- Content
DEFINE FIELD text ON rag_chunks TYPE string;
DEFINE FIELD title ON rag_chunks TYPE string;
DEFINE FIELD doc_type ON rag_chunks TYPE string;

-- Vector (1536-dim embedding)
DEFINE FIELD embedding ON rag_chunks TYPE array<float>;

-- Metadata
DEFINE FIELD created_date ON rag_chunks TYPE datetime;
DEFINE FIELD last_updated ON rag_chunks TYPE datetime;
DEFINE FIELD source_path ON rag_chunks TYPE string;
DEFINE FIELD tags ON rag_chunks TYPE array<string>;
DEFINE FIELD lifecycle_state ON rag_chunks TYPE string;

-- HNSW index for similarity search
-- (COSINE shown here; swap in the scalar/dot-product metric
-- if the deployed SurrealDB version supports it)
DEFINE INDEX rag_chunks_embedding ON rag_chunks
    FIELDS embedding
    HNSW DIMENSION 1536 DIST COSINE EFC 200 M 16;
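
For reference, a similarity query against this index might look like the following, assuming SurrealDB's KNN operator (`<|K,EF|>`) and a client-bound query vector `$q`:

-- Top-5 nearest chunks to the query embedding $q
SELECT document_id, title, text,
       vector::distance::knn() AS dist
FROM rag_chunks
WHERE embedding <|5,40|> $q;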

Chunking Strategy

pub struct ChunkingEngine;

impl ChunkingEngine {
    pub async fn chunk_document(
        &self,
        document: Document,
    ) -> anyhow::Result<Vec<Chunk>> {
        let chunks = match document.file_type {
            FileType::Markdown => self.chunk_markdown(&document.content)?,
            FileType::KCL => self.chunk_kcl(&document.content)?,
            FileType::Nushell => self.chunk_nushell(&document.content)?,
            _ => self.chunk_text(&document.content)?,
        };

        Ok(chunks)
    }

    fn chunk_markdown(&self, content: &str) -> anyhow::Result<Vec<Chunk>> {
        // Target ~500 tokens per chunk, approximated as ~2,000 chars
        const MAX_CHARS: usize = 2_000;

        let mut chunks = Vec::new();
        let mut section = String::new();

        // Flush a section, splitting oversized ones on char boundaries
        fn flush(section: &mut String, chunks: &mut Vec<Chunk>, max: usize) {
            if section.trim().is_empty() {
                section.clear();
                return;
            }
            let chars: Vec<char> = section.chars().collect();
            for piece in chars.chunks(max) {
                chunks.push(Chunk {
                    text: piece.iter().collect(),
                    metadata: Default::default(),
                });
            }
            section.clear();
        }

        // Split at headers: each `#`-prefixed line starts a new section
        for line in content.lines() {
            if line.starts_with('#') {
                flush(&mut section, &mut chunks, MAX_CHARS);
            }
            section.push_str(line);
            section.push('\n');
        }
        flush(&mut section, &mut chunks, MAX_CHARS);

        Ok(chunks)
    }
}
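
The `Document`, `FileType`, and `Chunk` types used above are not defined on this page; a minimal assumed shape, inferred from their usage:

use std::collections::HashMap;

pub struct Document {
    pub content: String,
    pub file_type: FileType,
}

pub enum FileType {
    Markdown,
    KCL,
    Nushell,
    Plain,
}

pub struct Chunk {
    pub text: String,
    pub metadata: HashMap<String, String>,
}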

Embeddings

use serde::Deserialize;

pub enum EmbeddingsProvider {
    OpenAI {
        api_key: String,
        model: String,       // "text-embedding-3-small": 1536 dims, fast
    },
    Local {
        model_path: String,  // ONNX model file
        model: String,       // "nomic-embed-text"
    },
}

#[derive(Deserialize)]
struct OpenAIResponse {
    data: Vec<OpenAIEmbedding>,
}

#[derive(Deserialize)]
struct OpenAIEmbedding {
    embedding: Vec<f32>,
}

pub struct EmbeddingsClient {
    pub provider: EmbeddingsProvider,
}

impl EmbeddingsClient {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match &self.provider {
            EmbeddingsProvider::OpenAI { api_key, model } => {
                // Call OpenAI API
                let response = reqwest::Client::new()
                    .post("https://api.openai.com/v1/embeddings")
                    .bearer_auth(api_key)
                    .json(&serde_json::json!({
                        "model": model,
                        "input": text,
                    }))
                    .send()
                    .await?;

                let result: OpenAIResponse = response.json().await?;
                Ok(result.data[0].embedding.clone())
            },
            EmbeddingsProvider::Local { model_path, .. } => {
                // Use local ONNX model (nomic-embed-text).
                // Sketch only: a real implementation must tokenize `text`
                // and feed token ids, not the raw string.
                let session = ort::Session::builder()?
                    .commit_from_file(model_path)?;

                let outputs = session.run(ort::inputs![text]?)?;
                let embedding = outputs[0].try_extract_tensor::<f32>()?;

                Ok(embedding.iter().copied().collect())
            },
        }
    }

    pub async fn embed_batch(
        &self,
        texts: Vec<String>,
    ) -> anyhow::Result<Vec<Vec<f32>>> {
        // Naive fallback: embed one at a time. For efficiency, use the
        // provider's batch API (OpenAI accepts an array in `input`).
        let mut vectors = Vec::with_capacity(texts.len());
        for text in &texts {
            vectors.push(self.embed(text).await?);
        }
        Ok(vectors)
    }
}
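
A minimal usage sketch (construction details assumed; the API key comes from the environment):

let client = EmbeddingsClient {
    provider: EmbeddingsProvider::OpenAI {
        api_key: std::env::var("OPENAI_API_KEY")?,
        model: "text-embedding-3-small".to_string(),
    },
};

// One vector per chunk, 1536 dims each
let vectors = client.embed_batch(chunk_texts).await?;
assert_eq!(vectors[0].len(), 1536);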

Retrieval

use std::collections::HashMap;

pub struct HybridRetriever {
    vector_store: SurrealDB,
    bm25_index: BM25Index,
    embeddings: EmbeddingsClient,
}

impl HybridRetriever {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
    ) -> anyhow::Result<Vec<ChunkWithScore>> {
        // 1. Semantic search (vector similarity)
        let query_vector = self.embeddings.embed(&query).await?;
        let semantic_results = self.vector_store.search_hnsw(
            query_vector,
            top_k * 2,  // Get more for re-ranking
            0.5,
        ).await?;

        // 2. BM25 keyword search
        let bm25_results = self.bm25_index.search(&query, top_k * 2)?;

        // 3. Merge and re-rank (weighted reciprocal-rank fusion)
        let mut merged = HashMap::new();

        for (i, result) in semantic_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);  // Reciprocal rank
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.7)  // 70% weight
                .or_insert(score * 0.7);
        }

        for (i, result) in bm25_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.3)  // 30% weight
                .or_insert(score * 0.3);
        }

        // 4. Sort and return top-k
        let mut final_results: Vec<_> = merged.into_iter().collect();
        final_results.sort_by(|a, b| b.1.total_cmp(&a.1));

        Ok(final_results.into_iter()
            .take(top_k as usize)
            .map(|(id, score)| {
                // Full chunk text is fetched later during enrichment
                ChunkWithScore { id, score }
            })
            .collect())
    }
}
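
The merge step is a weighted reciprocal-rank fusion: a chunk ranked 1st semantically and 3rd by BM25 scores 0.7/1 + 0.3/3 ≈ 0.80. A hedged usage sketch (wiring assumed):

// Assumes a constructed retriever; field names as in the struct above
let top = retriever.search("HNSW index tuning".to_string(), 10).await?;
for hit in &top {
    println!("{}  score={:.3}", hit.id, hit.score);
}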

📚 Indexing Workflow

Automatic Indexing

File added to docs/
  ↓
Git hook or workflow trigger
  ↓
doc-lifecycle-manager processes
  ├─ Classifies document
  └─ Publishes "document_added" event
  ↓
RAG system subscribes
  ├─ Chunks document
  ├─ Generates embeddings
  ├─ Stores in SurrealDB
  └─ Updates HNSW index
  ↓
Agent Search Tool ready
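
A sketch of the subscriber side; `EventBus`, `RagSystem`, `load_document`, and the event's fields are assumed names, since the event plumbing belongs to doc-lifecycle-manager:

async fn run_rag_indexer(bus: EventBus, rag: RagSystem) -> anyhow::Result<()> {
    // React to doc-lifecycle-manager's "document_added" events
    let mut events = bus.subscribe("document_added").await?;
    while let Some(event) = events.recv().await {
        let document = load_document(&event.source_path).await?; // assumed helper
        // chunk -> embed -> store; the HNSW index updates on insert
        rag.index_document(document).await?;
    }
    Ok(())
}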

Batch Reindexing

# Periodic full reindex (daily or on demand)
vapora rag reindex --all

# Incremental reindex (only changed docs)
vapora rag reindex --since 1d

# Rebuild HNSW index from scratch
vapora rag rebuild-index --optimize
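
The checklist below also calls for a vapora rag search subcommand; an illustrative invocation (these flags are not finalized in this spec):

# Semantic search from the CLI (hypothetical flags)
vapora rag search "HNSW index tuning" --top-k 5 --threshold 0.75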

🎯 Implementation Checklist

  • Port RAG system from provisioning (2,140 lines)
  • Integrate with SurrealDB vector store
  • HNSW index setup + optimization
  • Chunking strategies (Markdown, KCL, Nushell)
  • Embeddings client (OpenAI + local fallback)
  • Hybrid retrieval (semantic + BM25)
  • Search tool for agents
  • doc-lifecycle-manager hooks
  • Indexing workflows
  • Batch reindexing
  • CLI: vapora rag search, vapora rag reindex
  • Tests + benchmarks

📊 Success Metrics

  • ✅ Search latency < 100 ms (p99)
  • ✅ Relevance score > 0.8 for top results
  • ✅ 1,000+ documents indexed
  • ✅ Memory-efficient HNSW index
  • ✅ Agents find relevant context automatically
  • ✅ No hallucinations from out-of-context queries


Version: 0.1.0 Status: ✅ Integration Specification Complete Purpose: RAG system for semantic document search in VAPORA