
🔍 RAG Integration

Retrieval-Augmented Generation for VAPORA Context

Version: 0.1.0
Status: Specification (VAPORA v1.0 Integration)
Purpose: RAG system from provisioning integrated into VAPORA for semantic search


🎯 Objective

RAG (Retrieval-Augmented Generation) provides context to the agents:

  • Agents search for semantically similar documentation
  • ADRs, designs, and guides serve as context for new tasks
  • LLM queries are augmented with relevant documentation
  • Fewer hallucinations, better decisions
  • Complete system ported from provisioning (2,140 lines of Rust)

🏗️ RAG Architecture

Components (From Provisioning)

RAG System (2,140 lines, production-ready from provisioning)
├─ Chunking Engine
│  ├─ Markdown chunks (with metadata)
│  ├─ KCL chunks (for infrastructure docs)
│  ├─ Nushell chunks (for scripts)
│  └─ Smart splitting (at headers, code blocks)
│
├─ Embeddings
│  ├─ Primary: OpenAI API (text-embedding-3-small)
│  ├─ Fallback: Local ONNX (nomic-embed-text)
│  ├─ Dimension: 1536-dim vectors
│  └─ Batch processing
│
├─ Vector Store
│  ├─ SurrealDB with HNSW index
│  ├─ Fast similarity search
│  ├─ Scalar product distance metric
│  └─ Replication for redundancy
│
├─ Retrieval
│  ├─ Top-K BM25 + semantic hybrid
│  ├─ Threshold filtering (relevance > 0.7)
│  ├─ Context enrichment
│  └─ Ranking/re-ranking
│
└─ Integration
   ├─ Claude API with full context
   ├─ Agent Search tool
   ├─ Workflow context injection
   └─ Decision-making support
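
Taken together, these pieces can be composed into a single service. A minimal sketch of that composition (type names are illustrative here; the ported provisioning crate defines the canonical ones):

pub struct RagSystem {
    pub chunker: ChunkingEngine,       // Markdown / KCL / Nushell chunking
    pub embeddings: EmbeddingsClient,  // OpenAI primary, local ONNX fallback
    pub vector_store: SurrealDB,       // HNSW-indexed chunk storage
    pub retriever: HybridRetriever,    // semantic + BM25 fusion
}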

Data Flow

Document Added to docs/
  ↓
doc-lifecycle-manager classifies
  ↓
RAG Chunking Engine
  ├─ Split into semantic chunks
  └─ Extract metadata (title, type, date)
  ↓
Embeddings Generator
  ├─ Generate 1536-dim vector per chunk
  └─ Batch process for efficiency
  ↓
Vector Store (SurrealDB HNSW)
  ├─ Store chunk + vector + metadata
  └─ Create HNSW index
  ↓
Search Ready
  ├─ Agent can query
  ├─ Semantic similarity search
  └─ Fast < 100ms latency
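
The same flow, condensed into one indexing method. This is a hedged sketch: index_document and insert_chunk are illustrative of the pipeline, not the exact ported API.

impl RagSystem {
    /// Index one document end-to-end: chunk → embed → store.
    pub async fn index_document(&self, doc: Document) -> anyhow::Result<()> {
        // 1. Split into semantic chunks with metadata
        let chunks = self.chunker.chunk_document(doc).await?;

        // 2. Batch-generate 1536-dim embeddings
        let texts: Vec<String> = chunks.iter().map(|c| c.text.clone()).collect();
        let vectors = self.embeddings.embed_batch(texts).await?;

        // 3. Store chunk + vector + metadata; the HNSW index covers the new rows
        for (chunk, vector) in chunks.into_iter().zip(vectors) {
            // insert_chunk is a hypothetical helper over the SurrealDB client
            self.vector_store.insert_chunk(chunk, vector).await?;
        }

        Ok(())
    }
}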

🔧 RAG in VAPORA

Search Tool (Available to All Agents)

pub struct SearchTool {
    pub vector_store: SurrealDB,
    pub embeddings: EmbeddingsClient,
    pub retriever: HybridRetriever,
}

impl SearchTool {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
        threshold: f64,
    ) -> anyhow::Result<SearchResults> {
        let search_start = std::time::Instant::now();

        // 1. Embed query
        let query_vector = self.embeddings.embed(&query).await?;

        // 2. Search vector store
        let chunk_results = self.vector_store.search_hnsw(
            query_vector,
            top_k,
            threshold,
        ).await?;

        // 3. Enrich with context
        let results = self.enrich_results(chunk_results).await?;

        Ok(SearchResults {
            query,
            results,
            total_chunks_searched: results.len() as u32, // candidates returned by the index
            search_duration_ms: search_start.elapsed().as_millis() as u32,
        })
    }

    pub async fn search_with_filters(
        &self,
        query: String,
        filters: SearchFilters,
    ) -> anyhow::Result<SearchResults> {
        // Filter by document type, date, tags before search
        let filtered_documents = self.filter_documents(&filters).await?;
        // ... rest of search
    }
}

pub struct SearchFilters {
    pub doc_type: Option<Vec<String>>,      // ["adr", "guide"]
    pub date_range: Option<(Date, Date)>,
    pub tags: Option<Vec<String>>,          // ["orchestrator", "performance"]
    pub lifecycle_state: Option<String>,    // "published", "archived"
}

pub struct SearchResults {
    pub query: String,
    pub results: Vec<SearchResult>,
    pub total_chunks_searched: u32,
    pub search_duration_ms: u32,
}

pub struct SearchResult {
    pub document_id: String,
    pub document_title: String,
    pub chunk_text: String,
    pub relevance_score: f64,      // 0.0-1.0
    pub metadata: HashMap<String, String>,
    pub source_url: String,
    pub snippet_context: String,   // Surrounding text
}

Agent Usage Example

// Agent decides to search for context
impl DeveloperAgent {
    pub async fn implement_feature(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Search for similar features implemented before
        let similar_features = self.search_tool.search(
            format!("implement {} feature like {}", task.domain, task.type_),
            5,    // top_k
            0.75, // threshold
        ).await?;

        // 2. Extract context from results
        let context_docs = similar_features.results
            .iter()
            .map(|r| r.chunk_text.clone())
            .collect::<Vec<_>>();

        // 3. Build LLM prompt with context
        let prompt = format!(
            "Implement the following feature:\n{}\n\nSimilar features implemented:\n{}",
            task.description,
            context_docs.join("\n---\n")
        );

        // 4. Generate code with context
        let code = self.llm_router.complete(prompt).await?;

        Ok(())
    }
}

Documenter Agent Integration

impl DocumenterAgent {
    pub async fn update_documentation(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Get decisions from task
        let decisions = task.extract_decisions().await?;

        for decision in decisions {
            // 2. Search existing ADRs to avoid duplicates
            let similar_adrs = self.search_tool.search(
                decision.context.clone(),
                3,   // top_k
                0.8, // threshold
            ).await?;

            // 3. Check if decision already documented
            if similar_adrs.results.is_empty() {
                // Create new ADR
                let adr_content = format!(
                    "# {}\n\n## Context\n{}\n\n## Decision\n{}",
                    decision.title,
                    decision.context,
                    decision.chosen_option,
                );

                // 4. Save and index for RAG
                self.db.save_adr(&adr_content).await?;
                self.rag_system.index_document(&adr_content).await?;
            }
        }

        Ok(())
    }
}

📊 RAG Implementation (From Provisioning)

Schema (SurrealDB)

-- RAG chunks table
CREATE TABLE rag_chunks SCHEMAFULL {
    -- Identifiers
    id: string,
    document_id: string,
    chunk_index: int,

    -- Content
    text: string,
    title: string,
    doc_type: string,

    -- Vector
    embedding: vector<1536>,

    -- Metadata
    created_date: datetime,
    last_updated: datetime,
    source_path: string,
    tags: array<string>,
    lifecycle_state: string,

    -- Indexing
    INDEX embedding ON HNSW (1536) FIELDS embedding
        DISTANCE SCALAR PRODUCT
        M 16
        EF_CONSTRUCTION 200,

    PERMISSIONS
        FOR select ALLOW (true)
        FOR create ALLOW (true)
        FOR update ALLOW (false)
        FOR delete ALLOW (false)
};
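
A hedged sketch of writing one chunk row from Rust through the SurrealDB SDK (assumes the async surrealdb client; ChunkRow and store_chunk are illustrative, and the row mirrors the schema above with the datetime fields omitted):

use serde::Serialize;
use surrealdb::{engine::any::Any, Surreal};

#[derive(Serialize)]
struct ChunkRow {
    document_id: String,
    chunk_index: i64,
    text: String,
    title: String,
    doc_type: String,
    embedding: Vec<f32>,   // 1536-dim vector
    source_path: String,
    tags: Vec<String>,
    lifecycle_state: String,
}

async fn store_chunk(db: &Surreal<Any>, row: ChunkRow) -> anyhow::Result<()> {
    // CONTENT insert keeps the row aligned with the SCHEMAFULL definition
    db.query("CREATE rag_chunks CONTENT $row")
        .bind(("row", row))
        .await?;
    Ok(())
}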

Chunking Strategy

pub struct ChunkingEngine;

impl ChunkingEngine {
    pub async fn chunk_document(
        &self,
        document: Document,
    ) -> anyhow::Result<Vec<Chunk>> {
        let chunks = match document.file_type {
            FileType::Markdown => self.chunk_markdown(&document.content)?,
            FileType::KCL => self.chunk_kcl(&document.content)?,
            FileType::Nushell => self.chunk_nushell(&document.content)?,
            _ => self.chunk_text(&document.content)?,
        };

        Ok(chunks)
    }

    fn chunk_markdown(&self, content: &str) -> anyhow::Result<Vec<Chunk>> {
        let mut chunks = Vec::new();

        // Split at markdown headers so each section becomes a candidate chunk
        let mut sections: Vec<String> = Vec::new();
        let mut current = String::new();
        for line in content.lines() {
            if line.starts_with('#') && !current.is_empty() {
                sections.push(std::mem::take(&mut current));
            }
            current.push_str(line);
            current.push('\n');
        }
        if !current.is_empty() {
            sections.push(current);
        }

        for section in sections {
            // Keep chunks small (~500 characters as a rough proxy for the 500-token budget)
            if section.chars().count() > 500 {
                // Split oversized sections into ~400-character sub-chunks
                let chars: Vec<char> = section.chars().collect();
                for sub_chunk in chars.chunks(400) {
                    chunks.push(Chunk {
                        text: sub_chunk.iter().collect(),
                        metadata: Default::default(),
                    });
                }
            } else {
                chunks.push(Chunk {
                    text: section,
                    metadata: Default::default(),
                });
            }
        }

        Ok(chunks)
    }
}
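
For reference, a minimal sketch of the Document and Chunk types assumed above (field names are illustrative; the ported provisioning code defines the canonical versions):

pub enum FileType {
    Markdown,
    KCL,
    Nushell,
    Other,
}

pub struct Document {
    pub file_type: FileType,
    pub content: String,
    pub source_path: String,
}

#[derive(Default)]
pub struct ChunkMetadata {
    pub title: Option<String>,
    pub doc_type: Option<String>,
    pub tags: Vec<String>,
}

pub struct Chunk {
    pub text: String,
    pub metadata: ChunkMetadata,
}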

Embeddings

pub enum EmbeddingsProvider {
    OpenAI {
        api_key: String,
        model: String,       // e.g. "text-embedding-3-small" (1536 dims, fast)
    },
    Local {
        model_path: String,  // path to the ONNX model on disk
        model: String,       // e.g. "nomic-embed-text"
    },
}

pub struct EmbeddingsClient {
    provider: EmbeddingsProvider,
}

impl EmbeddingsClient {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match &self.provider {
            EmbeddingsProvider::OpenAI { api_key, .. } => {
                // Call OpenAI API
                let response = reqwest::Client::new()
                    .post("https://api.openai.com/v1/embeddings")
                    .bearer_auth(api_key)
                    .json(&serde_json::json!({
                        "model": "text-embedding-3-small",
                        "input": text,
                    }))
                    .send()
                    .await?;

                let result: OpenAIResponse = response.json().await?;
                Ok(result.data[0].embedding.clone())
            },
            EmbeddingsProvider::Local { model_path, .. } => {
                // Use local ONNX model (nomic-embed-text).
                // Note: a real implementation tokenizes the text into model
                // inputs first; tokenization is elided in this sketch.
                let session = ort::Session::builder()?.commit_from_file(model_path)?;

                let output = session.run(ort::inputs![text]?)?;
                let embedding = output[0].try_extract_tensor()?.view().to_owned();

                Ok(embedding.iter().map(|x| *x as f32).collect())
            },
        }
    }

    pub async fn embed_batch(
        &self,
        texts: Vec<String>,
    ) -> anyhow::Result<Vec<Vec<f32>>> {
        // Naive fallback: embed sequentially for correctness.
        // Production use should batch through the provider's API
        // (the OpenAI endpoint accepts an array of inputs).
        let mut vectors = Vec::with_capacity(texts.len());
        for text in texts {
            vectors.push(self.embed(&text).await?);
        }
        Ok(vectors)
    }
}
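
For the OpenAI path, batching can be done in a single request because the embeddings endpoint accepts an array of inputs. A sketch under that assumption (OpenAIResponse mirrors the response shape used in embed above):

impl EmbeddingsClient {
    /// Batch variant against the OpenAI API: one request, many inputs.
    pub async fn embed_batch_openai(
        &self,
        api_key: &str,
        texts: &[String],
    ) -> anyhow::Result<Vec<Vec<f32>>> {
        let response = reqwest::Client::new()
            .post("https://api.openai.com/v1/embeddings")
            .bearer_auth(api_key)
            .json(&serde_json::json!({
                "model": "text-embedding-3-small",
                "input": texts,   // array input → one embedding per element, in order
            }))
            .send()
            .await?;

        let result: OpenAIResponse = response.json().await?;
        Ok(result.data.into_iter().map(|d| d.embedding).collect())
    }
}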

Retrieval

pub struct HybridRetriever {
    vector_store: SurrealDB,
    bm25_index: BM25Index,
    embeddings: EmbeddingsClient,
}

impl HybridRetriever {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
    ) -> anyhow::Result<Vec<ChunkWithScore>> {
        // 1. Semantic search (vector similarity)
        let query_vector = self.embeddings.embed(&query).await?;
        let semantic_results = self.vector_store.search_hnsw(
            query_vector,
            top_k * 2,  // Get more for re-ranking
            0.5,
        ).await?;

        // 2. BM25 keyword search
        let bm25_results = self.bm25_index.search(&query, top_k * 2)?;

        // 3. Merge and re-rank
        let mut merged = HashMap::new();

        for (i, result) in semantic_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);  // Rank-based score
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.7)  // 70% weight
                .or_insert(score * 0.7);
        }

        for (i, result) in bm25_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);
            merged.entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.3)  // 30% weight
                .or_insert(score * 0.3);
        }

        // 4. Sort and return top-k
        let mut final_results: Vec<_> = merged.into_iter().collect();
        final_results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

        Ok(final_results.into_iter()
            .take(top_k as usize)
            .map(|(id, score)| {
                // Fetch full chunk with this score
                ChunkWithScore { id, score }
            })
            .collect())
    }
}
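
This merge is a rank-based fusion (in the spirit of reciprocal rank fusion): for example, a chunk ranked 1st by the vector search and 3rd by BM25 scores 0.7·(1/1) + 0.3·(1/3) ≈ 0.80, while one ranked 3rd semantically and 1st by BM25 scores 0.7·(1/3) + 0.3·(1/1) ≈ 0.53, so the semantic ranking carries more weight.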

📚 Indexing Workflow

Automatic Indexing

File added to docs/
  ↓
Git hook or workflow trigger
  ↓
doc-lifecycle-manager processes
  ├─ Classifies document
  └─ Publishes "document_added" event
  ↓
RAG system subscribes
  ├─ Chunks document
  ├─ Generates embeddings
  ├─ Stores in SurrealDB
  └─ Updates HNSW index
  ↓
Agent Search Tool ready

Batch Reindexing

# Periodic full reindex (daily or on demand)
vapora rag reindex --all

# Incremental reindex (only changed docs)
vapora rag reindex --since 1d

# Rebuild HNSW index from scratch
vapora rag rebuild-index --optimize

🎯 Implementation Checklist

  • Port RAG system from provisioning (2,140 lines)
  • Integrate with SurrealDB vector store
  • HNSW index setup + optimization
  • Chunking strategies (Markdown, KCL, Nushell)
  • Embeddings client (OpenAI + local fallback)
  • Hybrid retrieval (semantic + BM25)
  • Search tool for agents
  • doc-lifecycle-manager hooks
  • Indexing workflows
  • Batch reindexing
  • CLI: vapora rag search, vapora rag reindex
  • Tests + benchmarks

📊 Success Metrics

  • Search latency < 100ms (p99)
  • Relevance score > 0.8 for top results
  • 1000+ documents indexed
  • HNSW index memory-efficient
  • Agents find relevant context automatically
  • No hallucinations from out-of-context queries


Version: 0.1.0
Status: Integration Specification Complete
Purpose: RAG system for semantic document search in VAPORA