
# 🔍 RAG Integration
## Retrieval-Augmented Generation for VAPORA Context
**Version**: 0.1.0
**Status**: Specification (VAPORA v1.0 Integration)
**Purpose**: RAG system from provisioning integrated into VAPORA for semantic search
---
## 🎯 Objective
**RAG (Retrieval-Augmented Generation)** provides context to agents:
- ✅ Agents search for semantically similar documentation
- ✅ ADRs, designs, and guides serve as context for new tasks
- ✅ Query the LLM with relevant documentation
- ✅ Reduce hallucinations, improve decisions
- ✅ Complete RAG system from provisioning (2,140 lines of Rust)
---
## 🏗️ RAG Architecture
### Components (From Provisioning)
```
RAG System (2,140 lines, production-ready from provisioning)
├─ Chunking Engine
│  ├─ Markdown chunks (with metadata)
│  ├─ KCL chunks (for infrastructure docs)
│  ├─ Nushell chunks (for scripts)
│  └─ Smart splitting (at headers, code blocks)
├─ Embeddings
│  ├─ Primary: OpenAI API (text-embedding-3-small)
│  ├─ Fallback: Local ONNX (nomic-embed-text)
│  ├─ Dimension: 1536-dim vectors
│  └─ Batch processing
├─ Vector Store
│  ├─ SurrealDB with HNSW index
│  ├─ Fast similarity search
│  ├─ Scalar-product distance metric
│  └─ Replication for redundancy
├─ Retrieval
│  ├─ Top-K BM25 + semantic hybrid
│  ├─ Threshold filtering (relevance > 0.7)
│  ├─ Context enrichment
│  └─ Ranking/re-ranking
└─ Integration
   ├─ Claude API with full context
   ├─ Agent Search tool
   ├─ Workflow context injection
   └─ Decision-making support
```
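
The components above share a handful of tuning knobs. A minimal configuration sketch follows; the struct and field names are hypothetical, and the defaults mirror the figures quoted in this document:

```rust
/// Hypothetical RAG configuration; field names are illustrative,
/// defaults follow the figures quoted in this document.
pub struct RagConfig {
    /// Chunking: rough token budget per chunk
    pub max_chunk_tokens: usize,
    /// Embeddings: primary model and vector width
    pub embedding_model: String,
    pub embedding_dims: usize,
    /// Vector store: HNSW build parameters
    pub hnsw_m: usize,
    pub hnsw_ef_construction: usize,
    /// Retrieval: result count and relevance cut-off
    pub top_k: u32,
    pub relevance_threshold: f64,
}

impl Default for RagConfig {
    fn default() -> Self {
        Self {
            max_chunk_tokens: 500,
            embedding_model: "text-embedding-3-small".into(),
            embedding_dims: 1536,
            hnsw_m: 16,
            hnsw_ef_construction: 200,
            top_k: 5,
            relevance_threshold: 0.7,
        }
    }
}
```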
### Data Flow
```
Document added to docs/
        ↓
doc-lifecycle-manager classifies
        ↓
RAG Chunking Engine
├─ Split into semantic chunks
└─ Extract metadata (title, type, date)
        ↓
Embeddings Generator
├─ Generate 1536-dim vector per chunk
└─ Batch process for efficiency
        ↓
Vector Store (SurrealDB HNSW)
├─ Store chunk + vector + metadata
└─ Create HNSW index
        ↓
Search ready
├─ Agent can query
├─ Semantic similarity search
└─ Fast, < 100ms latency
```
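
As glue, the same flow sketched in Rust against the component APIs defined in the sections below; `index_document` and `insert_chunk` are illustrative names rather than existing methods:

```rust
/// Hypothetical glue for the flow above, written against the
/// ChunkingEngine, EmbeddingsClient, and vector-store APIs defined
/// later in this document.
pub async fn index_document(
    chunker: &ChunkingEngine,
    embeddings: &EmbeddingsClient,
    store: &SurrealDB,
    document: Document,
) -> anyhow::Result<()> {
    // 1. Split into semantic chunks with metadata
    let chunks = chunker.chunk_document(document).await?;
    // 2. Batch-embed: one 1536-dim vector per chunk
    let texts: Vec<String> = chunks.iter().map(|c| c.text.clone()).collect();
    let vectors = embeddings.embed_batch(texts).await?;
    // 3. Store chunk + vector + metadata; the HNSW index updates on insert
    for (chunk, vector) in chunks.into_iter().zip(vectors) {
        store.insert_chunk(chunk, vector).await?; // insert_chunk: assumed store method
    }
    Ok(())
}
```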
---
## 🔧 RAG in VAPORA
### Search Tool (Available to All Agents)
```rust
pub struct SearchTool {
    pub vector_store: SurrealDB,
    pub embeddings: EmbeddingsClient,
    pub retriever: HybridRetriever,
}

impl SearchTool {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
        threshold: f64,
    ) -> anyhow::Result<SearchResults> {
        let started = std::time::Instant::now();
        // 1. Embed query
        let query_vector = self.embeddings.embed(&query).await?;
        // 2. Search vector store
        let chunk_results = self
            .vector_store
            .search_hnsw(query_vector, top_k, threshold)
            .await?;
        // 3. Enrich with context
        let results = self.enrich_results(chunk_results).await?;
        Ok(SearchResults {
            query,
            results,
            // Illustrative: a real store would report how many chunks it scanned
            total_chunks_searched: self.vector_store.chunk_count().await?,
            search_duration_ms: started.elapsed().as_millis() as u32,
        })
    }

    pub async fn search_with_filters(
        &self,
        query: String,
        filters: SearchFilters,
    ) -> anyhow::Result<SearchResults> {
        // Filter by document type, date, tags before searching
        let filtered_documents = self.filter_documents(&filters).await?;
        // ... rest of search, restricted to filtered_documents
    }
}

pub struct SearchFilters {
    pub doc_type: Option<Vec<String>>,   // ["adr", "guide"]
    pub date_range: Option<(Date, Date)>,
    pub tags: Option<Vec<String>>,       // ["orchestrator", "performance"]
    pub lifecycle_state: Option<String>, // "published", "archived"
}

pub struct SearchResults {
    pub query: String,
    pub results: Vec<SearchResult>,
    pub total_chunks_searched: u32,
    pub search_duration_ms: u32,
}

pub struct SearchResult {
    pub document_id: String,
    pub document_title: String,
    pub chunk_text: String,
    pub relevance_score: f64,            // 0.0-1.0
    pub metadata: HashMap<String, String>,
    pub source_url: String,
    pub snippet_context: String,         // Surrounding text
}
```
### Agent Usage Example
```rust
// Agent decides to search for context
impl DeveloperAgent {
    pub async fn implement_feature(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Search for similar features implemented before
        let similar_features = self
            .search_tool
            .search(
                format!("implement {} feature like {}", task.domain, task.type_),
                5,    // top_k
                0.75, // threshold
            )
            .await?;
        // 2. Extract context from results
        let context_docs = similar_features
            .results
            .iter()
            .map(|r| r.chunk_text.clone())
            .collect::<Vec<_>>();
        // 3. Build LLM prompt with context
        let prompt = format!(
            "Implement the following feature:\n{}\n\nSimilar features implemented:\n{}",
            task.description,
            context_docs.join("\n---\n")
        );
        // 4. Generate code with context
        let code = self.llm_router.complete(prompt).await?;
        // ... apply `code` to the workspace
        Ok(())
    }
}
```
### Documenter Agent Integration
```rust
impl DocumenterAgent {
    pub async fn update_documentation(
        &mut self,
        task: Task,
    ) -> anyhow::Result<()> {
        // 1. Get decisions from task
        let decisions = task.extract_decisions().await?;
        for decision in decisions {
            // 2. Search existing ADRs to avoid duplicates
            let similar_adrs = self
                .search_tool
                .search(
                    decision.context.clone(),
                    3,   // top_k
                    0.8, // threshold
                )
                .await?;
            // 3. Only document decisions with no close match
            if similar_adrs.results.is_empty() {
                // Create new ADR
                let adr_content = format!(
                    "# {}\n\n## Context\n{}\n\n## Decision\n{}",
                    decision.title, decision.context, decision.chosen_option,
                );
                // 4. Save and index for RAG
                self.db.save_adr(&adr_content).await?;
                self.rag_system.index_document(&adr_content).await?;
            }
        }
        Ok(())
    }
}
```
---
## 📊 RAG Implementation (From Provisioning)
### Schema (SurrealDB)
```sql
-- RAG chunks table (SurrealQL; the record `id` is implicit in SurrealDB)
DEFINE TABLE rag_chunks SCHEMAFULL
    PERMISSIONS
        FOR select, create FULL
        FOR update, delete NONE;

-- Identifiers
DEFINE FIELD document_id ON rag_chunks TYPE string;
DEFINE FIELD chunk_index ON rag_chunks TYPE int;

-- Content
DEFINE FIELD text     ON rag_chunks TYPE string;
DEFINE FIELD title    ON rag_chunks TYPE string;
DEFINE FIELD doc_type ON rag_chunks TYPE string;

-- Vector (1536-dim embedding)
DEFINE FIELD embedding ON rag_chunks TYPE array<float>;

-- Metadata
DEFINE FIELD created_date    ON rag_chunks TYPE datetime;
DEFINE FIELD last_updated    ON rag_chunks TYPE datetime;
DEFINE FIELD source_path     ON rag_chunks TYPE string;
DEFINE FIELD tags            ON rag_chunks TYPE array<string>;
DEFINE FIELD lifecycle_state ON rag_chunks TYPE string;

-- HNSW index (the spec calls for scalar-product distance;
-- COSINE is shown here as the nearest supported metric)
DEFINE INDEX rag_chunks_embedding ON rag_chunks
    FIELDS embedding HNSW DIMENSION 1536 DIST COSINE EFC 200 M 16;
```
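
With the schema in place, a nearest-neighbour query from Rust might look like the sketch below. It uses the `surrealdb` crate's `query`/`bind` API and SurrealQL's KNN operator (`<|K,EF|>`); the endpoint, namespace, and database names are placeholders, and `vector::distance::knn()` returns the distance computed by that operator:

```rust
use surrealdb::engine::remote::ws::Ws;
use surrealdb::Surreal;

/// Sketch: fetch the 10 nearest chunks for a query embedding.
/// Endpoint, namespace, and database names are placeholder values.
async fn knn_search(query_embedding: Vec<f32>) -> anyhow::Result<()> {
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.use_ns("vapora").use_db("rag").await?;

    // <|10,40|> = HNSW KNN: K = 10 neighbours, ef = 40 search width
    let mut response = db
        .query(
            "SELECT document_id, title, text, \
             vector::distance::knn() AS score \
             FROM rag_chunks WHERE embedding <|10,40|> $q",
        )
        .bind(("q", query_embedding))
        .await?;
    let hits: Vec<serde_json::Value> = response.take(0)?;
    println!("{hits:#?}");
    Ok(())
}
```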
### Chunking Strategy
```rust
pub struct ChunkingEngine;

impl ChunkingEngine {
    pub async fn chunk_document(
        &self,
        document: Document,
    ) -> anyhow::Result<Vec<Chunk>> {
        let chunks = match document.file_type {
            FileType::Markdown => self.chunk_markdown(&document.content)?,
            FileType::KCL => self.chunk_kcl(&document.content)?,
            FileType::Nushell => self.chunk_nushell(&document.content)?,
            _ => self.chunk_text(&document.content)?,
        };
        Ok(chunks)
    }

    fn chunk_markdown(&self, content: &str) -> anyhow::Result<Vec<Chunk>> {
        let mut chunks = Vec::new();
        let mut section = String::new();
        // Split at headers: each `#`-prefixed line starts a new section
        for line in content.lines() {
            if line.starts_with('#') && !section.is_empty() {
                Self::push_section(&mut chunks, std::mem::take(&mut section));
            }
            section.push_str(line);
            section.push('\n');
        }
        if !section.is_empty() {
            Self::push_section(&mut chunks, section);
        }
        Ok(chunks)
    }

    fn push_section(chunks: &mut Vec<Chunk>, section: String) {
        // Enforce a rough size budget (~500-token target, approximated by length)
        if section.len() > 500 {
            // Split further, respecting char boundaries
            let chars: Vec<char> = section.chars().collect();
            for piece in chars.chunks(400) {
                chunks.push(Chunk {
                    text: piece.iter().collect(),
                    metadata: Default::default(),
                });
            }
        } else {
            chunks.push(Chunk {
                text: section,
                metadata: Default::default(),
            });
        }
    }
}
```
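
For intuition, a small hypothetical test of the header-based splitter, assuming a `Document` can be built from just `file_type` and `content`:

```rust
#[tokio::test]
async fn splits_at_headers() -> anyhow::Result<()> {
    let engine = ChunkingEngine;
    let doc = Document {
        file_type: FileType::Markdown,
        content: "# Intro\nShort overview.\n# Design\nDetails here.".into(),
    };
    // Two headers → two chunks, each carrying its section text
    let chunks = engine.chunk_document(doc).await?;
    assert_eq!(chunks.len(), 2);
    assert!(chunks[0].text.starts_with("# Intro"));
    Ok(())
}
```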
### Embeddings
```rust
pub enum EmbeddingsProvider {
    /// Primary: OpenAI "text-embedding-3-small" (1536 dims, fast)
    OpenAI { api_key: String, model: String },
    /// Fallback: local ONNX model, e.g. "nomic-embed-text"
    Local { model_path: String, model: String },
}

pub struct EmbeddingsClient {
    provider: EmbeddingsProvider,
}

#[derive(serde::Deserialize)]
struct OpenAIResponse {
    data: Vec<OpenAIEmbedding>,
}

#[derive(serde::Deserialize)]
struct OpenAIEmbedding {
    embedding: Vec<f32>,
}

impl EmbeddingsClient {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match &self.provider {
            EmbeddingsProvider::OpenAI { api_key, model } => {
                // Call the OpenAI embeddings API
                let response = reqwest::Client::new()
                    .post("https://api.openai.com/v1/embeddings")
                    .bearer_auth(api_key)
                    .json(&serde_json::json!({
                        "model": model,
                        "input": text,
                    }))
                    .send()
                    .await?;
                let result: OpenAIResponse = response.json().await?;
                Ok(result.data[0].embedding.clone())
            }
            EmbeddingsProvider::Local { model_path, .. } => {
                // Local ONNX model (nomic-embed-text); real inference needs
                // tokenization first, elided here at spec level
                let session = ort::Session::builder()?.commit_from_file(model_path)?;
                let token_ids = tokenize(text)?; // tokenizer elided in this spec
                let outputs = session.run(ort::inputs![token_ids]?)?;
                let embedding: Vec<f32> = outputs[0]
                    .try_extract_tensor::<f32>()?
                    .view()
                    .iter()
                    .copied()
                    .collect();
                Ok(embedding)
            }
        }
    }

    pub async fn embed_batch(
        &self,
        texts: Vec<String>,
    ) -> anyhow::Result<Vec<Vec<f32>>> {
        // Batch for efficiency (the OpenAI API accepts arrays of inputs);
        // a simple sequential fallback is shown here
        let mut vectors = Vec::with_capacity(texts.len());
        for text in &texts {
            vectors.push(self.embed(text).await?);
        }
        Ok(vectors)
    }
}
```
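
The architecture lists OpenAI as primary and local ONNX as fallback; one way to wire that failover, sketched with an illustrative wrapper type:

```rust
/// Illustrative failover wrapper: try the OpenAI-backed client first,
/// fall back to the local ONNX client if the API call fails.
pub struct FallbackEmbeddings {
    primary: EmbeddingsClient,  // OpenAI (text-embedding-3-small)
    fallback: EmbeddingsClient, // local ONNX (nomic-embed-text)
}

impl FallbackEmbeddings {
    pub async fn embed(&self, text: &str) -> anyhow::Result<Vec<f32>> {
        match self.primary.embed(text).await {
            Ok(vector) => Ok(vector),
            Err(err) => {
                // Primary unavailable (network, quota, ...): use the local model
                tracing::warn!("primary embeddings failed: {err}; using local fallback");
                self.fallback.embed(text).await
            }
        }
    }
}
```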
### Retrieval
```rust
pub struct HybridRetriever {
    vector_store: SurrealDB,
    embeddings: EmbeddingsClient,
    bm25_index: BM25Index,
}

impl HybridRetriever {
    pub async fn search(
        &self,
        query: String,
        top_k: u32,
    ) -> anyhow::Result<Vec<ChunkWithScore>> {
        // 1. Semantic search (vector similarity)
        let query_vector = self.embeddings.embed(&query).await?;
        let semantic_results = self
            .vector_store
            .search_hnsw(query_vector, top_k * 2, 0.5) // over-fetch for re-ranking
            .await?;
        // 2. BM25 keyword search
        let bm25_results = self.bm25_index.search(&query, top_k * 2)?;
        // 3. Merge with rank-based (reciprocal-rank) scores
        let mut merged = HashMap::new();
        for (i, result) in semantic_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0); // rank-based score
            merged
                .entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.7) // 70% semantic weight
                .or_insert(score * 0.7);
        }
        for (i, result) in bm25_results.iter().enumerate() {
            let score = 1.0 / (i as f64 + 1.0);
            merged
                .entry(result.id.clone())
                .and_modify(|s: &mut f64| *s += score * 0.3) // 30% keyword weight
                .or_insert(score * 0.3);
        }
        // 4. Sort by merged score and return top-k
        let mut final_results: Vec<_> = merged.into_iter().collect();
        final_results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        Ok(final_results
            .into_iter()
            .take(top_k as usize)
            .map(|(id, score)| ChunkWithScore { id, score }) // full chunk fetched later
            .collect())
    }
}
```
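
To make the rank fusion concrete: a chunk ranked 1st semantically and 3rd by BM25 scores 0.7·(1/1) + 0.3·(1/3) ≈ 0.80, while a chunk ranked 2nd in both lists scores 0.7·(1/2) + 0.3·(1/2) = 0.50, so the semantic ranking dominates, as the 70/30 weighting intends.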
---
## 📚 Indexing Workflow
### Automatic Indexing
```
File added to docs/
        ↓
Git hook or workflow trigger
        ↓
doc-lifecycle-manager processes
├─ Classifies document
└─ Publishes "document_added" event
        ↓
RAG system subscribes
├─ Chunks document
├─ Generates embeddings
├─ Stores in SurrealDB
└─ Updates HNSW index
        ↓
Agent Search Tool ready
```
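
A sketch of the subscription end of this flow; the event channel, `DocumentEvent`, and `RagSystem` shapes are assumptions, and `index_document` is the spec method used by the Documenter agent above:

```rust
/// Hypothetical subscriber: index a document whenever
/// doc-lifecycle-manager publishes a "document_added" event.
pub async fn run_rag_subscriber(
    mut events: tokio::sync::mpsc::Receiver<DocumentEvent>,
    rag: RagSystem,
) -> anyhow::Result<()> {
    while let Some(event) = events.recv().await {
        if event.kind == "document_added" {
            // Chunk → embed → store; the HNSW index updates on insert
            rag.index_document(&event.content).await?;
        }
    }
    Ok(())
}
```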
### Batch Reindexing
```bash
# Periodic full reindex (daily or on demand)
vapora rag reindex --all
# Incremental reindex (only changed docs)
vapora rag reindex --since 1d
# Rebuild HNSW index from scratch
vapora rag rebuild-index --optimize
```
---
## 🎯 Implementation Checklist
- [ ] Port RAG system from provisioning (2,140 lines)
- [ ] Integrate with SurrealDB vector store
- [ ] HNSW index setup + optimization
- [ ] Chunking strategies (Markdown, KCL, Nushell)
- [ ] Embeddings client (OpenAI + local fallback)
- [ ] Hybrid retrieval (semantic + BM25)
- [ ] Search tool for agents
- [ ] doc-lifecycle-manager hooks
- [ ] Indexing workflows
- [ ] Batch reindexing
- [ ] CLI: `vapora rag search`, `vapora rag reindex`
- [ ] Tests + benchmarks
---
## 📊 Success Metrics
✅ Search latency < 100ms (p99)
✅ Relevance score > 0.8 for top results
✅ 1000+ documents indexed
✅ HNSW index memory efficient
✅ Agents find relevant context automatically
✅ No hallucinations from out-of-context queries
---
**Version**: 0.1.0
**Status**: ✅ Integration Specification Complete
**Purpose**: RAG system for semantic document search in VAPORA