# Retrieval-Augmented Generation (RAG) System

**Status**: ✅ Production-Ready (SurrealDB 1.5.0+, 22/22 tests passing)

The RAG system enables the AI service to access, retrieve, and reason over infrastructure documentation, schemas, and past configurations. This allows the AI to generate contextually accurate infrastructure configurations and provide intelligent troubleshooting advice grounded in actual platform knowledge.

## Architecture Overview

The RAG system consists of five components (sketched together after this list):

1. **Document Store**: SurrealDB vector store with semantic indexing
2. **Hybrid Search**: Vector similarity + BM25 keyword search
3. **Chunk Management**: Intelligent document chunking for code and markdown
4. **Context Ranking**: Relevance scoring for retrieved documents
5. **Semantic Cache**: Deduplication of repeated queries
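
How these pieces compose is roughly the following Rust sketch. All type names here are illustrative assumptions for orientation, not the ai-service's actual API.

```rust
// Hypothetical composition of the five components above; the real service's
// types and field names may differ.
struct DocumentStore;   // SurrealDB-backed vector store
struct HybridSearcher;  // vector similarity + BM25 keyword search
struct Chunker;         // code/markdown-aware document splitting
struct ContextRanker;   // relevance scoring of retrieved chunks
struct SemanticCache;   // deduplication of repeated queries

struct RagSystem {
    store: DocumentStore,
    searcher: HybridSearcher,
    chunker: Chunker,
    ranker: ContextRanker,
    cache: SemanticCache,
}
```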

## Core Components

### 1. Vector Embeddings

The system uses embedding models to convert documents into vector representations:

```text
┌─────────────────────┐
│  Document Source    │
│  (Markdown, Code)   │
└──────────┬──────────┘
           │
           ▼
┌──────────────────────────────────┐
│  Chunking & Tokenization         │
│  - Code-aware splits             │
│  - Markdown-aware                │
│  - Preserves context             │
└──────────┬───────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│  Embedding Model                 │
│  (OpenAI Ada, Anthropic, Local)  │
└──────────┬───────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│  Vector Storage (SurrealDB)      │
│  - Vector index                  │
│  - Metadata indexed              │
│  - BM25 index for keywords       │
└──────────────────────────────────┘
```
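
In code, the flow from document to stored vector could look like the following minimal sketch. `chunk`, `embed`, and `store` are placeholders standing in for the real code/markdown-aware chunker, the configured embedding model, and the SurrealDB writes.

```rust
// Minimal sketch of the ingestion flow above; all three stages are stubs.
struct Chunk {
    text: String,
}

// Placeholder: real chunking is code/markdown-aware (see Document Chunking).
fn chunk(doc: &str, size: usize) -> Vec<Chunk> {
    doc.as_bytes()
        .chunks(size)
        .map(|b| Chunk { text: String::from_utf8_lossy(b).into_owned() })
        .collect()
}

// Placeholder: a real implementation calls the configured embedding model.
fn embed(_chunk: &Chunk) -> Vec<f32> {
    vec![0.0; 1536]
}

// Placeholder: a real implementation writes to the SurrealDB vector index.
fn store(_chunk: &Chunk, _vector: &[f32]) {}

fn main() {
    let doc = "# Example\nSome documentation text...";
    for c in chunk(doc, 1024) {
        let v = embed(&c);
        store(&c, &v);
    }
}
```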

### 2. SurrealDB Integration

SurrealDB serves as the vector database and knowledge store:

```nickel
# Configuration in provisioning/schemas/ai.ncl
{
  rag = {
    enabled = true,
    db_url = "surreal://localhost:8000",
    namespace = "provisioning",
    database = "ai_rag",

    # Collections for different document types
    collections = {
      documentation = {
        chunking_strategy = "markdown",
        chunk_size = 1024,
        overlap = 256,
      },
      schemas = {
        chunking_strategy = "code",
        chunk_size = 512,
        overlap = 128,
      },
      deployments = {
        chunking_strategy = "json",
        chunk_size = 2048,
        overlap = 512,
      },
    },

    # Embedding configuration
    embedding = {
      provider = "openai", # or "anthropic", "local"
      model = "text-embedding-3-small",
      cache_vectors = true,
    },

    # Search configuration
    search = {
      hybrid_enabled = true,
      vector_weight = 0.7,
      keyword_weight = 0.3,
      top_k = 5, # number of results to return
      semantic_cache = true,
    },
  },
}
```

### 3. Document Chunking

Intelligent chunking preserves context while managing token limits:

#### Markdown Chunking Strategy

```text
Input Document: provisioning/docs/src/guides/from-scratch.md

Chunks:
[1] Header + first section (up to 1024 tokens)
[2] Next logical section + overlap with [1]
[3] Code examples preserved as atomic units
[4] Continue with overlap...

Each chunk includes:
- Original section heading (for context)
- Content
- Source file and line numbers
- Metadata (doctype, category, version)
```
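
As a concrete reference point, the overlap mechanic can be sketched as a plain character-window splitter. This is a deliberately naive stand-in, assuming nothing beyond the size/overlap parameters from the configuration; the production strategy additionally respects headings, counts tokens rather than characters, and keeps code blocks atomic.

```rust
// Naive fixed-window chunker with overlap; sizes are in characters here.
fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size);
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        // Step back by `overlap` so adjacent chunks share context.
        start = end - overlap;
    }
    chunks
}

fn main() {
    let chunks = chunk_with_overlap("abcdefghij", 4, 2);
    assert_eq!(chunks, ["abcd", "cdef", "efgh", "ghij"]);
}
```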

#### Code Chunking Strategy

```text
Input Document: provisioning/schemas/main.ncl

Chunks:
[1] Top-level let binding + comments
[2] Function definition (atomic, preserves signature)
[3] Type definition (atomic, preserves interface)
[4] Implementation blocks with context overlap

Each chunk preserves:
- Type signatures
- Function signatures
- Import statements needed for context
- Comments and docstrings
```
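
A toy version of keeping definitions atomic is to split on blank-line-separated top-level blocks, so a definition travels with its leading comments. This is only an illustrative heuristic, not the actual implementation, which would also carry forward the imports each chunk needs.

```rust
// Toy code-aware splitter: each blank-line-separated top-level block
// (e.g. a definition plus its leading comments) becomes one atomic chunk.
fn chunk_code(source: &str) -> Vec<String> {
    source
        .split("\n\n")
        .map(str::trim)
        .filter(|block| !block.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    let src = "# doc comment\nlet a = 1 in\na\n\nlet b = 2 in\nb";
    assert_eq!(chunk_code(src).len(), 2);
}
```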

## Hybrid Search

The system implements a dual search strategy for optimal results:

### Vector Similarity Search

```rust
// Find semantically similar documents
async fn vector_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
    let embedding = embed(query).await?;

    // Cosine similarity in SurrealDB (brute-force scan over the table)
    let mut response = db.query("
        SELECT *, vector::similarity::cosine(embedding, $embedding) AS score
        FROM documents
        ORDER BY score DESC
        LIMIT $top_k
    ")
    .bind(("embedding", embedding))
    .bind(("top_k", top_k))
    .await?;

    Ok(response.take(0)?)
}
```

**Use case**: Semantic understanding of intent

- Query: "How to configure PostgreSQL"
- Finds: Documents about database configuration, examples, schemas

### BM25 Keyword Search

```rust
// Find documents with matching keywords
async fn keyword_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
    // BM25 full-text search in SurrealDB: the @1@ matches operator pairs
    // with search::score(1) to expose the relevance score
    let mut response = db.query("
        SELECT *, search::score(1) AS score
        FROM documents
        WHERE text @1@ $query
        ORDER BY score DESC
        LIMIT $top_k
    ")
    .bind(("query", query))
    .bind(("top_k", top_k))
    .await?;

    Ok(response.take(0)?)
}
```

**Use case**: Exact term matching

- Query: "SurrealDB configuration"
- Finds: Documents mentioning SurrealDB specifically

### Hybrid Results

```rust
async fn hybrid_search(
    query: &str,
    vector_weight: f32,
    keyword_weight: f32,
    top_k: usize,
) -> Result<Vec<Document>> {
    let vector_results = vector_search(query, top_k * 2).await?;
    let keyword_results = keyword_search(query, top_k * 2).await?;

    let mut scored = HashMap::new();

    // Score from vector search (earlier rank => higher score)
    for (i, doc) in vector_results.iter().enumerate() {
        *scored.entry(doc.id.clone()).or_insert(0.0) +=
            vector_weight * (1.0 - (i as f32 / top_k as f32));
    }

    // Score from keyword search
    for (i, doc) in keyword_results.iter().enumerate() {
        *scored.entry(doc.id.clone()).or_insert(0.0) +=
            keyword_weight * (1.0 - (i as f32 / top_k as f32));
    }

    // Return top-k by combined score
    let mut results: Vec<_> = scored.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    Ok(results.into_iter().take(top_k).map(|(id, _)| ...).collect())
}
```
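
To see the rank-based fusion in isolation, here is a self-contained toy run over plain string IDs with the configured 0.7/0.3 weights. The IDs and lists are made up for illustration; the point is that a document appearing in both result sets accumulates score from each.

```rust
use std::collections::HashMap;

// Toy rank-based fusion over string IDs, mirroring hybrid_search above.
fn fuse(vector_ranked: &[&str], keyword_ranked: &[&str], top_k: usize) -> Vec<String> {
    let (vector_weight, keyword_weight) = (0.7_f32, 0.3_f32);
    let mut scored: HashMap<String, f32> = HashMap::new();
    for (i, id) in vector_ranked.iter().enumerate() {
        *scored.entry(id.to_string()).or_insert(0.0) +=
            vector_weight * (1.0 - i as f32 / top_k as f32);
    }
    for (i, id) in keyword_ranked.iter().enumerate() {
        *scored.entry(id.to_string()).or_insert(0.0) +=
            keyword_weight * (1.0 - i as f32 / top_k as f32);
    }
    let mut results: Vec<_> = scored.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results.into_iter().take(top_k).map(|(id, _)| id).collect()
}

fn main() {
    // "c" ranks low in the vector list but first in the keyword list,
    // so fusion lifts it above "b".
    let fused = fuse(&["a", "b", "c"], &["c", "d"], 3);
    assert_eq!(fused, ["a", "c", "b"]);
}
```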

## Semantic Caching

Reduces API calls by caching embeddings of repeated queries:

```rust
struct SemanticCache {
    // f32 is neither Eq nor Hash, so cached embeddings are kept in a
    // scanned list rather than keyed in a map.
    queries: Arc<RwLock<Vec<(Vec<f32>, CachedResult)>>>,
    similarity_threshold: f32,
}

impl SemanticCache {
    async fn get(&self, query: &str) -> Option<CachedResult> {
        let embedding = embed(query).await.ok()?;

        // Find a cached query with a similar embedding
        // (cosine distance < threshold)
        for (cached_embedding, result) in self.queries.read().await.iter() {
            let distance = cosine_distance(&embedding, cached_embedding);
            if distance < self.similarity_threshold {
                return Some(result.clone());
            }
        }
        None
    }

    async fn insert(&self, query: &str, result: CachedResult) -> Result<()> {
        let embedding = embed(query).await?;
        self.queries.write().await.push((embedding, result));
        Ok(())
    }
}
```
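
The `cosine_distance` helper used above is not shown in the service code; a straightforward version, assuming non-zero vectors of equal length, would be:

```rust
// Cosine distance = 1 - cosine similarity; 0.0 means identical direction.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    assert!(cosine_distance(&[1.0, 0.0], &[1.0, 0.0]) < 1e-6);
    assert!((cosine_distance(&[1.0, 0.0], &[0.0, 1.0]) - 1.0).abs() < 1e-6);
}
```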

**Benefits**:

- 50-80% reduction in embedding API calls
- Identical queries return in <10ms
- Similar queries reuse cached context

## Ingestion Workflow

### Document Indexing

```bash
# Index all documentation
provisioning ai index-docs provisioning/docs/src

# Index schemas
provisioning ai index-schemas provisioning/schemas

# Index past deployments
provisioning ai index-deployments workspaces/*/deployments

# Watch directory for changes (development mode)
provisioning ai watch docs provisioning/docs/src
```

### Programmatic Indexing

```rust
// In ai-service on startup
async fn initialize_rag() -> Result<()> {
    let rag = RAGSystem::new(&config.rag).await?;

    // Index documentation
    let docs = load_markdown_docs("provisioning/docs/src")?;
    for doc in docs {
        rag.ingest_document(&doc).await?;
    }

    // Index schemas
    let schemas = load_nickel_schemas("provisioning/schemas")?;
    for schema in schemas {
        rag.ingest_schema(&schema).await?;
    }

    Ok(())
}
```

## Usage Examples

### Query the RAG System

```bash
# Search for context-aware information
provisioning ai query "How do I configure PostgreSQL with encryption?"

# Get configuration template
provisioning ai template "Describe production Kubernetes on AWS"

# Interactive mode
provisioning ai chat
> What are the best practices for database backup?
```

### AI Service Integration

```rust
// AI service uses RAG to enhance generation
async fn generate_config(user_request: &str) -> Result<String> {
    // Retrieve relevant context (top 5 results)
    let context = rag.search(user_request, 5).await?;

    // Build prompt with context
    let prompt = build_prompt_with_context(user_request, &context);

    // Generate configuration
    let config = llm.generate(&prompt).await?;

    // Validate against schemas
    validate_nickel_config(&config)?;

    Ok(config)
}
```
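
One plausible shape for `build_prompt_with_context` is below. The `Document` fields and the prompt layout are assumptions for illustration, not the service's actual template.

```rust
// Hypothetical Document shape; the real type lives in the ai-service.
struct Document {
    source: String,
    text: String,
}

// Formats retrieved chunks as numbered, source-attributed context blocks
// ahead of the user's request.
fn build_prompt_with_context(user_request: &str, context: &[Document]) -> String {
    let mut prompt = String::from(
        "Use the following platform documentation as context:\n\n",
    );
    for (i, doc) in context.iter().enumerate() {
        prompt.push_str(&format!("[{}] ({})\n{}\n\n", i + 1, doc.source, doc.text));
    }
    prompt.push_str(&format!("Request: {user_request}\n"));
    prompt
}
```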

### Form Assistance Integration

```javascript
// In typdialog-ai (JavaScript/TypeScript)
async function suggestFieldValue(fieldName, currentInput) {
  // Query RAG for similar configurations
  const context = await rag.search(
    `Field: ${fieldName}, Input: ${currentInput}`,
    { topK: 3, semantic: true }
  );

  // Generate suggestion using context
  const suggestion = await ai.suggest({
    field: fieldName,
    input: currentInput,
    context: context,
  });

  return suggestion;
}
```

## Performance Characteristics

| Operation            | Time       | Cache Hit     |
| -------------------- | ---------- | ------------- |
| Vector embedding     | 200-500ms  | N/A           |
| Vector search (cold) | 300-800ms  | N/A           |
| Keyword search       | 50-200ms   | N/A           |
| Hybrid search        | 500-1200ms | <100ms cached |
| Semantic cache hit   | 10-50ms    | Always        |

**Typical query flow**:

1. Embedding: 300ms
2. Vector search: 400ms
3. Keyword search: 100ms
4. Ranking: 50ms
5. **Total**: ~850ms (first call), <100ms (cached)

## Configuration

See [Configuration Guide](configuration.md) for detailed RAG setup:

- LLM provider for embeddings
- SurrealDB connection
- Chunking strategies
- Search weights and limits
- Cache settings and TTLs

## Limitations and Considerations

### Document Freshness

- RAG indexes static snapshots
- Changes to documentation require re-indexing
- Use watch mode during development

### Token Limits

- Large documents are chunked to fit the LLM context window
- Some context may be lost in chunking
- Chunk size is an adjustable trade-off against preserved context

### Embedding Quality

- Quality depends on the embedding model
- Domain-specific models perform better
- Fine-tuning is possible for specialized vocabularies

## Monitoring and Debugging

### Query Metrics

```bash
# View RAG search metrics
provisioning ai metrics show rag

# Analyze search quality
provisioning ai eval-rag --sample-queries 100
```

### Debug Mode

```toml
# In provisioning/config/ai.toml
[ai.rag.debug]
enabled = true
log_embeddings = true     # Log embedding vectors
log_search_scores = true  # Log relevance scores
log_context_used = true   # Log retrieved context
```

## Related Documentation

- [Architecture](architecture.md) - AI system overview
- [MCP Integration](mcp-integration.md) - RAG access via MCP
- [Configuration](configuration.md) - RAG setup guide
- [API Reference](api-reference.md) - RAG API endpoints
- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions

---

**Last Updated**: 2025-01-13
**Status**: ✅ Production-Ready
**Test Coverage**: 22/22 tests passing
**Database**: SurrealDB 1.5.0+