# Retrieval-Augmented Generation (RAG) System
**Status**: ✅ Production-Ready (SurrealDB 1.5.0+, 22/22 tests passing)
The RAG system enables the AI service to access, retrieve, and reason over infrastructure documentation, schemas, and past configurations. This allows
the AI to generate contextually accurate infrastructure configurations and provide intelligent troubleshooting advice grounded in actual platform
knowledge.
## Architecture Overview
The RAG system consists of:
1. **Document Store**: SurrealDB vector store with semantic indexing
2. **Hybrid Search**: Vector similarity + BM25 keyword search
3. **Chunk Management**: Intelligent document chunking for code and markdown
4. **Context Ranking**: Relevance scoring for retrieved documents
5. **Semantic Cache**: Deduplication of repeated queries
## Core Components
### 1. Vector Embeddings
The system uses embedding models to convert documents into vector representations:
```text
┌─────────────────────┐
│ Document Source     │
│ (Markdown, Code)    │
└──────────┬──────────┘
           ▼
┌──────────────────────────────────┐
│ Chunking & Tokenization          │
│ - Code-aware splits              │
│ - Markdown aware                 │
│ - Preserves context              │
└──────────┬───────────────────────┘
           ▼
┌──────────────────────────────────┐
│ Embedding Model                  │
│ (OpenAI Ada, Anthropic, Local)   │
└──────────┬───────────────────────┘
           ▼
┌──────────────────────────────────┐
│ Vector Storage (SurrealDB)       │
│ - Vector index                   │
│ - Metadata indexed               │
│ - BM25 index for keywords        │
└──────────────────────────────────┘
```
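The `embed()` helper referenced throughout the Rust examples below is assumed to sit behind a small provider abstraction; the trait below is an illustrative sketch, not the service's actual interface:
```rust
use anyhow::Result;
use async_trait::async_trait;

/// Hypothetical abstraction over embedding backends (OpenAI, Anthropic, local).
/// The `embed()` calls in the examples below are assumed to go through an
/// implementation of this trait.
#[async_trait]
pub trait EmbeddingProvider {
    /// Convert text into a dense vector representation.
    async fn embed(&self, text: &str) -> Result<Vec<f32>>;
}
```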
### 2. SurrealDB Integration
SurrealDB serves as the vector database and knowledge store:
```nickel
# Configuration in provisioning/schemas/ai.ncl
{
  rag = {
    enabled = true,
    db_url = "surreal://localhost:8000",
    namespace = "provisioning",
    database = "ai_rag",

    # Collections for different document types
    collections = {
      documentation = {
        chunking_strategy = "markdown",
        chunk_size = 1024,
        overlap = 256,
      },
      schemas = {
        chunking_strategy = "code",
        chunk_size = 512,
        overlap = 128,
      },
      deployments = {
        chunking_strategy = "json",
        chunk_size = 2048,
        overlap = 512,
      },
    },

    # Embedding configuration
    embedding = {
      provider = "openai", # or "anthropic", "local"
      model = "text-embedding-3-small",
      cache_vectors = true,
    },

    # Search configuration
    search = {
      hybrid_enabled = true,
      vector_weight = 0.7,
      keyword_weight = 0.3,
      top_k = 5, # Number of results to return
      semantic_cache = true,
    },
  }
}
```
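On the service side, this record can be deserialized into plain Rust structs. The sketch below mirrors the field names above; the struct names and the use of serde are assumptions, not the service's actual types:
```rust
use serde::Deserialize;
use std::collections::HashMap;

// Illustrative Rust mirror of the `rag` record above.
#[derive(Debug, Deserialize)]
pub struct RagConfig {
    pub enabled: bool,
    pub db_url: String,
    pub namespace: String,
    pub database: String,
    pub collections: HashMap<String, CollectionConfig>,
    pub embedding: EmbeddingConfig,
    pub search: SearchConfig,
}

#[derive(Debug, Deserialize)]
pub struct CollectionConfig {
    pub chunking_strategy: String, // "markdown" | "code" | "json"
    pub chunk_size: usize,
    pub overlap: usize,
}

#[derive(Debug, Deserialize)]
pub struct EmbeddingConfig {
    pub provider: String,
    pub model: String,
    pub cache_vectors: bool,
}

#[derive(Debug, Deserialize)]
pub struct SearchConfig {
    pub hybrid_enabled: bool,
    pub vector_weight: f32,
    pub keyword_weight: f32,
    pub top_k: usize,
    pub semantic_cache: bool,
}
```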
### 3. Document Chunking
Intelligent chunking preserves context while managing token limits:
#### Markdown Chunking Strategy
```text
Input Document: provisioning/docs/src/guides/from-scratch.md
Chunks:
[1] Header + first section (up to 1024 tokens)
[2] Next logical section + overlap with [1]
[3] Code examples preserved as atomic units
[4] Continue with overlap...
Each chunk includes:
- Original section heading (for context)
- Content
- Source file and line numbers
- Metadata (doctype, category, version)
```
#### Code Chunking Strategy
```text
Input Document: provisioning/schemas/main.ncl
Chunks:
[1] Top-level let binding + comments
[2] Function definition (atomic, preserves signature)
[3] Type definition (atomic, preserves interface)
[4] Implementation blocks with context overlap
Each chunk preserves:
- Type signatures
- Function signatures
- Import statements needed for context
- Comments and docstrings
```
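As a rough illustration of the markdown strategy, a chunker can walk the document section by section and carry a fixed overlap into the next chunk. The sketch below counts words rather than tokens and uses a hypothetical `Chunk` type; the real chunker is token-aware, tracks line numbers, and keeps code examples atomic:
```rust
/// Illustrative chunk record carrying some of the metadata listed above.
pub struct Chunk {
    pub heading: String, // original section heading, kept for context
    pub content: String,
    pub source: String,  // source file path
}

/// Naive sketch: split a markdown document on `## ` headings, then emit
/// chunks of at most `chunk_size` words, carrying `overlap` words from the
/// previous chunk into the next one.
pub fn chunk_markdown(source: &str, text: &str, chunk_size: usize, overlap: usize) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut carry: Vec<&str> = Vec::new(); // overlap carried from the previous chunk
    let mut heading = String::from("(document start)");

    for (i, section) in text.split("\n## ").enumerate() {
        if i > 0 {
            heading = section.lines().next().unwrap_or("").to_string();
        }
        let words: Vec<&str> = section.split_whitespace().collect();
        let step = chunk_size.saturating_sub(overlap).max(1);
        for window in words.chunks(step) {
            let mut body = carry.clone();
            body.extend_from_slice(window);
            chunks.push(Chunk {
                heading: heading.clone(),
                content: body.join(" "),
                source: source.to_string(),
            });
            // Keep the tail of this window as overlap for the next chunk
            carry = window.iter().rev().take(overlap).rev().cloned().collect();
        }
    }
    chunks
}
```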
## Hybrid Search
The system implements a dual search strategy for optimal results:
### Vector Similarity Search
```rust
// Find semantically similar documents
async fn vector_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
    let embedding = embed(query).await?;

    // Cosine similarity scoring in SurrealDB
    let mut response = db
        .query("
            SELECT *, vector::similarity::cosine(embedding, $embedding) AS score
            FROM documents
            ORDER BY score DESC
            LIMIT $top_k
        ")
        .bind(("embedding", embedding))
        .bind(("top_k", top_k))
        .await?;

    Ok(response.take(0)?)
}
```
**Use case**: Semantic understanding of intent
- Query: "How to configure PostgreSQL"
- Finds: Documents about database configuration, examples, schemas
### BM25 Keyword Search
```rust
// Find documents with matching keywords
async fn keyword_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
    // BM25 full-text search in SurrealDB
    let mut response = db
        .query("
            SELECT *, search::score(1) AS score
            FROM documents
            WHERE text @1@ $query
            ORDER BY score DESC
            LIMIT $top_k
        ")
        .bind(("query", query))
        .bind(("top_k", top_k))
        .await?;

    Ok(response.take(0)?)
}
```
```
**Use case**: Exact term matching
- Query: "SurrealDB configuration"
- Finds: Documents mentioning SurrealDB specifically
### Hybrid Results
```rust
async fn hybrid_search(
    query: &str,
    vector_weight: f32,
    keyword_weight: f32,
    top_k: usize,
) -> Result<Vec<Document>> {
    let vector_results = vector_search(query, top_k * 2).await?;
    let keyword_results = keyword_search(query, top_k * 2).await?;

    let mut scored = HashMap::new();

    // Score from vector search (earlier results contribute more)
    for (i, doc) in vector_results.iter().enumerate() {
        *scored.entry(doc.id.clone()).or_insert(0.0) +=
            vector_weight * (1.0 - (i as f32 / top_k as f32));
    }

    // Score from keyword search
    for (i, doc) in keyword_results.iter().enumerate() {
        *scored.entry(doc.id.clone()).or_insert(0.0) +=
            keyword_weight * (1.0 - (i as f32 / top_k as f32));
    }

    // Return top-k by combined score
    let mut results: Vec<_> = scored.into_iter().collect();
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Look up the full Document for each id (elided here)
    Ok(results.into_iter().take(top_k).map(|(id, _)| ...).collect())
}
```
## Semantic Caching
Reduces API calls by caching embeddings of repeated queries:
```rust
struct SemanticCache {
    // Keyed by the original query text; the value keeps the query's
    // embedding alongside the cached result.
    queries: Arc<DashMap<String, (Vec<f32>, CachedResult)>>,
    similarity_threshold: f32,
}

impl SemanticCache {
    async fn get(&self, query: &str) -> Option<CachedResult> {
        let embedding = embed(query).await.ok()?;

        // Find a cached query with a similar embedding
        // (cosine distance below the threshold)
        for entry in self.queries.iter() {
            let (cached_embedding, result) = entry.value();
            let distance = cosine_distance(&embedding, cached_embedding);
            if distance < self.similarity_threshold {
                return Some(result.clone());
            }
        }
        None
    }

    async fn insert(&self, query: &str, result: CachedResult) -> Result<()> {
        let embedding = embed(query).await?;
        self.queries.insert(query.to_string(), (embedding, result));
        Ok(())
    }
}
```
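The `cosine_distance` helper used above is the standard dot-product formulation; a minimal sketch, assuming non-zero vectors of equal length:
```rust
/// Cosine distance = 1 - cosine similarity.
/// Assumes both vectors are non-zero and of equal length.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}
```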
**Benefits**:
- 50-80% reduction in embedding API calls
- Identical queries return in <10ms
- Similar queries reuse cached context
## Ingestion Workflow
### Document Indexing
```text
# Index all documentation
provisioning ai index-docs provisioning/docs/src
# Index schemas
provisioning ai index-schemas provisioning/schemas
# Index past deployments
provisioning ai index-deployments workspaces/*/deployments
# Watch directory for changes (development mode)
provisioning ai watch docs provisioning/docs/src
```
### Programmatic Indexing
```rust
// In ai-service on startup
async fn initialize_rag() -> Result<()> {
    let rag = RAGSystem::new(&config.rag).await?;

    // Index documentation
    let docs = load_markdown_docs("provisioning/docs/src")?;
    for doc in docs {
        rag.ingest_document(&doc).await?;
    }

    // Index schemas
    let schemas = load_nickel_schemas("provisioning/schemas")?;
    for schema in schemas {
        rag.ingest_schema(&schema).await?;
    }

    Ok(())
}
```
## Usage Examples
### Query the RAG System
```text
# Search for context-aware information
provisioning ai query "How do I configure PostgreSQL with encryption?"
# Get configuration template
provisioning ai template "Describe production Kubernetes on AWS"
# Interactive mode
provisioning ai chat
> What are the best practices for database backup?
```
### AI Service Integration
```rust
// AI service uses RAG to enhance generation
async fn generate_config(user_request: &str) -> Result<String> {
    // Retrieve relevant context
    let context = rag.search(user_request, 5).await?; // top_k = 5

    // Build prompt with context
    let prompt = build_prompt_with_context(user_request, &context);

    // Generate configuration
    let config = llm.generate(&prompt).await?;

    // Validate against schemas
    validate_nickel_config(&config)?;

    Ok(config)
}
```
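`build_prompt_with_context` is not defined in this document; one plausible shape, assuming the retrieved `Document` exposes its source path and content, simply quotes the retrieved chunks ahead of the user request:
```rust
/// Illustrative prompt assembly: retrieved chunks are quoted ahead of the
/// user's request so the model can ground its answer in platform docs.
/// The `source` and `content` fields on `Document` are assumptions.
fn build_prompt_with_context(user_request: &str, context: &[Document]) -> String {
    let mut prompt = String::from("Use the following platform documentation as context:\n\n");
    for doc in context {
        prompt.push_str(&format!("--- {} ---\n{}\n\n", doc.source, doc.content));
    }
    prompt.push_str("Request:\n");
    prompt.push_str(user_request);
    prompt
}
```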
### Form Assistance Integration
```javascript
// In typdialog-ai (JavaScript/TypeScript)
async function suggestFieldValue(fieldName, currentInput) {
  // Query RAG for similar configurations
  const context = await rag.search(
    `Field: ${fieldName}, Input: ${currentInput}`,
    { topK: 3, semantic: true }
  );

  // Generate suggestion using context
  const suggestion = await ai.suggest({
    field: fieldName,
    input: currentInput,
    context: context,
  });

  return suggestion;
}
```
## Performance Characteristics
| Operation | Time | Cache Hit |
| ----------- | ------ | ----------- |
| Vector embedding | 200-500ms | N/A |
| Vector search (cold) | 300-800ms | N/A |
| Keyword search | 50-200ms | N/A |
| Hybrid search | 500-1200ms | <100ms cached |
| Semantic cache hit | 10-50ms | Always |
**Typical query flow**:
1. Embedding: 300ms
2. Vector search: 400ms
3. Keyword search: 100ms
4. Ranking: 50ms
5. **Total**: ~850ms (first call), <100ms (cached)
## Configuration
See [Configuration Guide](configuration.md) for detailed RAG setup:
- LLM provider for embeddings
- SurrealDB connection
- Chunking strategies
- Search weights and limits
- Cache settings and TTLs
## Limitations and Considerations
### Document Freshness
- RAG indexes static snapshots
- Changes to documentation require re-indexing
- Use watch mode during development
### Token Limits
- Large documents chunked to fit LLM context
- Some context may be lost in chunking
- Adjustable chunk size vs. context trade-off
### Embedding Quality
- Quality depends on embedding model
- Domain-specific models perform better
- Fine-tuning possible for specialized vocabularies
## Monitoring and Debugging
### Query Metrics
```text
# View RAG search metrics
provisioning ai metrics show rag
# Analysis of search quality
provisioning ai eval-rag --sample-queries 100
```
### Debug Mode
```toml
# In provisioning/config/ai.toml
[ai.rag.debug]
enabled = true
log_embeddings = true # Log embedding vectors
log_search_scores = true # Log relevance scores
log_context_used = true # Log context retrieved
```
## Related Documentation
- [Architecture](architecture.md) - AI system overview
- [MCP Integration](mcp-integration.md) - RAG access via MCP
- [Configuration](configuration.md) - RAG setup guide
- [API Reference](api-reference.md) - RAG API endpoints
- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions
---
**Last Updated**: 2025-01-13
**Status**: ✅ Production-Ready
**Test Coverage**: 22/22 tests passing
**Database**: SurrealDB 1.5.0+