# Retrieval-Augmented Generation (RAG) System

**Status**: ✅ Production-Ready (SurrealDB 1.5.0+, 22/22 tests passing)

The RAG system enables the AI service to access, retrieve, and reason over infrastructure documentation, schemas, and past configurations. This allows the AI to generate contextually accurate infrastructure configurations and provide intelligent troubleshooting advice grounded in actual platform knowledge.

## Architecture Overview

The RAG system consists of:

1. **Document Store**: SurrealDB vector store with semantic indexing
2. **Hybrid Search**: Vector similarity + BM25 keyword search
3. **Chunk Management**: Intelligent document chunking for code and markdown
4. **Context Ranking**: Relevance scoring for retrieved documents
5. **Semantic Cache**: Deduplication of repeated queries

## Core Components

### 1. Vector Embeddings

The system uses embedding models to convert documents into vector representations:

```
┌─────────────────────┐
│  Document Source    │
│  (Markdown, Code)   │
└──────────┬──────────┘
           │
           ▼
┌──────────────────────────────────┐
│ Chunking & Tokenization          │
│ - Code-aware splits              │
│ - Markdown-aware                 │
│ - Preserves context              │
└──────────┬───────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│ Embedding Model                  │
│ (OpenAI Ada, Anthropic, Local)   │
└──────────┬───────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│ Vector Storage (SurrealDB)       │
│ - Vector index                   │
│ - Metadata indexed               │
│ - BM25 index for keywords        │
└──────────────────────────────────┘
```

### 2. SurrealDB Integration

SurrealDB serves as the vector database and knowledge store:

```nickel
# Configuration in provisioning/schemas/ai.ncl
{
  rag = {
    enabled = true,
    db_url = "surreal://localhost:8000",
    namespace = "provisioning",
    database = "ai_rag",

    # Collections for different document types
    collections = {
      documentation = {
        chunking_strategy = "markdown",
        chunk_size = 1024,
        overlap = 256,
      },
      schemas = {
        chunking_strategy = "code",
        chunk_size = 512,
        overlap = 128,
      },
      deployments = {
        chunking_strategy = "json",
        chunk_size = 2048,
        overlap = 512,
      },
    },

    # Embedding configuration
    embedding = {
      provider = "openai", # or "anthropic", "local"
      model = "text-embedding-3-small",
      cache_vectors = true,
    },

    # Search configuration
    search = {
      hybrid_enabled = true,
      vector_weight = 0.7,
      keyword_weight = 0.3,
      top_k = 5, # Number of results to return
      semantic_cache = true,
    },
  }
}
```

### 3. Document Chunking

Intelligent chunking preserves context while managing token limits:

#### Markdown Chunking Strategy

```
Input Document: provisioning/docs/src/guides/from-scratch.md

Chunks:
  [1] Header + first section (up to 1024 tokens)
  [2] Next logical section + overlap with [1]
  [3] Code examples preserved as atomic units
  [4] Continue with overlap...

Each chunk includes:
  - Original section heading (for context)
  - Content
  - Source file and line numbers
  - Metadata (doctype, category, version)
```

#### Code Chunking Strategy
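Source files follow the same pattern, but splits are aligned to code structure (the "code-aware splits" in the pipeline diagram) rather than to headings, using the `schemas` collection settings above (`chunk_size = 512`, `overlap = 128`). The sketch below is a minimal illustration, not the service's actual splitter: it approximates tokens as whitespace-delimited words and omits the declaration-boundary detection a real implementation would add.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    source: str        # originating file path
    start_line: int    # 1-indexed, kept so answers can cite locations
    end_line: int
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_code(path: str, lines: list[str],
               chunk_size: int = 512, overlap: int = 128) -> list[Chunk]:
    """Greedy splitter: accumulate lines until the token budget is hit,
    then re-seed the next chunk with roughly `overlap` trailing tokens."""
    chunks: list[Chunk] = []
    buf: list[str] = []
    start, tokens = 0, 0
    for i, line in enumerate(lines):
        buf.append(line)
        tokens += len(line.split())  # crude token estimate
        if tokens >= chunk_size:
            chunks.append(Chunk(path, start + 1, i + 1, "\n".join(buf)))
            carried, carry_tokens = [], 0
            for prev in reversed(buf):  # carry trailing lines as overlap
                carried.insert(0, prev)
                carry_tokens += len(prev.split())
                if carry_tokens >= overlap:
                    break
            start = i + 1 - len(carried)
            buf, tokens = carried, carry_tokens
    # Flush the remainder, unless it is nothing but already-emitted overlap
    if buf and (not chunks or chunks[-1].end_line < len(lines)):
        chunks.append(Chunk(path, start + 1, len(lines), "\n".join(buf)))
    return chunks
```

Each chunk keeps its source path and line span so retrieved context can cite exact locations, mirroring the metadata listed for markdown chunks above.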
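Returning to the SurrealDB integration: each collection is backed by two indexes, an MTREE vector index for similarity search and a full-text `SEARCH` index for the BM25 keyword side. The sketch below is illustrative only, assuming the async `surrealdb` Python client; the table, index, and analyzer names are hypothetical, and the `DIMENSION` must match the embedding model (1536 for `text-embedding-3-small`).

```python
from surrealdb import Surreal  # official SurrealDB Python client

# Run once at bootstrap, e.g. `await db.query(SETUP)`.
SETUP = """
-- Vector side: MTREE index over the embedding field.
DEFINE INDEX idx_doc_embedding ON TABLE documentation
    COLUMNS embedding MTREE DIMENSION 1536 DIST COSINE;

-- Keyword side: BM25-scored full-text index.
DEFINE ANALYZER doc_analyzer TOKENIZERS class FILTERS lowercase, snowball(english);
DEFINE INDEX idx_doc_content ON TABLE documentation
    COLUMNS content SEARCH ANALYZER doc_analyzer BM25;
"""

async def hybrid_search(db: Surreal, query_text: str, query_vec: list[float]):
    # <|5|> is SurrealQL's K-nearest-neighbours operator (served by the
    # MTREE index); @1@ plus search::score(1) is the full-text matches
    # operator with its BM25 relevance score.
    vector_hits = await db.query(
        "SELECT id, content, vector::similarity::cosine(embedding, $v) AS score"
        " FROM documentation WHERE embedding <|5|> $v;",
        {"v": query_vec},
    )
    keyword_hits = await db.query(
        "SELECT id, content, search::score(1) AS score"
        " FROM documentation WHERE content @1@ $q ORDER BY score DESC LIMIT 5;",
        {"q": query_text},
    )
    return vector_hits, keyword_hits
```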
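The two result lists are then fused with the `vector_weight`/`keyword_weight` split from the search configuration (0.7/0.3 above). A minimal sketch, assuming each list's scores are min-max normalised before mixing; the service's actual normalisation may differ:

```python
def fuse(vector_hits: dict[str, float], keyword_hits: dict[str, float],
         vector_weight: float = 0.7, keyword_weight: float = 0.3,
         top_k: int = 5) -> list[tuple[str, float]]:
    """Blend per-list normalised scores; ids absent from a list score 0 there."""
    def normalise(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    v, k = normalise(vector_hits), normalise(keyword_hits)
    fused = {i: vector_weight * v.get(i, 0.0) + keyword_weight * k.get(i, 0.0)
             for i in set(v) | set(k)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Normalising per list keeps the two scales comparable: cosine similarity is bounded while raw BM25 scores are not, so mixing them unnormalised would let the keyword side dominate.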
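Finally, the semantic cache (`semantic_cache = true` above) deduplicates repeated queries: if a new query's embedding is sufficiently close to one answered before, the cached retrieval is reused instead of re-querying the indexes. A minimal sketch using a linear scan; the `0.97` threshold is a hypothetical tuning value, not a platform default:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Reuse retrieval results for near-duplicate queries."""

    def __init__(self, threshold: float = 0.97):  # hypothetical threshold
        self.threshold = threshold
        self.entries: list[tuple[list[float], object]] = []

    def get(self, query_vec: list[float]):
        for vec, result in self.entries:
            if cosine(vec, query_vec) >= self.threshold:
                return result  # close enough: skip the index round-trip
        return None

    def put(self, query_vec: list[float], result: object) -> None:
        self.entries.append((query_vec, result))
```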