# ADR-015: AI Integration Architecture for Intelligent Infrastructure Provisioning\n\n## Status\n\n**Accepted** - 2025-01-08\n\n## Context\n\nThe provisioning platform has evolved to include complex workflows for infrastructure configuration, deployment, and management.\nCurrent interaction patterns require deep technical knowledge of Nickel schemas, cloud provider APIs, networking concepts, and security best practices.\nThis creates barriers to entry and slows down infrastructure provisioning for operators who are not infrastructure experts.\n\n### The Infrastructure Complexity Problem\n\n**Current state challenges**:\n\n1. **Knowledge Barrier**: Deep Nickel, cloud, and networking expertise required\n - Understanding Nickel type system and contracts\n - Knowing cloud provider resource relationships\n - Configuring security policies correctly\n - Debugging deployment failures\n\n2. **Manual Configuration**: All configs hand-written\n - Repetitive boilerplate for common patterns\n - Easy to make mistakes (typos, missing fields)\n - No intelligent suggestions or autocomplete\n - Trial-and-error debugging\n\n3. **Limited Assistance**: No contextual help\n - Documentation is separate from workflow\n - No explanation of validation errors\n - No suggestions for fixing issues\n - No learning from past deployments\n\n4. **Troubleshooting Difficulty**: Manual log analysis\n - Deployment failures require expert analysis\n - No automated root cause detection\n - No suggested fixes based on similar issues\n - Long time-to-resolution\n\n### AI Integration Opportunities\n\n1. **Natural Language to Configuration**:\n - User: "Create a production PostgreSQL cluster with encryption and daily backups"\n - AI: Generates validated Nickel configuration\n\n2. **AI-Assisted Form Filling**:\n - User starts typing in typdialog web form\n - AI suggests values based on context\n - AI explains validation errors in plain language\n\n3. 
**Intelligent Troubleshooting**:\n - Deployment fails\n - AI analyzes logs and suggests fixes\n - AI generates corrected configuration\n\n4. **Configuration Optimization**:\n - AI analyzes workload patterns\n - AI suggests performance improvements\n - AI detects security misconfigurations\n\n5. **Learning from Operations**:\n - AI indexes past deployments\n - AI suggests configurations based on similar workloads\n - AI predicts potential issues\n\n### AI Components Overview\n\nThe system integrates multiple AI components:\n\n1. **typdialog-ai**: AI-assisted form interactions\n2. **typdialog-ag**: AI agents for autonomous operations\n3. **typdialog-prov-gen**: AI-powered configuration generation\n4. **platform/crates/ai-service**: Core AI service backend\n5. **platform/crates/mcp-server**: Model Context Protocol server\n6. **platform/crates/rag**: Retrieval-Augmented Generation system\n\n### Requirements for AI Integration\n\n- ✅ **Natural Language Understanding**: Parse user intent from free-form text\n- ✅ **Schema-Aware Generation**: Generate valid Nickel configurations\n- ✅ **Context Retrieval**: Access documentation, schemas, past deployments\n- ✅ **Security Enforcement**: Cedar policies control AI access\n- ✅ **Human-in-the-Loop**: All AI actions require human approval\n- ✅ **Audit Trail**: Complete logging of AI operations\n- ✅ **Multi-Provider Support**: OpenAI, Anthropic, local models\n- ✅ **Cost Control**: Rate limiting and budget management\n- ✅ **Observability**: Trace AI decisions and reasoning\n\n## Decision\n\nIntegrate a **comprehensive AI system** consisting of:\n\n1. **AI-Assisted Interfaces** (typdialog-ai)\n2. **Autonomous AI Agents** (typdialog-ag)\n3. **AI Configuration Generator** (typdialog-prov-gen)\n4. 
**Core AI Infrastructure** (ai-service, mcp-server, rag)\n\nAll AI components are **schema-aware**, **security-enforced**, and **human-supervised**.\n\n### Architecture Diagram\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ User Interfaces │\n│ │\n│ Natural Language: "Create production K8s cluster in AWS" │\n│ Typdialog Forms: AI-assisted field suggestions │\n│ CLI: provisioning ai generate-config "description" │\n└────────────┬────────────────────────────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────────────────────────────┐\n│ AI Frontend Layer │\n│ ┌───────────────────────────────────────────────────────┐ │\n│ │ typdialog-ai (AI-Assisted Forms) │ │\n│ │ - Natural language form filling │ │\n│ │ - Real-time AI suggestions │ │\n│ │ - Validation error explanations │ │\n│ │ - Context-aware autocomplete │ │\n│ ├───────────────────────────────────────────────────────┤ │\n│ │ typdialog-ag (AI Agents) │ │\n│ │ - Autonomous task execution │ │\n│ │ - Multi-step workflow automation │ │\n│ │ - Learning from feedback │ │\n│ │ - Agent collaboration │ │\n│ ├───────────────────────────────────────────────────────┤ │\n│ │ typdialog-prov-gen (Config Generator) │ │\n│ │ - Natural language → Nickel config │ │\n│ │ - Template-based generation │ │\n│ │ - Best practice injection │ │\n│ │ - Validation and refinement │ │\n│ └───────────────────────────────────────────────────────┘ │\n└────────────┬────────────────────────────────────────────────────┘\n │\n ▼\n┌────────────────────────────────────────────────────────────────┐\n│ Core AI Infrastructure (platform/crates/) │\n│ ┌───────────────────────────────────────────────────────┐ │\n│ │ ai-service (Central AI Service) │ │\n│ │ │ │\n│ │ - Request routing and orchestration │ │\n│ │ - Authentication and authorization (Cedar) │ │\n│ │ - Rate limiting and cost control │ │\n│ │ - Caching and optimization │ │\n│ │ - Audit logging and observability │ │\n│ │ - Multi-provider 
abstraction │ │\n│ └─────────────┬─────────────────────┬───────────────────┘ │\n│ │ │ │\n│ ▼ ▼ │\n│ ┌─────────────────────┐ ┌─────────────────────┐ │\n│ │ mcp-server │ │ rag │ │\n│ │ (Model Context │ │ (Retrieval-Aug Gen) │ │\n│ │ Protocol) │ │ │ │\n│ │ │ │ ┌─────────────────┐ │ │\n│ │ - LLM integration │ │ │ Vector Store │ │ │\n│ │ - Tool calling │ │ │ (Qdrant/Milvus) │ │ │\n│ │ - Context mgmt │ │ └─────────────────┘ │ │\n│ │ - Multi-provider │ │ ┌─────────────────┐ │ │\n│ │ (OpenAI, │ │ │ Embeddings │ │ │\n│ │ Anthropic, │ │ │ (text-embed) │ │ │\n│ │ Local models) │ │ └─────────────────┘ │ │\n│ │ │ │ ┌─────────────────┐ │ │\n│ │ Tools: │ │ │ Index: │ │ │\n│ │ - nickel_validate │ │ │ - Nickel schemas│ │ │\n│ │ - schema_query │ │ │ - Documentation │ │ │\n│ │ - config_generate │ │ │ - Past deploys │ │ │\n│ │ - cedar_check │ │ │ - Best practices│ │ │\n│ └─────────────────────┘ │ └─────────────────┘ │ │\n│ │ │ │\n│ │ Query: "How to │ │\n│ │ configure Postgres │ │\n│ │ with encryption?" │ │\n│ │ │ │\n│ │ Retrieval: Relevant │ │\n│ │ docs + examples │ │\n│ └─────────────────────┘ │\n└────────────┬───────────────────────────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────────────────────────────┐\n│ Integration Points │\n│ │\n│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │\n│ │ Nickel │ │ SecretumVault│ │ Cedar Authorization │ │\n│ │ Validation │ │ (Secrets) │ │ (AI Policies) │ │\n│ └─────────────┘ └──────────────┘ └─────────────────────┘ │\n│ │\n│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │\n│ │ Orchestrator│ │ Typdialog │ │ Audit Logging │ │\n│ │ (Deploy) │ │ (Forms) │ │ (All AI Ops) │ │\n│ └─────────────┘ └──────────────┘ └─────────────────────┘ │\n└─────────────────────────────────────────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────────────────────────────┐\n│ Output: Validated Nickel Configuration │\n│ │\n│ ✅ Schema-validated │\n│ ✅ Security-checked (Cedar policies) │\n│ ✅ 
Human-approved │\n│ ✅ Audit-logged │\n│ ✅ Ready for deployment │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Component Responsibilities\n\n**typdialog-ai** (AI-Assisted Forms):\n- Real-time form field suggestions based on context\n- Natural language form filling\n- Validation error explanations in plain English\n- Context-aware autocomplete for configuration values\n- Integration with typdialog web UI\n\n**typdialog-ag** (AI Agents):\n- Autonomous task execution (multi-step workflows)\n- Agent collaboration (multiple agents working together)\n- Learning from user feedback and past operations\n- Goal-oriented behavior (achieve outcome, not just execute steps)\n- Safety boundaries (cannot deploy without approval)\n\n**typdialog-prov-gen** (Config Generator):\n- Natural language → Nickel configuration\n- Template-based generation with customization\n- Best practice injection (security, performance, HA)\n- Iterative refinement based on validation feedback\n- Integration with Nickel schema system\n\n**ai-service** (Core AI Service):\n- Central request router for all AI operations\n- Authentication and authorization (Cedar policies)\n- Rate limiting and cost control\n- Caching (reduce LLM API calls)\n- Audit logging (all AI operations)\n- Multi-provider abstraction (OpenAI, Anthropic, local)\n\n**mcp-server** (Model Context Protocol):\n- LLM integration (OpenAI, Anthropic, local models)\n- Tool calling framework (nickel_validate, schema_query, etc.)\n- Context management (conversation history, schemas)\n- Streaming responses for real-time feedback\n- Error handling and retries\n\n**rag** (Retrieval-Augmented Generation):\n- Vector store (Qdrant/Milvus) for embeddings\n- Document indexing (Nickel schemas, docs, deployments)\n- Semantic search (find relevant context)\n- Embedding generation (text-embedding-3-large)\n- Query expansion and reranking\n\n## Rationale\n\n### Why AI Integration Is Essential\n\n| Aspect | Manual Config | 
AI-Assisted (chosen) |\n| -------- | --------------- | ---------------------- |\n| **Learning Curve** | 🔴 Steep | 🟢 Gentle |\n| **Time to Deploy** | 🔴 Hours | 🟢 Minutes |\n| **Error Rate** | 🔴 High | 🟢 Low (validated) |\n| **Documentation Access** | 🔴 Separate | 🟢 Contextual |\n| **Troubleshooting** | 🔴 Manual | 🟢 AI-assisted |\n| **Best Practices** | ⚠️ Manual enforcement | ✅ Auto-injected |\n| **Consistency** | ⚠️ Varies by operator | ✅ Standardized |\n| **Scalability** | 🔴 Limited by expertise | 🟢 AI scales knowledge |\n\n### Why Schema-Aware AI Is Critical\n\nTraditional AI code generation fails for infrastructure because:\n\n```\nGeneric AI (like GitHub Copilot):\n❌ Generates syntactically correct but semantically wrong configs\n❌ Doesn't understand cloud provider constraints\n❌ No validation against schemas\n❌ No security policy enforcement\n❌ Hallucinated resource names/IDs\n```\n\n**Schema-aware AI** (our approach):\n```\n# Nickel schema provides ground truth\n{\n Database = {\n engine | [| 'postgres, 'mysql, 'mongodb |],\n version | String,\n storage_gb | Number,\n backup_retention_days | Number,\n }\n}\n\n# AI generates ONLY valid configs\n# AI knows:\n# - Valid engine values ('postgres', not 'postgresql')\n# - Required fields (all listed above)\n# - Type constraints (storage_gb is Number, not String)\n# - Nickel contracts (if defined)\n```\n\n**Result**: AI cannot generate invalid configs.\n\n### Why RAG (Retrieval-Augmented Generation) Is Essential\n\nLLMs alone have limitations:\n\n```\nPure LLM:\n❌ Knowledge cutoff (no recent updates)\n❌ Hallucinations (invents plausible-sounding configs)\n❌ No project-specific knowledge\n❌ No access to past deployments\n```\n\n**RAG-enhanced LLM**:\n```\nQuery: "How to configure Postgres with encryption?"\n\nRAG retrieves:\n- Nickel schema: provisioning/schemas/database.ncl\n- Documentation: docs/user/database-encryption.md\n- Past deployment: workspaces/prod/postgres-encrypted.ncl\n- Best practice: 
.claude/patterns/secure-database.md\n\nLLM generates answer WITH retrieved context:\n✅ Accurate (based on actual schemas)\n✅ Project-specific (uses our patterns)\n✅ Proven (learned from past deployments)\n✅ Secure (follows our security guidelines)\n```\n\n### Why Human-in-the-Loop Is Non-Negotiable\n\nAI-generated infrastructure configs require human approval:\n\n```\n// All AI operations require approval\npub async fn ai_generate_config(request: GenerateRequest) -> Result<Config> {\n let ai_generated = ai_service.generate(request).await?;\n\n // Validate against Nickel schema\n let validation = nickel_validate(&ai_generated)?;\n if !validation.is_valid() {\n return Err("AI generated invalid config".into());\n }\n\n // Check Cedar policies (principal, action, resource)\n let authorized = cedar_authorize(&user, "approve_ai_config", &ai_generated)?;\n if !authorized {\n return Err("User not authorized to approve AI config".into());\n }\n\n // Require explicit human approval\n let approval = prompt_user_approval(&ai_generated).await?;\n if !approval.approved {\n audit_log("AI config rejected by user", &ai_generated);\n return Err("User rejected AI-generated config".into());\n }\n\n audit_log("AI config approved by user", &ai_generated);\n Ok(ai_generated)\n}\n```\n\n**Why**:\n- Infrastructure changes have real-world cost and security impact\n- AI can make mistakes (hallucinations, misunderstandings)\n- Compliance requires human accountability\n- Learning opportunity (human reviews teach AI)\n\n### Why Multi-Provider Support Matters\n\nNo single LLM provider is best for all tasks:\n\n| Provider | Best For | Considerations |\n| ---------- | ---------- | ---------------- |\n| **Anthropic (Claude)** | Long context, accuracy | ✅ Best for complex configs |\n| **OpenAI (GPT-4)** | Tool calling, speed | ✅ Best for quick suggestions |\n| **Local (Llama, Mistral)** | Privacy, cost | ✅ Best for air-gapped envs |\n\n**Strategy**:\n- Complex config generation → Claude (long context)\n- Real-time form 
suggestions → GPT-4 (fast)\n- Air-gapped deployments → Local models (privacy)\n\n## Consequences\n\n### Positive\n\n- **Accessibility**: Non-experts can provision infrastructure\n- **Productivity**: 10x faster configuration creation\n- **Quality**: AI injects best practices automatically\n- **Consistency**: Standardized configurations across teams\n- **Learning**: Users learn from AI explanations\n- **Troubleshooting**: AI-assisted debugging reduces MTTR\n- **Documentation**: Contextual help embedded in workflow\n- **Safety**: Schema validation prevents invalid configs\n- **Security**: Cedar policies control AI access\n- **Auditability**: Complete trail of AI operations\n\n### Negative\n\n- **Dependency**: Requires LLM API access (or local models)\n- **Cost**: LLM API calls have per-token cost\n- **Latency**: AI responses take 1-5 seconds\n- **Accuracy**: AI can still make mistakes (needs validation)\n- **Trust**: Users must understand AI limitations\n- **Complexity**: Additional infrastructure to operate\n- **Privacy**: Configs sent to LLM providers (unless local)\n\n### Mitigation Strategies\n\n**Cost Control**:\n```\n[ai.rate_limiting]\nrequests_per_minute = 60\ntokens_per_day = 1000000\ncost_limit_per_day = "100.00" # USD\n\n[ai.caching]\nenabled = true\nttl = "1h"\n# Cache similar queries to reduce API calls\n```\n\n**Latency Optimization**:\n```\n// Streaming responses for real-time feedback\npub async fn ai_generate_stream(request: GenerateRequest) -> impl Stream<Item = String> {\n ai_service\n .generate_stream(request)\n .await\n .map(|chunk| chunk.text)\n}\n```\n\n**Privacy (Local Models)**:\n```\n[ai]\nprovider = "local"\nmodel_path = "/opt/provisioning/models/llama-3-70b"\n\n# No data leaves the network\n```\n\n**Validation (Defense in Depth)**:\n```\nAI generates config\n ↓\nNickel schema validation (syntax, types, contracts)\n ↓\nCedar policy check (security, compliance)\n ↓\nHuman approval (final gate)\n 
↓\nDeployment\n```\n\n**Observability**:\n```\n[ai.observability]\ntrace_all_requests = true\nstore_conversations = true\nconversation_retention = "30d"\n\n# Every AI operation logged:\n# - Input prompt\n# - Retrieved context (RAG)\n# - Generated output\n# - Validation results\n# - Human approval decision\n```\n\n## Alternatives Considered\n\n### Alternative 1: No AI Integration\n\n**Pros**: Simpler, no LLM dependencies\n**Cons**: Steep learning curve, slow provisioning, manual troubleshooting\n**Decision**: REJECTED - Poor user experience (10x slower provisioning, high error rate)\n\n### Alternative 2: Generic AI Code Generation (GitHub Copilot approach)\n\n**Pros**: Existing tools, well-known UX\n**Cons**: Not schema-aware, generates invalid configs, no validation\n**Decision**: REJECTED - Inadequate for infrastructure (correctness critical)\n\n### Alternative 3: AI Only for Documentation/Search\n\n**Pros**: Lower risk (AI doesn't generate configs)\n**Cons**: Missed opportunity for 10x productivity gains\n**Decision**: REJECTED - Too conservative\n\n### Alternative 4: Fully Autonomous AI (No Human Approval)\n\n**Pros**: Maximum automation\n**Cons**: Unacceptable risk for infrastructure changes\n**Decision**: REJECTED - Safety and compliance requirements\n\n### Alternative 5: Single LLM Provider Lock-in\n\n**Pros**: Simpler integration\n**Cons**: Vendor lock-in, no flexibility for different use cases\n**Decision**: REJECTED - Multi-provider abstraction provides flexibility\n\n## Implementation Details\n\n### AI Service API\n\n```\n// platform/crates/ai-service/src/lib.rs\n\n#[async_trait]\npub trait AIService {\n async fn generate_config(\n &self,\n prompt: &str,\n schema: &NickelSchema,\n context: Option<RAGContext>,\n ) -> Result<GeneratedConfig>;\n\n async fn suggest_field_value(\n &self,\n field: &FieldDefinition,\n partial_input: &str,\n form_context: &FormContext,\n ) -> Result<Vec<Suggestion>>;\n\n async fn explain_validation_error(\n &self,\n error: &ValidationError,\n config: &Config,\n ) -> 
Result<ErrorExplanation>;\n\n async fn troubleshoot_deployment(\n &self,\n deployment_id: &str,\n logs: &DeploymentLogs,\n ) -> Result<TroubleshootingReport>;\n}\n\npub struct AIServiceImpl {\n mcp_client: MCPClient,\n rag: RAGService,\n cedar: CedarEngine,\n audit: AuditLogger,\n rate_limiter: RateLimiter,\n cache: Cache,\n}\n\n#[async_trait]\nimpl AIService for AIServiceImpl {\n async fn generate_config(\n &self,\n prompt: &str,\n schema: &NickelSchema,\n context: Option<RAGContext>,\n ) -> Result<GeneratedConfig> {\n // Check authorization (principal, action, resource)\n self.cedar.authorize(&current_user(), "ai:generate_config", schema)?;\n\n // Rate limiting\n self.rate_limiter.check(current_user()).await?;\n\n // Retrieve relevant context via RAG\n let rag_context = match context {\n Some(ctx) => ctx,\n None => self.rag.retrieve(prompt, schema).await?,\n };\n\n // Generate config via MCP\n let generated = self.mcp_client.generate(GenerateRequest {\n prompt: prompt.to_string(),\n schema: schema.clone(),\n context: rag_context,\n tools: vec!["nickel_validate".into(), "schema_query".into()],\n }).await?;\n\n // Validate generated config\n let validation = nickel_validate(&generated.config)?;\n if !validation.is_valid() {\n return Err(AIError::InvalidGeneration(validation.errors));\n }\n\n // Audit log\n self.audit.log(AIOperation::GenerateConfig {\n user: current_user(),\n prompt: prompt.to_string(),\n schema: schema.name(),\n generated: generated.config.clone(),\n validation: validation.clone(),\n });\n\n Ok(GeneratedConfig {\n config: generated.config,\n explanation: generated.explanation,\n confidence: generated.confidence,\n validation,\n })\n }\n}\n```\n\n### MCP Server Integration\n\n```\n// platform/crates/mcp-server/src/lib.rs\n\npub struct MCPClient {\n provider: Box<dyn LLMProvider>,\n tools: ToolRegistry,\n}\n\n#[async_trait]\npub trait LLMProvider {\n async fn generate(&self, request: GenerateRequest) -> Result<GenerateResponse>;\n async fn generate_stream(&self, request: GenerateRequest) -> Result<BoxStream<'static, ResponseChunk>>;\n}\n\n// Tool definitions for LLM\npub struct ToolRegistry {\n tools: HashMap<String, Tool>,\n}\n\nimpl ToolRegistry {\n pub fn new() -> Self 
{\n let mut tools = HashMap::new();\n\n tools.insert("nickel_validate".to_string(), Tool {\n name: "nickel_validate",\n description: "Validate Nickel configuration against schema",\n parameters: json!({\n "type": "object",\n "properties": {\n "config": {"type": "string"},\n "schema_path": {"type": "string"},\n },\n "required": ["config", "schema_path"],\n }),\n handler: Box::new(|params| async {\n let config = params["config"].as_str().unwrap();\n let schema = params["schema_path"].as_str().unwrap();\n nickel_validate_tool(config, schema).await\n }),\n });\n\n tools.insert("schema_query".to_string(), Tool {\n name: "schema_query",\n description: "Query Nickel schema for field information",\n parameters: json!({\n "type": "object",\n "properties": {\n "schema_path": {"type": "string"},\n "query": {"type": "string"},\n },\n "required": ["schema_path"],\n }),\n handler: Box::new(|params| async {\n let schema = params["schema_path"].as_str().unwrap();\n let query = params.get("query").and_then(|v| v.as_str());\n schema_query_tool(schema, query).await\n }),\n });\n\n Self { tools }\n }\n}\n```\n\n### RAG System Implementation\n\n```\n// platform/crates/rag/src/lib.rs\n\npub struct RAGService {\n vector_store: Box<dyn VectorStore>,\n embeddings: EmbeddingModel,\n indexer: DocumentIndexer,\n}\n\nimpl RAGService {\n pub async fn index_all(&self) -> Result<()> {\n // Index Nickel schemas\n self.index_schemas("provisioning/schemas").await?;\n\n // Index documentation\n self.index_docs("docs").await?;\n\n // Index past deployments\n self.index_deployments("workspaces").await?;\n\n // Index best practices\n self.index_patterns(".claude/patterns").await?;\n\n Ok(())\n }\n\n pub async fn retrieve(\n &self,\n query: &str,\n schema: &NickelSchema,\n ) -> Result<RAGContext> {\n // Generate query embedding\n let query_embedding = self.embeddings.embed(query).await?;\n\n // Search vector store (embedding, top_k, metadata filter)\n let results = self.vector_store.search(\n query_embedding,\n 10,\n Some(json!({\n "schema": schema.name(),\n })),\n 
).await?;\n\n // Rerank results\n let reranked = self.rerank(query, results).await?;\n\n // Build context\n Ok(RAGContext {\n query: query.to_string(),\n schema_definition: schema.to_string(),\n relevant_docs: reranked.iter()\n .take(5)\n .map(|r| r.content.clone())\n .collect(),\n similar_configs: self.find_similar_configs(schema).await?,\n best_practices: self.find_best_practices(schema).await?,\n })\n }\n}\n\n#[async_trait]\npub trait VectorStore {\n async fn insert(&self, id: &str, embedding: Vec<f32>, metadata: Value) -> Result<()>;\n async fn search(&self, embedding: Vec<f32>, top_k: usize, filter: Option<Value>) -> Result<Vec<SearchResult>>;\n}\n\n// Qdrant implementation\npub struct QdrantStore {\n client: qdrant::QdrantClient,\n collection: String,\n}\n```\n\n### typdialog-ai Integration\n\n```\n// typdialog-ai/src/form_assistant.rs\n\npub struct FormAssistant {\n ai_service: Arc<dyn AIService>,\n}\n\nimpl FormAssistant {\n pub async fn suggest_field_value(\n &self,\n field: &FieldDefinition,\n partial_input: &str,\n form_context: &FormContext,\n ) -> Result<Vec<Suggestion>> {\n self.ai_service.suggest_field_value(\n field,\n partial_input,\n form_context,\n ).await\n }\n\n pub async fn explain_error(\n &self,\n error: &ValidationError,\n field_value: &str,\n ) -> Result<String> {\n let explanation = self.ai_service.explain_validation_error(\n error,\n field_value,\n ).await?;\n\n Ok(format!(\n "Error: {}\n\nExplanation: {}\n\nSuggested fix: {}",\n error.message,\n explanation.plain_english,\n explanation.suggested_fix,\n ))\n }\n\n pub async fn fill_from_natural_language(\n &self,\n description: &str,\n form_schema: &FormSchema,\n ) -> Result<HashMap<String, Value>> {\n let prompt = format!(\n "User wants to: {}\n\nForm schema: {}\n\nGenerate field values:",\n description,\n serde_json::to_string_pretty(form_schema)?,\n );\n\n let generated = self.ai_service.generate_config(\n &prompt,\n &form_schema.nickel_schema,\n None,\n ).await?;\n\n Ok(generated.field_values)\n }\n}\n```\n\n### typdialog-ag Agents\n\n```\n// typdialog-ag/src/agent.rs\n\npub 
struct ProvisioningAgent {\n ai_service: Arc<dyn AIService>,\n orchestrator: Arc<Orchestrator>,\n max_iterations: usize,\n}\n\nimpl ProvisioningAgent {\n pub async fn execute_goal(&self, goal: &str) -> Result<AgentResult> {\n let mut state = AgentState::new(goal);\n\n for _iteration in 0..self.max_iterations {\n // AI determines next action\n let action = self.ai_service.agent_next_action(&state).await?;\n\n // Execute action (with human approval for critical operations)\n let result = self.execute_action(&action, &state).await?;\n\n // Update state\n state.update(action, result);\n\n // Check if goal achieved\n if state.goal_achieved() {\n return Ok(AgentResult::Success(state));\n }\n }\n\n Err(AgentError::MaxIterationsReached)\n }\n\n async fn execute_action(\n &self,\n action: &AgentAction,\n state: &AgentState,\n ) -> Result<ActionResult> {\n match action {\n AgentAction::GenerateConfig { description } => {\n let config = self.ai_service.generate_config(\n description,\n &state.target_schema,\n Some(state.context.clone()),\n ).await?;\n\n Ok(ActionResult::ConfigGenerated(config))\n },\n\n AgentAction::Deploy { config } => {\n // Require human approval for deployment\n let approval = prompt_user_approval(\n "Agent wants to deploy. 
Approve?",\n config,\n ).await?;\n\n if !approval.approved {\n return Ok(ActionResult::DeploymentRejected);\n }\n\n let deployment = self.orchestrator.deploy(config).await?;\n Ok(ActionResult::Deployed(deployment))\n },\n\n AgentAction::Troubleshoot { deployment_id } => {\n let report = self.ai_service.troubleshoot_deployment(\n deployment_id,\n &self.orchestrator.get_logs(deployment_id).await?,\n ).await?;\n\n Ok(ActionResult::TroubleshootingReport(report))\n },\n }\n }\n}\n```\n\n### Cedar Policies for AI\n\n```\n// AI cannot access secrets without explicit permission\nforbid(\n principal == Service::"ai-service",\n action == Action::"read",\n resource is Secret\n);\n\n// AI can generate configs for non-production environments without approval\npermit(\n principal == Service::"ai-service",\n action == Action::"generate_config",\n resource is Schema\n) when {\n ["dev", "staging"].contains(resource.environment)\n};\n\n// AI config generation for production requires senior engineer approval\npermit(\n principal in Group::"senior-engineers",\n action == Action::"approve_ai_config",\n resource is Config\n) when {\n resource.environment == "production" &&\n resource.generated_by == "ai-service"\n};\n\n// AI agents cannot deploy without human approval\nforbid(\n principal == Service::"ai-agent",\n action == Action::"deploy",\n resource is Infrastructure\n) unless {\n context.human_approved == true\n};\n```\n\n## Testing Strategy\n\n**Unit Tests**:\n```\n#[tokio::test]\nasync fn test_ai_config_generation_validates() {\n let ai_service = mock_ai_service();\n\n let generated = ai_service.generate_config(\n "Create a PostgreSQL database with encryption",\n &postgres_schema(),\n None,\n ).await.unwrap();\n\n // Must validate against schema\n assert!(generated.validation.is_valid());\n assert_eq!(generated.config["engine"], "postgres");\n assert_eq!(generated.config["encryption_enabled"], true);\n}\n\n#[tokio::test]\nasync fn test_ai_cannot_access_secrets() {\n 
let ai_service = ai_service_with_cedar();\n\n let result = ai_service.get_secret("database/password").await;\n\n assert!(result.is_err());\n assert_eq!(result.unwrap_err(), AIError::PermissionDenied);\n}\n```\n\n**Integration Tests**:\n```\n#[tokio::test]\nasync fn test_end_to_end_ai_config_generation() {\n // User provides natural language\n let description = "Create a production Kubernetes cluster in AWS with 5 nodes";\n\n // AI generates config\n let generated = ai_service.generate_config(description, &k8s_schema(), None).await.unwrap();\n\n // Nickel validation\n let validation = nickel_validate(&generated.config).await.unwrap();\n assert!(validation.is_valid());\n\n // Human approval\n let approval = Approval {\n user: "senior-engineer@example.com",\n approved: true,\n timestamp: Utc::now(),\n };\n\n // Deploy\n let deployment = orchestrator.deploy_with_approval(\n generated.config,\n approval,\n ).await.unwrap();\n\n assert_eq!(deployment.status, DeploymentStatus::Success);\n}\n```\n\n**RAG Quality Tests**:\n```\n#[tokio::test]\nasync fn test_rag_retrieval_accuracy() {\n let rag = rag_service();\n\n // Index test documents\n rag.index_all().await.unwrap();\n\n // Query\n let context = rag.retrieve(\n "How to configure PostgreSQL with encryption?",\n &postgres_schema(),\n ).await.unwrap();\n\n // Should retrieve relevant docs\n assert!(context.relevant_docs.iter().any(|doc| {\n doc.contains("encryption") && doc.contains("postgres")\n }));\n\n // Should retrieve similar configs\n assert!(!context.similar_configs.is_empty());\n}\n```\n\n## Security Considerations\n\n**AI Access Control**:\n```\nAI Service Permissions (enforced by Cedar):\n✅ CAN: Read Nickel schemas\n✅ CAN: Generate configurations\n✅ CAN: Query documentation\n✅ CAN: Analyze deployment logs (sanitized)\n❌ CANNOT: Access secrets directly\n❌ CANNOT: Deploy without approval\n❌ CANNOT: Modify Cedar policies\n❌ CANNOT: Access user credentials\n```\n\n**Data Privacy**:\n```\n[ai.privacy]\n# Sanitize before sending to 
LLM\nsanitize_secrets = true\nsanitize_pii = true\nsanitize_credentials = true\n\n# What gets sent to LLM:\n# ✅ Nickel schemas (public)\n# ✅ Documentation (public)\n# ✅ Error messages (sanitized)\n# ❌ Secret values (never)\n# ❌ Passwords (never)\n# ❌ API keys (never)\n```\n\n**Audit Trail**:\n```\n// Every AI operation logged\npub struct AIAuditLog {\n timestamp: DateTime<Utc>,\n user: UserId,\n operation: AIOperation,\n input_prompt: String,\n generated_output: String,\n validation_result: ValidationResult,\n human_approval: Option<Approval>,\n deployment_outcome: Option<DeploymentStatus>,\n}\n```\n\n## Cost Analysis\n\n**Estimated Costs** (per month, based on typical usage):\n\n```\nAssumptions:\n- 100 active users\n- 10 AI config generations per user per day\n- Average prompt: 2000 tokens\n- Average response: 1000 tokens\n\nProvider: Anthropic Claude Sonnet\nCost: $3 per 1M input tokens, $15 per 1M output tokens\n\nMonthly cost:\n= 100 users × 10 generations × 30 days × (2000 input + 1000 output tokens)\n= 100 × 10 × 30 × 3000 tokens\n= 90M tokens\n= (60M input × $3/1M) + (30M output × $15/1M)\n= $180 + $450\n= $630/month\n\nWith caching (50% hit rate):\n= $315/month\n```\n\n**Cost optimization strategies**:\n- Caching (50-80% cost reduction)\n- Streaming (lower latency, same cost)\n- Local models for non-critical operations (zero marginal cost)\n- Rate limiting (prevent runaway costs)\n\n## References\n\n- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)\n- [Anthropic Claude API](https://docs.anthropic.com/claude/reference/getting-started)\n- [OpenAI GPT-4 API](https://platform.openai.com/docs/api-reference)\n- [Qdrant Vector Database](https://qdrant.tech/)\n- [RAG Survey Paper](https://arxiv.org/abs/2312.10997)\n- ADR-008: Cedar Authorization (AI access control)\n- ADR-011: Nickel Migration (schema-driven AI)\n- ADR-013: Typdialog Web UI Backend (AI-assisted forms)\n- ADR-014: SecretumVault Integration (AI-secret isolation)\n\n---\n\n**Status**: Accepted\n**Last Updated**: 
2025-01-08\n**Implementation**: Planned (High Priority)\n**Estimated Complexity**: Very Complex\n**Dependencies**: ADR-008, ADR-011, ADR-013, ADR-014
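
## Appendix: Human-in-the-Loop Gate (Illustrative Sketch)

The decision's core safety property — no AI-generated config reaches deployment without schema validation, policy authorization, and explicit human approval — can be sketched as a self-contained gate. This is a minimal illustration, not the real platform API: `Config`, `GateError`, `validate_schema`, `policy_allows`, and `approve_ai_config` are all hypothetical stand-ins for the actual Nickel, Cedar, and ai-service integrations described above.

```rust
// Hypothetical stand-in for a validated configuration record.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    engine: String,
    storage_gb: u32,
}

// Each gate in the pipeline has its own failure mode.
#[derive(Debug, PartialEq)]
enum GateError {
    SchemaInvalid(String),
    PolicyDenied,
    Rejected,
}

// Stand-in for Nickel schema validation: enforce the enum and a basic constraint.
fn validate_schema(cfg: &Config) -> Result<(), GateError> {
    let valid_engines = ["postgres", "mysql", "mongodb"];
    if !valid_engines.contains(&cfg.engine.as_str()) {
        return Err(GateError::SchemaInvalid(format!("unknown engine: {}", cfg.engine)));
    }
    if cfg.storage_gb == 0 {
        return Err(GateError::SchemaInvalid("storage_gb must be > 0".to_string()));
    }
    Ok(())
}

// Stand-in for a Cedar check: only senior engineers may approve AI-generated configs.
fn policy_allows(principal_groups: &[&str]) -> bool {
    principal_groups.contains(&"senior-engineers")
}

// The gate: schema validation, then policy check, then explicit human approval.
// Order matters — an invalid config never reaches the approval prompt.
fn approve_ai_config(
    cfg: Config,
    principal_groups: &[&str],
    human_approved: bool,
) -> Result<Config, GateError> {
    validate_schema(&cfg)?;
    if !policy_allows(principal_groups) {
        return Err(GateError::PolicyDenied);
    }
    if !human_approved {
        return Err(GateError::Rejected);
    }
    Ok(cfg)
}

fn main() {
    let cfg = Config { engine: "postgres".to_string(), storage_gb: 100 };
    // Valid config + authorized approver + explicit approval -> accepted
    assert!(approve_ai_config(cfg.clone(), &["senior-engineers"], true).is_ok());

    // A hallucinated engine name ("postgresql" instead of "postgres") is
    // stopped at schema validation, before any human sees it.
    let bad = Config { engine: "postgresql".to_string(), storage_gb: 100 };
    assert_eq!(
        approve_ai_config(bad, &["senior-engineers"], true),
        Err(GateError::SchemaInvalid("unknown engine: postgresql".to_string()))
    );

    // An unauthorized principal cannot approve, even a valid config.
    assert_eq!(
        approve_ai_config(cfg.clone(), &["dev-team"], true),
        Err(GateError::PolicyDenied)
    );

    // Without explicit human approval, nothing passes the gate.
    assert_eq!(
        approve_ai_config(cfg, &["senior-engineers"], false),
        Err(GateError::Rejected)
    );
}
```

Because each check returns early, the layers compose in the same order as the "Validation (Defense in Depth)" pipeline above, and a failure at any layer is attributable to exactly one gate.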