# AI System Configuration Guide

**Status**: ✅ Production-Ready (Configuration system)

Complete setup guide for AI features in the provisioning platform. This guide covers LLM provider configuration, feature enablement, cache setup, cost controls, and security settings.

## Quick Start

### Minimal Configuration

```
# provisioning/config/ai.toml
[ai]
enabled = true
provider = "anthropic"  # or "openai" or "local"
model = "claude-sonnet-4"
api_key = "sk-ant-..."  # Set via PROVISIONING_AI_API_KEY env var

[ai.cache]
enabled = true

[ai.limits]
max_tokens = 4096
temperature = 0.7
```

### Initialize Configuration

```
# Generate default configuration
provisioning config init ai

# Edit configuration
provisioning config edit ai

# Validate configuration
provisioning config validate ai

# Show current configuration
provisioning config show ai
```

## Provider Configuration

### Anthropic Claude

```
[ai]
enabled = true
provider = "anthropic"
model = "claude-sonnet-4"  # or "claude-opus-4", "claude-haiku-4"
api_key = "${PROVISIONING_AI_API_KEY}"
api_base = "https://api.anthropic.com"

# Request parameters
[ai.request]
max_tokens = 4096
temperature = 0.7
top_p = 0.95
top_k = 40

# Supported models
# - claude-opus-4: Most capable, for complex reasoning ($15/MTok input, $45/MTok output)
# - claude-sonnet-4: Balanced (recommended) ($3/MTok input, $15/MTok output)
# - claude-haiku-4: Fast, for simple tasks ($0.80/MTok input, $4/MTok output)
```

### OpenAI GPT-4

```
[ai]
enabled = true
provider = "openai"
model = "gpt-4-turbo"  # or "gpt-4", "gpt-4o"
api_key = "${OPENAI_API_KEY}"
api_base = "https://api.openai.com/v1"

[ai.request]
max_tokens = 4096
temperature = 0.7
top_p = 0.95

# Supported models
# - gpt-4: Most capable ($0.03/1K input, $0.06/1K output)
# - gpt-4-turbo: Better at code ($0.01/1K input, $0.03/1K output)
# - gpt-4o: Latest, multi-modal ($5/MTok input, $15/MTok output)
```

### Local Models

```
[ai]
enabled = true
provider = "local"
model = "llama2-70b"  # or "mistral", "neural-chat"
api_base = "http://localhost:8000"  # Local inference endpoint (note: Ollama defaults to port 11434)

# Local model support
# - Ollama: docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
# - LM Studio: GUI app with API
# - vLLM: High-throughput serving
# - llama.cpp: CPU inference

[ai.local]
gpu_enabled = true
gpu_memory_gb = 24
max_batch_size = 4
```
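Before enabling the `local` provider, it is worth confirming the inference server is actually reachable. A minimal sketch, assuming an OpenAI-compatible server such as vLLM listening on the `api_base` shown above (adjust the port for Ollama or LM Studio):

```
# Reachability check for a local OpenAI-compatible server (e.g. vLLM)
# Host/port are assumptions -- match them to api_base in ai.toml
curl -sf http://localhost:8000/v1/models \
  && echo "local model server is up" \
  || echo "no server at localhost:8000 -- start vLLM/Ollama/LM Studio first"
```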
## Feature Configuration

### Enable Specific Features

```
[ai.features]
# Core features (production-ready)
rag_search = true          # Retrieval-Augmented Generation
config_generation = true   # Generate Nickel from natural language
mcp_server = true          # Model Context Protocol server
troubleshooting = true     # AI-assisted debugging

# Form assistance (planned Q2 2025)
form_assistance = false    # AI suggestions in forms
form_explanations = false  # AI explains validation errors

# Agents (planned Q2 2025)
autonomous_agents = false  # AI agents for workflows
agent_learning = false     # Agents learn from deployments

# Advanced features
fine_tuning = false        # Fine-tune models for domain
knowledge_base = false     # Custom knowledge base per workspace
```

## Cache Configuration

### Cache Strategy

```
[ai.cache]
enabled = true
cache_type = "memory"  # or "redis", "disk"
ttl_seconds = 3600     # Cache entry lifetime

# Memory cache (recommended for single server)
[ai.cache.memory]
max_size_mb = 500
eviction_policy = "lru"  # Least Recently Used

# Redis cache (recommended for distributed)
[ai.cache.redis]
url = "redis://localhost:6379"
db = 0
password = "${REDIS_PASSWORD}"
ttl_seconds = 3600

# Disk cache (recommended for persistent caching)
[ai.cache.disk]
path = "/var/cache/provisioning/ai"
max_size_mb = 5000

# Semantic caching (for RAG)
[ai.cache.semantic]
enabled = true
similarity_threshold = 0.95  # Cache hit if query similarity >= threshold
```
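If you switch `cache_type` to `"redis"`, the URL in `[ai.cache.redis]` must point at a reachable instance. A quick sanity check with standard Redis tooling, using the hypothetical values from the config above:

```
# Verify the Redis instance referenced by [ai.cache.redis] is reachable
# -u takes the same redis:// URL format used in ai.toml
redis-cli -u "redis://localhost:6379/0" ping   # expect: PONG

# If a password is set, pass it the same way the config does
redis-cli -u "redis://:${REDIS_PASSWORD}@localhost:6379/0" ping
```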
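Finally, keep real API keys out of `ai.toml`. The provider sections above reference `${PROVISIONING_AI_API_KEY}` rather than a literal key; a minimal sketch of wiring that up and verifying the result, assuming the platform expands environment references when it loads the config:

```
# Export the key instead of committing it to ai.toml
export PROVISIONING_AI_API_KEY="sk-ant-..."   # placeholder; use your real key

# Confirm the configuration parses and the key reference resolves
provisioning config validate ai
provisioning config show ai
```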