# Schema Validation and Best Practices

This document describes how to validate KCL schemas and the practices recommended for the provisioning package.

## Table of Contents

- [Schema Validation](#schema-validation)
- [Built-in Constraints](#built-in-constraints)
- [Custom Validation](#custom-validation)
- [Best Practices](#best-practices)
- [Common Patterns](#common-patterns)
- [Troubleshooting](#troubleshooting)

## Schema Validation

### Basic Validation

```bash
# Validate syntax and run schema checks
kcl run config.k

# Format all files (formatting also surfaces syntax errors)
kcl fmt *.k

# Validate with verbose output
kcl run config.k --debug

# Validate a data file against a specific schema
# (kcl vet takes a JSON/YAML data file followed by a KCL file)
kcl vet config.json main.k --schema Server
```

### JSON Output Validation

```bash
# Generate and validate JSON output
kcl run config.k --format json | jq '.'

# Validate JSON schema structure
kcl run config.k --format json | jq '.workflow_id // error("Missing workflow_id")'

# Pretty print for inspection
kcl run config.k --format json | jq '.operations[] | {operation_id, name, provider}'
```

### Validation in CI/CD

```yaml
# GitHub Actions example
- name: Validate KCL Schemas
  run: |
    find . -name "*.k" -exec kcl fmt {} \;
    find . -name "*.k" -exec kcl run {} \;

# Check for schema changes
- name: Check Schema Compatibility
  run: |
    kcl run main.k --format json > current_schema.json
    diff expected_schema.json current_schema.json
```

## Built-in Constraints

### Server Schema Constraints

```kcl
import .main

# ✅ Valid server configuration
valid_server: main.Server = main.Server {
    hostname: "web-01"          # ✅ Non-empty string required
    title: "Web Server"         # ✅ Non-empty string required
    labels: "env: prod"         # ✅ Required field
    user: "admin"               # ✅ Required field

    # Optional but validated fields
    user_ssh_port: 22           # ✅ Valid port number
    running_timeout: 300        # ✅ Positive integer
    time_zone: "UTC"            # ✅ Valid timezone string
}

# ❌ Invalid configurations that will fail validation
invalid_examples = {
    # hostname: ""              # ❌ Empty hostname not allowed
    # title: ""                 # ❌ Empty title not allowed
    # user_ssh_port: -1         # ❌ Negative port not allowed
    # running_timeout: 0        # ❌ Zero timeout not allowed
}
```

### Workflow Schema Constraints

```kcl
import .main

# ✅ Valid workflow with proper constraints
valid_workflow: main.BatchWorkflow = main.BatchWorkflow {
    workflow_id: "deploy_001"           # ✅ Non-empty ID required
    name: "Production Deployment"       # ✅ Non-empty name required
    operations: [                       # ✅ At least one operation required
        main.BatchOperation {
            operation_id: "create_servers"  # ✅ Unique operation ID
            name: "Create Servers"
            operation_type: "server"
            action: "create"
            parameters: {}
            timeout: 600                # ✅ Positive timeout
            priority: 5                 # ✅ Valid priority
        }
    ]
    max_parallel_operations: 3          # ✅ Non-negative number
    global_timeout: 3600                # ✅ Positive global timeout
}

# ❌ Constraint violations
constraint_violations = {
    # workflow_id: ""                   # ❌ Empty workflow ID
    # operations: []                    # ❌ Empty operations list
    # max_parallel_operations: -1       # ❌ Negative parallel limit
    # global_timeout: 0                 # ❌ Zero global timeout
}
```

### Kubernetes Schema Constraints

```kcl
import .main

# ✅ Valid Kubernetes deployment with constraints
valid_k8s: main.K8sDeploy = main.K8sDeploy {
    name: "webapp"                      # ✅ Non-empty name
    namespace: "production"             # ✅ Valid namespace
    spec: main.K8sDeploySpec {
        replicas: 3                     # ✅ Positive replica count
        containers: [                   # ✅ At least one container required
            main.K8sContainers {
                name: "app"             # ✅ Non-empty container name
                image: "nginx:1.21"     # ✅ Valid image reference
                resources_requests: main.K8sResources {
                    memory: "128Mi"     # ✅ Valid K8s memory format
                    cpu: "100m"         # ✅ Valid K8s CPU format
                }
                resources_limits: main.K8sResources {
                    memory: "256Mi"     # ✅ Limits >= requests (enforced)
                    cpu: "200m"
                }
            }
        ]
    }
}
```

### Dependency Schema Constraints

```kcl
import .main

# ✅ Valid dependency definitions
valid_dependencies: main.TaskservDependencies = main.TaskservDependencies {
    name: "kubernetes"                  # ✅ Lowercase name required
    requires: ["containerd", "cni"]     # ✅ Valid dependency list
    conflicts: ["docker"]               # ✅ Cannot coexist with docker
    resources: main.ResourceRequirement {
        cpu: "100m"                     # ✅ Non-empty CPU requirement
        memory: "128Mi"                 # ✅ Non-empty memory requirement
        disk: "1Gi"                     # ✅ Non-empty disk requirement
    }
    timeout: 600                        # ✅ Positive timeout
    retry_count: 3                      # ✅ Non-negative retry count
    os_support: ["linux"]               # ✅ At least one OS required
    arch_support: ["amd64", "arm64"]    # ✅ At least one arch required
}

# ❌ Constraint violations
dependency_violations = {
    # name: "Kubernetes"                # ❌ Must be lowercase
    # name: ""                          # ❌ Cannot be empty
    # timeout: 0                        # ❌ Must be positive
    # retry_count: -1                   # ❌ Cannot be negative
    # os_support: []                    # ❌ Must specify at least one OS
}
```

## Custom Validation

### Adding Custom Constraints

```kcl
import .main
import regex

# Custom server schema with additional validation
schema CustomServer(main.Server):
    """Custom server with additional business rules"""

    # Additional custom fields
    environment: "dev" | "staging" | "prod"
    cost_center: str

    check:
        # Business rule: production servers must have specific naming.
        # The trailing `if` restricts the rule to prod; a plain `and` would
        # make the check fail for every non-prod server.
        regex.match(hostname, "^prod-[a-z0-9-]+$") if environment == "prod", "Production servers must start with 'prod-'"
        # Business rule: staging servers have resource limits
        len(taskservs or []) <= 3 if environment == "staging", "Staging servers limited to 3 taskservs"
        # Business rule: cost center must be valid
        cost_center in ["engineering", "operations", "security"], "Invalid cost center: ${cost_center}"

# Usage with validation
prod_server: CustomServer = CustomServer {
    hostname: "prod-web-01"         # ✅ Matches production naming
    title: "Production Web Server"
    labels: "env: prod"
    user: "admin"
    environment: "prod"             # ✅ Valid environment
    cost_center: "engineering"      # ✅ Valid cost center
}
```

### Conditional Validation

```kcl
import .main

# Workflow with conditional validation based on environment
schema EnvironmentWorkflow(main.BatchWorkflow):
    """Workflow with environment-specific validation"""

    environment: "dev" | "staging" | "prod"

    check:
        # Production workflows must have monitoring
        monitoring.enabled == True if environment == "prod", "Production workflows must enable monitoring"
        # Production workflows must have rollback enabled
        default_rollback_strategy.enabled == True if environment == "prod", "Production workflows must enable rollback"
        # Development can have shorter timeouts (1800 s = 30 minutes)
        global_timeout <= 1800 if environment == "dev", "Development workflows should complete within 30 minutes"
        # Staging must have retry policies
        default_retry_policy.max_attempts >= 2 if environment == "staging", "Staging workflows must have retry policies"

# Valid production workflow
prod_workflow: EnvironmentWorkflow = EnvironmentWorkflow {
    workflow_id: "prod_deploy_001"
    name: "Production Deployment"
    environment: "prod"             # ✅ Production environment
    operations: [
        main.BatchOperation {
            operation_id: "deploy"
            name: "Deploy Application"
            operation_type: "server"
            action: "create"
            parameters: {}
        }
    ]

    # ✅ Required for production
    monitoring: main.MonitoringConfig {
        enabled: True
        backend: "prometheus"
    }

    # ✅ Required for production
    default_rollback_strategy: main.RollbackStrategy {
        enabled: True
        strategy: "immediate"
    }
}
```

### Cross-Field Validation

```kcl
import .main

# Validate relationships between fields
schema ValidatedBatchOperation(main.BatchOperation):
    """Batch operation with cross-field validation"""

    check:
        # Timeout should be reasonable for operation type
        timeout >= 300 if operation_type == "server", "Server operations need at least 5 minutes timeout"
        timeout >= 600 if operation_type == "taskserv", "Taskserv operations need at least 10 minutes timeout"
        # High priority operations should have retry policies
        retry_policy.max_attempts >= 2 if priority >= 8, "High priority operations should have retry policies"
        # Parallel operations should have lower priority
        priority <= 7 if allow_parallel, "Parallel operations should have lower priority for scheduling"

# Validate workflow operation consistency
schema ConsistentWorkflow(main.BatchWorkflow):
    """Workflow with consistent operation validation"""

    check:
        # All operation IDs must be unique
        isunique([op.operation_id for op in operations]), "All operation IDs must be unique"
        # Dependencies must reference existing operations
        all_true([
            dep.target_operation_id in [o.operation_id for o in operations]
            for op in operations
            for dep in op.dependencies or []
        ]), "All dependencies must reference existing operations"
        # Sanity check only: circular dependencies are not detected here
        len(operations) > 0, "Workflow must have at least one operation"
```

## Best Practices

### 1. Schema Design Principles

```kcl
# ✅ Good: Descriptive field names and documentation
schema WellDocumentedServer:
    """
    Server configuration for production workloads
    Follows company security and operational standards
    """

    # Core identification
    hostname: str                   # DNS-compliant hostname
    fqdn?: str                      # Fully qualified domain name

    # Environment classification
    environment: "dev" | "staging" | "prod"
    classification: "public" | "internal" | "confidential"

    # Operational metadata
    owner_team: str                 # Team responsible for maintenance
    cost_center: str                # Billing allocation
    backup_required: bool           # Whether automated backups are needed

    check:
        len(hostname) > 0 and len(hostname) <= 63, "Hostname must be 1-63 characters"
        len(owner_team) > 0, "Owner team must be specified"
        len(cost_center) > 0, "Cost center must be specified"

# ❌ Avoid: Unclear field names and missing validation
schema PoorlyDocumentedServer:
    name: str                       # ❌ Ambiguous - hostname? title? display name?
    env: str                        # ❌ No constraints - any string allowed
    data: {str: str}                # ❌ Unstructured data without validation
```

### 2. Validation Strategy

```kcl
import .main
import regex

# ✅ Good: Layered validation with clear error messages
schema ProductionWorkflow(main.BatchWorkflow):
    """Production workflow with comprehensive validation"""

    # Business metadata
    change_request_id: str
    approver: str
    maintenance_window?: str

    check:
        # Business process validation
        regex.match(change_request_id, "^CHG-[0-9]{4}-[0-9]{3}$"), "Change request ID must match format CHG-YYYY-NNN"
        # Operational validation (14400 s = 4 hours max)
        global_timeout <= 14400, "Production workflows must complete within 4 hours"
        # Safety validation
        default_rollback_strategy.enabled == True, "Production workflows must enable rollback"
        # Monitoring validation
        monitoring.enabled == True and monitoring.enable_notifications == True, "Production workflows must enable monitoring and notifications"

# ✅ Good: Environment-specific defaults with validation
schema EnvironmentDefaults:
    """Environment-specific default configurations"""

    environment: "dev" | "staging" | "prod"

    # Default timeouts by environment
    # (KCL uses `a if cond else b`; it has no `? :` operator)
    default_timeout: int = 1800 if environment == "prod" else (1200 if environment == "staging" else 600)

    # Default retry attempts by environment
    default_retries: int = 3 if environment == "prod" else (2 if environment == "staging" else 1)

    # Default monitoring settings
    monitoring_enabled: bool = environment == "prod"

    check:
        default_timeout > 0, "Timeout must be positive"
        default_retries >= 0, "Retries cannot be negative"
```

### 3. Schema Composition Patterns

```kcl
import regex

# ✅ Good: Composable schema design
schema BaseResource:
    """Common fields for all resources"""

    name: str
    tags: {str: str} = {}
    created_at?: str
    updated_at?: str

    check:
        len(name) > 0, "Name cannot be empty"
        regex.match(name, "^[a-z0-9-]+$"), "Name must be lowercase alphanumeric with hyphens"

schema MonitoredResource(BaseResource):
    """Resource with monitoring capabilities"""

    monitoring_enabled: bool = True
    alert_thresholds: {str: float} = {}

    check:
        len(alert_thresholds) > 0 if monitoring_enabled, "Monitored resources must define alert thresholds"

# KCL schemas support single inheritance only, so the security layer
# is stacked on top of MonitoredResource instead of being a second parent.
schema SecureResource(MonitoredResource):
    """Resource with security requirements"""

    encryption_enabled: bool = True
    access_policy: str
    compliance_tags: [str] = []

    check:
        encryption_enabled == True, "Security-sensitive resources must enable encryption"
        len(access_policy) > 0, "Access policy must be defined"
        "pci" in compliance_tags or "sox" in compliance_tags or "hipaa" in compliance_tags, "Must specify compliance requirements"

# Composed schema inheriting both patterns through the chain
schema ProductionDatabase(SecureResource):
    """Production database with full operational requirements"""

    backup_retention_days: int = 30
    high_availability: bool = True

    check:
        backup_retention_days >= 7, "Production databases need minimum 7 days backup retention"
        high_availability == True, "Production databases must be highly available"
```

### 4. Error Handling Patterns

```kcl
import .main

# ✅ Good: Comprehensive error scenarios with specific handling
schema RobustBatchOperation(main.BatchOperation):
    """Batch operation with robust error handling"""

    # Error classification
    critical_operation: bool = False
    max_failure_rate: float = 0.1

    # Enhanced retry configuration
    retry_policy: main.RetryPolicy = main.RetryPolicy {
        max_attempts: 5 if critical_operation else 3
        initial_delay: 30 if critical_operation else 10
        max_delay: 600 if critical_operation else 300
        backoff_multiplier: 2
        retry_on_errors: [
            "connection_error",
            "timeout",
            "rate_limit",
            "resource_unavailable"
        ]
    }

    # Enhanced rollback strategy
    rollback_strategy: main.RollbackStrategy = main.RollbackStrategy {
        enabled: True
        strategy: "manual" if critical_operation else "immediate"
        preserve_partial_state: critical_operation
        custom_rollback_operations: [
            "create_incident_ticket",
            "notify_on_call_engineer",
            "preserve_logs"
        ] if critical_operation else []
    }

    check:
        0 <= max_failure_rate and max_failure_rate <= 1, "Failure rate must be between 0 and 1"
        timeout >= 1800 if critical_operation, "Critical operations need extended timeout"
```

## Common Patterns

### 1. Multi-Environment Configuration

```kcl
import .main

# Configuration that adapts to environment
schema EnvironmentAwareConfig:
    environment: "dev" | "staging" | "prod"

    # Computed values based on environment
    replica_count: int = 3 if environment == "prod" else (2 if environment == "staging" else 1)

    resource_requests: main.K8sResources = main.K8sResources {
        memory: "512Mi" if environment == "prod" else "256Mi"
        cpu: "200m" if environment == "prod" else "100m"
    }

    monitoring_enabled: bool = environment != "dev"
    backup_enabled: bool = environment == "prod"

# Usage pattern
prod_config: EnvironmentAwareConfig = EnvironmentAwareConfig {
    environment: "prod"
    # replica_count automatically becomes 3
    # monitoring_enabled automatically becomes True
    # backup_enabled automatically becomes True
}
```

### 2. Provider Abstraction

```kcl
import .main

# Provider-agnostic resource definition
schema AbstractServer:
    """Provider-agnostic server specification"""

    # Common specification
    cpu_cores: int
    memory_gb: int
    storage_gb: int
    network_performance: "low" | "moderate" | "high"

    # Provider-specific mapping
    provider: "upcloud" | "aws" | "gcp"

    # Computed provider-specific values
    # (KCL interpolates with "${...}"; it has no f-strings)
    instance_type: str = (
        "${cpu_cores}xCPU-${memory_gb}GB" if provider == "upcloud" else (
            ("m5.large" if cpu_cores == 1 else "m5.xlarge") if provider == "aws" else (
                "n2-standard-${cpu_cores}" if provider == "gcp" else "unknown"
            )
        )
    )

    storage_type: str = (
        "MaxIOPS" if provider == "upcloud" else (
            "gp3" if provider == "aws" else (
                "pd-ssd" if provider == "gcp" else "standard"
            )
        )
    )

# Multi-provider workflow using abstraction
mixed_deployment: main.BatchWorkflow = main.BatchWorkflow {
    workflow_id: "mixed_deploy_001"
    name: "Multi-Provider Deployment"
    operations: [
        # UpCloud servers
        main.BatchOperation {
            operation_id: "upcloud_servers"
            provider: "upcloud"
            parameters: {
                "instance_type": "2xCPU-4GB"    # UpCloud format
                "storage_type": "MaxIOPS"
            }
        },
        # AWS servers
        main.BatchOperation {
            operation_id: "aws_servers"
            provider: "aws"
            parameters: {
                "instance_type": "m5.large"     # AWS format
                "storage_type": "gp3"
            }
        }
    ]
}
```

### 3. Dependency Management

```kcl
import .main

# Complex dependency patterns
schema DependencyAwareWorkflow(main.BatchWorkflow):
    """Workflow with intelligent dependency management"""

    # Categorize operations by type
    infrastructure_ops: [str] = [
        op.operation_id for op in operations if op.operation_type == "server"
    ]
    service_ops: [str] = [
        op.operation_id for op in operations if op.operation_type == "taskserv"
    ]
    validation_ops: [str] = [
        op.operation_id for op in operations
        if op.operation_type == "custom" and "validate" in op.name.lower()
    ]

    check:
        # Infrastructure must come before services
        all_true([
            len([dep for dep in op.dependencies or [] if dep.target_operation_id in infrastructure_ops]) > 0
            for op in operations if op.operation_id in service_ops
        ]) or len(service_ops) == 0, "Service operations must depend on infrastructure operations"
        # Validation must come last
        all_true([
            len([dep for dep in op.dependencies or [] if dep.target_operation_id in service_ops or dep.target_operation_id in infrastructure_ops]) > 0
            for op in operations if op.operation_id in validation_ops
        ]) or len(validation_ops) == 0, "Validation operations must depend on other operations"
```

## Troubleshooting

### Common Validation Errors

#### 1. Missing Required Fields

```kcl
# Error: attribute 'labels' of Server is required

# ❌ Incomplete server definition
server: main.Server = main.Server {
    hostname: "web-01"
    title: "Web Server"
    # Missing: labels, user
}

# ✅ Complete server definition
server: main.Server = main.Server {
    hostname: "web-01"
    title: "Web Server"
    labels: "env: prod"     # ✅ Required field
    user: "admin"           # ✅ Required field
}
```

#### 2. Type Mismatches

```kcl
# Error: expect int, got str

# ❌ Wrong type
workflow: main.BatchWorkflow = main.BatchWorkflow {
    max_parallel_operations: "3"    # ❌ String instead of int
}

# ✅ Correct type
workflow: main.BatchWorkflow = main.BatchWorkflow {
    max_parallel_operations: 3      # ✅ Integer
}
```

#### 3. Constraint Violations

```kcl
# Error: Check failed: hostname cannot be empty

# ❌ Constraint violation
server: main.Server = main.Server {
    hostname: ""            # ❌ Empty string violates constraint
    title: "Server"
    labels: "env: prod"
    user: "admin"
}

# ✅ Valid constraint
server: main.Server = main.Server {
    hostname: "web-01"      # ✅ Non-empty string
    title: "Server"
    labels: "env: prod"
    user: "admin"
}
```

### Debugging Techniques

#### 1. Step-by-step Validation

```bash
# Validate incrementally
kcl run basic_config.k      # Start with minimal config
kcl run enhanced_config.k   # Add features gradually
kcl run complete_config.k   # Full configuration
```

#### 2. Schema Introspection

```bash
# Check what fields are available
kcl run -c 'import .main; main.Server' --format json

# Validate a data file against a specific schema
kcl vet config.json main.k --schema Server

# Debug with verbose output
kcl run config.k --debug --verbose
```

#### 3. Constraint Testing

```kcl
import .main

# Test constraint behavior

# Test minimum values
min_timeout = main.BatchOperation {
    operation_id: "test"
    name: "Test"
    operation_type: "server"
    action: "create"
    parameters: {}
    timeout: 1                      # Test minimum allowed
}

# Test maximum values
max_parallel = main.BatchWorkflow {
    workflow_id: "test"
    name: "Test"
    operations: [min_timeout]
    max_parallel_operations: 100    # Test upper limits
}
```

### Performance Considerations

#### 1. Schema Complexity

```kcl
# ✅ Good: Simple, focused schemas
schema SimpleServer:
    hostname: str
    user: str
    labels: str

    check:
        len(hostname) > 0, "Hostname required"

# ❌ Avoid: Overly complex schemas with many computed fields
schema OverlyComplexServer:
    # ... many fields with complex interdependencies
    # ... computationally expensive check conditions
    # ... deep nested validations
```

#### 2. Validation Efficiency

```kcl
import regex

# ✅ Good: Efficient validation
schema EfficientValidation:
    name: str
    tags: {str: str}

    check:
        len(name) > 0, "Name required"              # ✅ Simple check
        len(tags) <= 10, "Maximum 10 tags allowed"  # ✅ Simple count check

# ❌ Avoid: Expensive validation
schema ExpensiveValidation:
    items: [str]

    check:
        # ❌ Expensive nested operations
        all_true([regex.match(item, "^[a-z0-9-]+$") for item in items]), "All items must match pattern"
```

This guide provides a foundation for creating robust, maintainable KCL schemas with clear error handling and validation strategies.
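
When a per-item pattern check like the one under Validation Efficiency is genuinely required, one possible way to express it is KCL's `all ... in` quantifier expression, which states the rule directly on the list. The following is a minimal sketch under stated assumptions: the schema name `QuantifiedValidation` and the `sample` instance are hypothetical and not part of the provisioning package, and any performance difference depends on the KCL version in use.

```kcl
import regex

# Hypothetical schema, for illustration only
schema QuantifiedValidation:
    items: [str]

    check:
        # Holds only if every item matches the pattern
        all item in items {
            regex.match(item, "^[a-z0-9-]+$")
        }, "All items must match pattern"

# Hypothetical instance that satisfies the check
sample = QuantifiedValidation {
    items: ["web-01", "db-02"]
}
```

Running `kcl run` on a file containing this sketch should either print `sample` or report the check message, which makes it a convenient place to try out patterns before adding them to a shared schema.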