# Schema Validation Pipeline

Runtime validation system for MCP tools and agent task assignments using Nickel contracts.

## Overview

The Schema Validation Pipeline prevents downstream errors by validating inputs before execution. It uses Nickel schemas with contracts to enforce type safety, business rules, and data constraints at runtime.

**Problem Solved:** VAPORA previously assumed valid inputs or failed downstream. This caused:

- Invalid UUIDs reaching database queries
- Empty strings bypassing business logic
- Out-of-range priorities corrupting task queues
- Malformed contexts breaking agent execution

**Solution:** Validate all inputs against Nickel schemas before execution.

## Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                         Client Request                          │
│          (MCP Tool Invocation / Agent Task Assignment)          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                       ValidationPipeline                        │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 1. Load schema from SchemaRegistry (cached)              │   │
│  │ 2. Validate types (String, Number, Bool, Array, Object)  │   │
│  │ 3. Check required fields                                 │   │
│  │ 4. Apply contracts (NonEmpty, UUID, Range, etc.)         │   │
│  │ 5. Apply default values                                  │   │
│  │ 6. Return ValidationResult (valid + errors + data)       │   │
│  └──────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                  ┌──────────┴──────────┐
                  │                     │
                  ▼                     ▼
          ┌──────────────┐       ┌─────────────┐
          │   Valid?     │       │  Invalid?   │
          │   Execute    │       │  Reject     │
          │   with data  │       │  with errors│
          └──────────────┘       └─────────────┘
```

## Components

### 1. ValidationPipeline

Core validation engine in `crates/vapora-shared/src/validation/pipeline.rs`.

**Key Methods:**

```rust
pub async fn validate(
    &self,
    schema_name: &str,
    input: &Value,
) -> Result<ValidationResult>
```

**Validation Steps:**

1. Load compiled schema from registry
2. Validate field types (String, Number, Bool, Array, Object)
3. Check required fields (reject if missing)
4. Apply contracts (NonEmpty, UUID, Range, Email, etc.)
5. Apply default values for optional fields
6. Return ValidationResult with errors (if any)

**Strict Mode:** Rejects fields that are not declared in the schema.

### 2. SchemaRegistry

Schema loading and caching in `crates/vapora-shared/src/validation/schema_registry.rs`.

**Features:**

- **Caching:** Compiled schemas are cached in memory (Arc + RwLock)
- **Hot Reload:** Call `invalidate(schema_name)` to reload a schema without a restart
- **Schema Sources:** File system, embedded string, or URL (future)

**Schema Structure:**

```rust
pub struct CompiledSchema {
    pub name: String,
    pub fields: HashMap<String, FieldSchema>,
}

pub struct FieldSchema {
    pub field_type: FieldType,
    pub required: bool,
    pub contracts: Vec<Contract>, // element type name assumed
    pub default: Option<Value>,
}
```

### 3. NickelBridge

CLI integration for Nickel operations in `crates/vapora-shared/src/validation/nickel_bridge.rs`.

**Operations:**

- `typecheck(path)`: Validate Nickel syntax
- `export(path)`: Export schema as JSON
- `query(path, field)`: Query a specific field
- `is_available()`: Check whether the Nickel CLI is installed

**Timeout Protection:** CLI calls time out after 30s by default, to prevent DoS from malicious Nickel code.

## Nickel Schemas

Located in the `schemas/` directory (workspace root).
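Schema names used in this document, such as `tools/kanban_create_task`, appear to map to `.ncl` files under the schema directory (the troubleshooting error message shows `schemas/tools/my_tool.ncl`). The following is a minimal sketch of that mapping, not the actual `SchemaRegistry` code. The function name `resolve_schema_path` and the rejection of names that would escape the directory are assumptions made for illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not part of VAPORA's documented API): maps a schema
/// name such as "tools/kanban_create_task" to a `.ncl` file under the schema
/// directory. Returns `None` for empty names, names ending in '/', and names
/// that would escape the directory.
pub fn resolve_schema_path(schema_dir: &Path, schema_name: &str) -> Option<PathBuf> {
    if schema_name.is_empty() || schema_name.ends_with('/') {
        return None;
    }
    // Accept only plain path segments: no "..", no ".", no absolute paths.
    let only_normal = Path::new(schema_name)
        .components()
        .all(|c| matches!(c, Component::Normal(_)));
    if !only_normal {
        return None;
    }
    // Append the extension explicitly so dots inside the name are preserved.
    Some(schema_dir.join(format!("{schema_name}.ncl")))
}
```

Because tool names arrive from client requests, a check of this kind keeps a name like `../secrets` from being resolved outside the schema directory. Whether the real registry performs such a check is not stated in this document.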
### Directory Structure

```text
schemas/
├── tools/                          # MCP tool parameter validation
│   ├── kanban_create_task.ncl
│   ├── kanban_update_task.ncl
│   ├── assign_task_to_agent.ncl
│   ├── get_project_summary.ncl
│   └── get_agent_capabilities.ncl
└── agents/                         # Agent task assignment validation
    └── task_assignment.ncl
```

### Schema Format

```nickel
{
  tool_name = "example_tool",

  parameters = {
    # Required field with contracts
    user_id
      | String
      | doc "User UUID"
      | std.string.NonEmpty
      | std.string.match "^[0-9a-f]{8}-[0-9a-f]{4}-...$",

    # Optional field with default
    priority
      | Number
      | doc "Priority score (0-100)"
      | std.number.between 0 100
      | default = 50,
  },
}
```

### Supported Contracts

| Contract | Description | Example |
|----------|-------------|---------|
| `std.string.NonEmpty` | String cannot be empty | Required text fields |
| `std.string.length.min N` | Minimum length | `min 3` for titles |
| `std.string.length.max N` | Maximum length | `max 200` for titles |
| `std.string.match PATTERN` | Regex validation | UUID format |
| `std.number.between A B` | Numeric range | `between 0 100` |
| `std.number.greater_than N` | Minimum value (exclusive) | `> -1` |
| `std.number.less_than N` | Maximum value (exclusive) | `< 1000` |
| `std.enum.TaggedUnion` | Enum validation | `[\| 'low, 'high \|]` |

## Integration Points

### MCP Server

Location: `crates/vapora-mcp-server/src/main.rs`

```rust
// Initialize validation pipeline
let schema_dir = std::env::var("VAPORA_SCHEMA_DIR")
    .unwrap_or_else(|_| "schemas".to_string());
let registry = Arc::new(SchemaRegistry::new(PathBuf::from(&schema_dir)));
let validation = Arc::new(ValidationPipeline::new(registry));

// Add to AppState
#[derive(Clone)]
struct AppState {
    validation: Arc<ValidationPipeline>,
}

// Validate in handler
async fn invoke_tool(
    State(state): State<AppState>,
    Json(request): Json<ToolRequest>, // request type name assumed
) -> impl IntoResponse {
    let schema_name = format!("tools/{}", request.tool);
    let validation_result = state
        .validation
        .validate(&schema_name, &request.parameters)
        .await?;

    if !validation_result.valid {
        // Convert errors to strings for the JSON response
        let validation_errors: Vec<String> = validation_result
            .errors
            .iter()
            .map(|e| e.to_string())
            .collect();
        return (StatusCode::BAD_REQUEST, Json(validation_errors));
    }

    // Execute with validated data
    let validated_params = validation_result.validated_data.unwrap();
    // ...
}
```

### Agent Coordinator

Location: `crates/vapora-agents/src/coordinator.rs`

```rust
pub struct AgentCoordinator {
    validation: Arc<ValidationPipeline>,
    // ...
}

impl AgentCoordinator {
    pub async fn assign_task(
        &self,
        role: &str,
        title: String,
        description: String,
        context: String,
        priority: u32,
    ) -> Result<String, CoordinatorError> { // Ok type assumed
        // Validate inputs
        let input = serde_json::json!({
            "role": role,
            "title": &title,
            "description": &description,
            "context": &context,
            "priority": priority,
        });

        let validation_result = self
            .validation
            .validate("agents/task_assignment", &input)
            .await?;

        if !validation_result.valid {
            // ValidationError values must be converted to strings before joining
            let errors: Vec<String> = validation_result
                .errors
                .iter()
                .map(|e| e.to_string())
                .collect();
            return Err(CoordinatorError::ValidationError(errors.join(", ")));
        }

        // Continue with validated inputs
        // ...
    }
}
```

## Usage Patterns

### 1. Validating MCP Tool Inputs

```rust
// In MCP server handler
let validation_result = state
    .validation
    .validate("tools/kanban_create_task", &input)
    .await?;

if !validation_result.valid {
    let errors: Vec<String> = validation_result
        .errors
        .iter()
        .map(|e| e.to_string())
        .collect();

    return (StatusCode::BAD_REQUEST, Json(json!({
        "success": false,
        "validation_errors": errors,
    })));
}

// Use validated data with defaults applied
let validated_data = validation_result.validated_data.unwrap();
```

### 2. Validating Agent Task Assignments

```rust
// In AgentCoordinator
let input = serde_json::json!({
    "role": role,
    "title": title,
    "description": description,
    "context": context,
    "priority": priority,
});

let validation_result = self
    .validation
    .validate("agents/task_assignment", &input)
    .await?;

if !validation_result.valid {
    warn!("Validation failed: {:?}", validation_result.errors);
    // ValidationError values must be converted to strings before joining
    let errors: Vec<String> = validation_result
        .errors
        .iter()
        .map(|e| e.to_string())
        .collect();
    return Err(CoordinatorError::ValidationError(
        format!("Invalid input: {}", errors.join(", "))
    ));
}
```

### 3. Hot Reloading Schemas

```rust
// Invalidate single schema
registry.invalidate("tools/kanban_create_task").await;

// Invalidate all schemas (useful for config reload)
registry.invalidate_all().await;
```

## Testing Schemas

### Validate Syntax

```bash
nickel typecheck schemas/tools/kanban_create_task.ncl
```

### Export as JSON

```bash
nickel export schemas/tools/kanban_create_task.ncl
```

### Query Specific Field

```bash
nickel query --field parameters.title schemas/tools/kanban_create_task.ncl
```

## Adding New Schemas

1. Create a `.ncl` file in the appropriate directory (`tools/` or `agents/`)
2. Define `tool_name` or `schema_name`
3. Define `parameters` or `fields` with types and contracts
4. Add `doc` annotations for documentation
5. Test with `nickel typecheck`
6. Restart services or use hot reload

**Example:**

```nickel
# schemas/tools/my_new_tool.ncl
{
  tool_name = "my_new_tool",

  parameters = {
    name
      | String
      | doc "User name (3-50 chars)"
      | std.string.NonEmpty
      | std.string.length.min 3
      | std.string.length.max 50,

    age
      | Number
      | doc "User age (0-120)"
      | std.number.between 0 120,

    email
      | String
      | doc "User email address"
      | std.string.Email,

    active
      | Bool
      | doc "Account status"
      | default = true,
  },
}
```

## Configuration

### Environment Variables

- `VAPORA_SCHEMA_DIR`: Schema directory path (default: `"schemas"`)

**In tests:**

```rust
std::env::set_var("VAPORA_SCHEMA_DIR", "../../schemas");
```

**In production:**

```bash
export VAPORA_SCHEMA_DIR=/app/schemas
```

## Performance Characteristics

- **Schema Loading:** ~5-10ms (first load, then cached)
- **Validation:** ~0.1-0.5ms per request (in-memory)
- **Hot Reload:** ~10-20ms (invalidates cache, reloads from disk)

**Optimization:** SchemaRegistry uses `Arc<RwLock<HashMap>>` so that concurrent reads do not block each other.
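To illustrate why the Arc plus RwLock caching described above keeps repeated validations cheap, here is a minimal, standard-library-only sketch of a read-mostly schema cache. It is not the actual `SchemaRegistry`: the names `SchemaCache`, `CachedSchema`, and `get_or_load` are hypothetical, and it uses the synchronous `std::sync::RwLock`, whereas the registry's `invalidate(...).await` calls suggest an async lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for a compiled schema (hypothetical; the real type carries fields).
#[derive(Debug, Clone, PartialEq)]
pub struct CachedSchema {
    pub name: String,
}

/// Hypothetical read-mostly cache illustrating the Arc + RwLock pattern.
#[derive(Clone, Default)]
pub struct SchemaCache {
    inner: Arc<RwLock<HashMap<String, Arc<CachedSchema>>>>,
}

impl SchemaCache {
    /// Returns the cached schema, or builds it with `load` on a miss.
    pub fn get_or_load<F>(&self, name: &str, load: F) -> Arc<CachedSchema>
    where
        F: FnOnce() -> CachedSchema,
    {
        {
            // Fast path: shared read lock, so many readers can proceed at once.
            let map = self.inner.read().unwrap();
            if let Some(hit) = map.get(name) {
                return Arc::clone(hit);
            }
        } // read guard released here, before taking the write lock

        // Slow path: exclusive write lock. `entry` re-checks the key, so a
        // schema inserted by a concurrent loader is reused rather than replaced.
        let mut map = self.inner.write().unwrap();
        let schema = map
            .entry(name.to_string())
            .or_insert_with(|| Arc::new(load()));
        Arc::clone(schema)
    }

    /// Drops one cached schema; returns true if it was present.
    pub fn invalidate(&self, name: &str) -> bool {
        self.inner.write().unwrap().remove(name).is_some()
    }

    /// Drops every cached schema.
    pub fn invalidate_all(&self) {
        self.inner.write().unwrap().clear();
    }
}
```

In this sketch the loader runs while the write lock is held, which is simple but blocks readers during a cold load. A production registry might load outside the lock and insert afterwards.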
## Security Considerations

### Timeout Protection

NickelBridge enforces a 30s timeout on all CLI operations to prevent:

- Infinite loops in malicious Nickel code
- DoS attacks via crafted schemas
- Resource exhaustion

### Input Sanitization

Contracts reject malformed input before it reaches downstream code, which helps guard against:

- SQL injection (via UUID/Email validation)
- XSS attacks (via length limits on text fields)
- Buffer overflows (via max length constraints)
- Type confusion (via strict type checking)

### Schema Validation

All schemas must pass `nickel typecheck` before deployment.

## Error Handling

### ValidationResult

```rust
pub struct ValidationResult {
    pub valid: bool,
    pub errors: Vec<ValidationError>,
    pub validated_data: Option<Value>,
}

pub enum ValidationError {
    MissingField(String),
    TypeMismatch { field: String, expected: String, got: String },
    ContractViolation { field: String, contract: String, value: String },
    InvalidSchema(String),
}
```

### Error Response Format

```json
{
  "success": false,
  "error": "Validation failed",
  "validation_errors": [
    "Field 'project_id' must match UUID pattern",
    "Field 'title' must be at least 3 characters",
    "Field 'priority' must be between 0 and 100"
  ]
}
```

## Troubleshooting

### Schema Not Found

**Error:** `Schema file not found: schemas/tools/my_tool.ncl`

**Solution:** Check the `VAPORA_SCHEMA_DIR` environment variable and make sure the schema file exists.

### Nickel CLI Not Available

**Error:** `Nickel CLI not found in PATH`

**Solution:** Install the Nickel CLI:

```bash
cargo install nickel-lang-cli
```

### Validation Always Fails

**Error:** All requests are rejected with validation errors

**Solution:** Check schema syntax with `nickel typecheck`, and verify that the field names in the request match the schema exactly.
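When every request fails, a quick way to check for field-name mismatches is to diff the keys a client sends against the names the schema declares. The helper below is a hypothetical debugging aid, not part of the validation module. The name `diff_field_names` and its slice-based signature are assumptions chosen to keep the sketch dependency-free.

```rust
use std::collections::BTreeSet;

/// Hypothetical debugging helper: compares request keys with schema field names.
/// Returns `(unknown, missing)`, both sorted alphabetically:
/// - `unknown`: keys present in the input but not declared by the schema
/// - `missing`: declared fields that the input does not contain
pub fn diff_field_names<'a>(
    input_keys: &[&'a str],
    schema_fields: &[&'a str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let input: BTreeSet<&'a str> = input_keys.iter().copied().collect();
    let schema: BTreeSet<&'a str> = schema_fields.iter().copied().collect();

    let unknown = input.difference(&schema).copied().collect();
    let missing = schema.difference(&input).copied().collect();
    (unknown, missing)
}
```

For example, sending `projectId` to a schema that declares `project_id` would surface `projectId` as unknown and `project_id` as missing. Fields reported as missing may still be optional, so this only narrows down naming problems.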
## Future Enhancements

- [ ] Remote schema loading (HTTP/S3)
- [ ] Schema versioning and migration
- [ ] Custom contract plugins
- [ ] GraphQL schema generation from Nickel
- [ ] OpenAPI spec generation

## Related Documentation

- [Nickel Language Documentation](https://nickel-lang.org/)
- [VAPORA Architecture Overview](vapora-architecture.md)
- [Agent Coordination](agent-registry-coordination.md)
- [MCP Protocol Integration](../integrations/mcp-server.md)

## References

- **Implementation:** `crates/vapora-shared/src/validation/`
- **Schemas:** `schemas/`
- **Tests:** `crates/vapora-shared/tests/validation_integration.rs`
- **MCP Integration:** `crates/vapora-mcp-server/src/main.rs`
- **Agent Integration:** `crates/vapora-agents/src/coordinator.rs`