2026-01-14 21:12:49 +00:00


Schema Validation Pipeline

Runtime validation system for MCP tools and agent task assignments using Nickel contracts.

Overview

The Schema Validation Pipeline prevents downstream errors by validating inputs before execution. It uses Nickel schemas with contracts to enforce type safety, business rules, and data constraints at runtime.

Problem Solved: Previously, VAPORA assumed inputs were valid, so bad inputs only surfaced as downstream failures. This caused:

  • Invalid UUIDs reaching database queries
  • Empty strings bypassing business logic
  • Out-of-range priorities corrupting task queues
  • Malformed contexts breaking agent execution

Solution: Validate all inputs against Nickel schemas before execution.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Client Request                          │
│          (MCP Tool Invocation / Agent Task Assignment)          │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       ValidationPipeline                        │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ 1. Load schema from SchemaRegistry (cached)               │  │
│  │ 2. Validate types (String, Number, Bool, Array, Object)   │  │
│  │ 3. Check required fields                                  │  │
│  │ 4. Apply contracts (NonEmpty, UUID, Range, etc.)          │  │
│  │ 5. Apply default values                                   │  │
│  │ 6. Return ValidationResult (valid + errors + data)        │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                      ┌──────────┴──────────┐
                      │                     │
                      ▼                     ▼
               ┌──────────────┐      ┌──────────────┐
               │    Valid?    │      │   Invalid?   │
               │   Execute    │      │    Reject    │
               │  with data   │      │ with errors  │
               └──────────────┘      └──────────────┘

Components

1. ValidationPipeline

Core validation engine in crates/vapora-shared/src/validation/pipeline.rs.

Key Methods:

pub async fn validate(
    &self,
    schema_name: &str,
    input: &Value
) -> Result<ValidationResult>

Validation Steps:

  1. Load compiled schema from registry
  2. Validate field types (String, Number, Bool, Array, Object)
  3. Check required fields (reject if missing)
  4. Apply contracts (NonEmpty, UUID, Range, Email, etc.)
  5. Apply default values for optional fields
  6. Return ValidationResult with errors (if any)

Strict Mode: Rejects any input field that the schema does not declare.
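The validation steps above can be sketched in plain Rust. This is a minimal, std-only illustration, not the code in pipeline.rs. It replaces serde_json::Value with a small hypothetical `Value` enum, reduces the contract list to a single `non_empty` flag, and reports errors as strings instead of ValidationError. The `strict` parameter models Strict Mode.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for serde_json::Value (std-only sketch).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    String,
    Number,
    Bool,
    Array,
    Object,
}

struct FieldSchema {
    field_type: FieldType,
    required: bool,
    non_empty: bool, // simplified stand-in for `contracts: Vec<Contract>`
    default: Option<Value>,
}

struct ValidationResult {
    valid: bool,
    errors: Vec<String>,
    validated_data: Option<Value>,
}

fn type_matches(expected: FieldType, value: &Value) -> bool {
    matches!(
        (expected, value),
        (FieldType::String, Value::String(_))
            | (FieldType::Number, Value::Number(_))
            | (FieldType::Bool, Value::Bool(_))
            | (FieldType::Array, Value::Array(_))
            | (FieldType::Object, Value::Object(_))
    )
}

fn validate(fields: &BTreeMap<String, FieldSchema>, input: &Value, strict: bool) -> ValidationResult {
    let Value::Object(obj) = input else {
        return ValidationResult {
            valid: false,
            errors: vec!["input must be an object".to_string()],
            validated_data: None,
        };
    };
    let mut errors = Vec::new();
    let mut data = BTreeMap::new();
    for (name, schema) in fields {
        match obj.get(name) {
            // Step 2: type check
            Some(v) if !type_matches(schema.field_type, v) => {
                errors.push(format!("Field '{name}' must be {:?}", schema.field_type))
            }
            // Step 4: contracts (only NonEmpty in this sketch)
            Some(Value::String(s)) if schema.non_empty && s.is_empty() => {
                errors.push(format!("Field '{name}' must not be empty"))
            }
            Some(v) => {
                data.insert(name.clone(), v.clone());
            }
            // Step 3: required fields
            None if schema.required => errors.push(format!("Missing required field '{name}'")),
            // Step 5: defaults for optional fields
            None => {
                if let Some(d) = &schema.default {
                    data.insert(name.clone(), d.clone());
                }
            }
        }
    }
    // Strict Mode: reject fields the schema does not declare
    if strict {
        for key in obj.keys().filter(|k| !fields.contains_key(*k)) {
            errors.push(format!("Unknown field '{key}'"));
        }
    }
    // Step 6: validated data is only returned when there are no errors
    let valid = errors.is_empty();
    ValidationResult { valid, errors, validated_data: valid.then(|| Value::Object(data)) }
}
```

Because all errors are collected before returning, a caller receives every problem in one response rather than only the first.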

2. SchemaRegistry

Schema loading and caching in crates/vapora-shared/src/validation/schema_registry.rs.

Features:

  • Caching: Compiled schemas cached in memory (Arc + RwLock)
  • Hot Reload: invalidate(schema_name) to reload without restart
  • Schema Sources: File system, embedded string, or URL (future)

Schema Structure:

pub struct CompiledSchema {
    pub name: String,
    pub fields: HashMap<String, FieldSchema>,
}

pub struct FieldSchema {
    pub field_type: FieldType,
    pub required: bool,
    pub contracts: Vec<Contract>,
    pub default: Option<Value>,
}
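The caching behaviour described for SchemaRegistry can be sketched as follows. This minimal version assumes a synchronous std::sync::RwLock and a caller-supplied `loader` closure in place of real Nickel loading. The documented registry is async, and its internals are not shown in this document. Readers share the read lock, and only a cache miss or an invalidation takes the write lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Simplified stand-in for CompiledSchema (fields omitted).
#[derive(Debug)]
pub struct CompiledSchema {
    pub name: String,
}

// Hypothetical synchronous registry; `loader` replaces real Nickel loading.
pub struct SchemaRegistry<F>
where
    F: Fn(&str) -> Result<CompiledSchema, String>,
{
    cache: RwLock<HashMap<String, Arc<CompiledSchema>>>,
    loader: F,
}

impl<F> SchemaRegistry<F>
where
    F: Fn(&str) -> Result<CompiledSchema, String>,
{
    pub fn new(loader: F) -> Self {
        Self { cache: RwLock::new(HashMap::new()), loader }
    }

    pub fn get(&self, name: &str) -> Result<Arc<CompiledSchema>, String> {
        // Fast path: many readers can hold the read lock at once.
        if let Some(schema) = self.cache.read().unwrap().get(name) {
            return Ok(Arc::clone(schema));
        }
        // Miss: compile without holding any lock, then insert under the write lock.
        let compiled = Arc::new((self.loader)(name)?);
        let mut cache = self.cache.write().unwrap();
        // If another thread inserted first, keep its copy.
        Ok(Arc::clone(cache.entry(name.to_string()).or_insert(compiled)))
    }

    // Drop one cached schema; the next `get` reloads it.
    pub fn invalidate(&self, name: &str) {
        self.cache.write().unwrap().remove(name);
    }

    // Drop every cached schema.
    pub fn invalidate_all(&self) {
        self.cache.write().unwrap().clear();
    }
}
```

Compiling outside the write lock keeps a slow load from blocking readers of other schemas. The trade-off is that two threads that miss at the same moment may both compile, with `or_insert` keeping the first result.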

3. NickelBridge

CLI integration for Nickel operations in crates/vapora-shared/src/validation/nickel_bridge.rs.

Operations:

  • typecheck(path) — Validate Nickel syntax
  • export(path) — Export schema as JSON
  • query(path, field) — Query specific field
  • is_available() — Check if Nickel CLI is installed

Timeout Protection: CLI calls time out after 30s by default, so malicious or non-terminating Nickel code cannot hang the service (DoS).
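One way to enforce a timeout on a CLI call using only the standard library is to poll the child process and kill it when the deadline passes. This helper is hypothetical. The document does not say how NickelBridge implements its 30s limit.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `cmd`, killing it if it is still running after `timeout`.
fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<ExitStatus> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check: Some(status) once the process has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the killed process so it does not linger as a zombie
            return Err(io::Error::new(io::ErrorKind::TimedOut, "command timed out"));
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

For example, `run_with_timeout(Command::new("nickel").arg("typecheck").arg(path), Duration::from_secs(30))`. If stdout is piped (as `export` would need), read it on a separate thread while polling. Otherwise a child that fills the pipe buffer blocks until the timeout fires.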

Nickel Schemas

Located in schemas/ directory (workspace root).

Directory Structure

schemas/
├── tools/              # MCP tool parameter validation
│   ├── kanban_create_task.ncl
│   ├── kanban_update_task.ncl
│   ├── assign_task_to_agent.ncl
│   ├── get_project_summary.ncl
│   └── get_agent_capabilities.ncl
└── agents/             # Agent task assignment validation
    └── task_assignment.ncl

Schema Format

{
  tool_name = "example_tool",

  parameters = {
    # Required field with contracts
    user_id
      | String
      | doc "User UUID"
      | std.string.NonEmpty
      | std.string.match "^[0-9a-f]{8}-[0-9a-f]{4}-...$",

    # Optional field with default
    priority
      | Number
      | doc "Priority score (0-100)"
      | std.number.between 0 100
      | default = 50,
  },
}

Supported Contracts

Contract                      Description                  Example
std.string.NonEmpty           String cannot be empty       Required text fields
std.string.length.min N       Minimum length               min 3 for titles
std.string.length.max N       Maximum length               max 200 for titles
std.string.match PATTERN      Regex validation             UUID format
std.number.between A B        Numeric range                between 0 100
std.number.greater_than N     Minimum value (exclusive)    > -1
std.number.less_than N        Maximum value (exclusive)    < 1000
std.enum.TaggedUnion          Enum validation              `[
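Each contract in the table reduces to a simple predicate. The sketch below models them as a hypothetical `Contract` enum, with several stand-ins and assumptions:

- `Uuid` stands in for `std.string.match` with a UUID pattern. The full pattern is elided above, so lowercase 8-4-4-4-12 hex is assumed.
- `OneOf` stands in for enum validation.
- `Between` is assumed to be inclusive, matching the "0-100" priority wording.
- Lengths are assumed to count characters rather than bytes.

```rust
// Hypothetical model of the contracts in the table above.
#[derive(Debug, Clone, PartialEq)]
enum Contract {
    NonEmpty,
    MinLength(usize),
    MaxLength(usize),
    Uuid,               // stand-in for std.string.match with a UUID pattern
    Between(f64, f64),  // assumed inclusive on both ends
    GreaterThan(f64),   // exclusive
    LessThan(f64),      // exclusive
    OneOf(Vec<String>), // stand-in for enum validation
}

enum Input<'a> {
    Str(&'a str),
    Num(f64),
}

// Lowercase 8-4-4-4-12 hex groups, e.g. 550e8400-e29b-41d4-a716-446655440000.
fn is_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    groups.len() == 5
        && groups.iter().zip([8, 4, 4, 4, 12]).all(|(g, len)| {
            g.len() == len && g.chars().all(|c| matches!(c, '0'..='9' | 'a'..='f'))
        })
}

fn check(contract: &Contract, value: &Input) -> Result<(), String> {
    let ok = match (contract, value) {
        (Contract::NonEmpty, Input::Str(s)) => !s.is_empty(),
        // Lengths count characters, not bytes (an assumption of this sketch).
        (Contract::MinLength(n), Input::Str(s)) => s.chars().count() >= *n,
        (Contract::MaxLength(n), Input::Str(s)) => s.chars().count() <= *n,
        (Contract::Uuid, Input::Str(s)) => is_uuid(s),
        (Contract::Between(lo, hi), Input::Num(x)) => lo <= x && x <= hi,
        (Contract::GreaterThan(n), Input::Num(x)) => x > n,
        (Contract::LessThan(n), Input::Num(x)) => x < n,
        (Contract::OneOf(options), Input::Str(s)) => options.iter().any(|o| o.as_str() == *s),
        // A contract applied to the wrong kind of value is itself an error.
        _ => return Err(format!("{contract:?} does not apply to this value type")),
    };
    if ok { Ok(()) } else { Err(format!("value violates {contract:?}")) }
}
```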

Integration Points

MCP Server

Location: crates/vapora-mcp-server/src/main.rs

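use std::{path::PathBuf, sync::Arc};

use axum::{extract::State, http::StatusCode, response::{IntoResponse, Response}, Json};
use serde_json::json;

// Initialize validation pipeline (in main)
let schema_dir = std::env::var("VAPORA_SCHEMA_DIR")
    .unwrap_or_else(|_| "schemas".to_string());
let registry = Arc::new(SchemaRegistry::new(PathBuf::from(&schema_dir)));
let validation = Arc::new(ValidationPipeline::new(registry));

// Add to AppState
#[derive(Clone)]
struct AppState {
    validation: Arc<ValidationPipeline>,
}

// Validate in handler
async fn invoke_tool(
    State(state): State<AppState>,
    Json(request): Json<InvokeToolRequest>,
) -> Response {
    let schema_name = format!("tools/{}", request.tool);

    // `?` cannot be used in a handler that returns a plain Response, so pipeline
    // errors (e.g. a missing schema file) are mapped to a 500 explicitly.
    let validation_result = match state
        .validation
        .validate(&schema_name, &request.parameters)
        .await
    {
        Ok(result) => result,
        Err(e) => {
            let body = json!({ "success": false, "error": e.to_string() });
            return (StatusCode::INTERNAL_SERVER_ERROR, Json(body)).into_response();
        }
    };

    // Invalid input: 400 with one message per ValidationError
    if !validation_result.valid {
        let validation_errors: Vec<String> =
            validation_result.errors.iter().map(|e| e.to_string()).collect();
        let body = json!({
            "success": false,
            "error": "Validation failed",
            "validation_errors": validation_errors,
        });
        return (StatusCode::BAD_REQUEST, Json(body)).into_response();
    }

    // Execute with validated data (defaults applied)
    let validated_params = validation_result
        .validated_data
        .expect("validated_data is expected to be Some when valid");
    // ...
}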

Agent Coordinator

Location: crates/vapora-agents/src/coordinator.rs

pub struct AgentCoordinator {
    validation: Arc<ValidationPipeline>,
    // ...
}

impl AgentCoordinator {
    pub async fn assign_task(
        &self,
        role: &str,
        title: String,
        description: String,
        context: String,
        priority: u32,
    ) -> Result<String, CoordinatorError> {
        // Validate inputs
        let input = serde_json::json!({
            "role": role,
            "title": &title,
            "description": &description,
            "context": &context,
            "priority": priority,
        });

        let validation_result = self
            .validation
            .validate("agents/task_assignment", &input)
            .await?;

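        if !validation_result.valid {
            // errors is Vec<ValidationError>; format each one before joining
            let messages: Vec<String> = validation_result
                .errors
                .iter()
                .map(|e| e.to_string())
                .collect();
            return Err(CoordinatorError::ValidationError(messages.join(", ")));
        }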

        // Continue with validated inputs
        // ...
    }
}

Usage Patterns

1. Validating MCP Tool Inputs

// In MCP server handler
let validation_result = state
    .validation
    .validate("tools/kanban_create_task", &input)
    .await?;

if !validation_result.valid {
    let errors: Vec<String> = validation_result
        .errors
        .iter()
        .map(|e| e.to_string())
        .collect();

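    return (StatusCode::BAD_REQUEST, Json(json!({
        "success": false,
        "error": "Validation failed",
        "validation_errors": errors,
    })));
}

// Use validated data with defaults applied (expected to be Some when valid)
let validated_data = validation_result
    .validated_data
    .expect("validated_data is expected to be Some when valid");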

2. Validating Agent Task Assignments

// In AgentCoordinator
let input = serde_json::json!({
    "role": role,
    "title": title,
    "description": description,
    "context": context,
    "priority": priority,
});

let validation_result = self
    .validation
    .validate("agents/task_assignment", &input)
    .await?;

if !validation_result.valid {
    warn!("Validation failed: {:?}", validation_result.errors);
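    // errors is Vec<ValidationError>; join needs strings, so format each one first
    let messages: Vec<String> = validation_result.errors.iter().map(|e| e.to_string()).collect();
    return Err(CoordinatorError::ValidationError(
        format!("Invalid input: {}", messages.join(", "))
    ));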
}

3. Hot Reloading Schemas

// Invalidate single schema
registry.invalidate("tools/kanban_create_task").await;

// Invalidate all schemas (useful for config reload)
registry.invalidate_all().await;

Testing Schemas

Validate Syntax

nickel typecheck schemas/tools/kanban_create_task.ncl

Export as JSON

nickel export schemas/tools/kanban_create_task.ncl

Query Specific Field

nickel query --field parameters.title schemas/tools/kanban_create_task.ncl

Adding New Schemas

  1. Create .ncl file in appropriate directory (tools/ or agents/)
  2. Define tool_name or schema_name
  3. Define parameters or fields with types and contracts
  4. Add doc annotations for documentation
  5. Test with nickel typecheck
  6. Restart services or use hot-reload

Example:

# schemas/tools/my_new_tool.ncl
{
  tool_name = "my_new_tool",

  parameters = {
    name
      | String
      | doc "User name (3-50 chars)"
      | std.string.NonEmpty
      | std.string.length.min 3
      | std.string.length.max 50,

    age
      | Number
      | doc "User age (0-120)"
      | std.number.between 0 120,

    email
      | String
      | doc "User email address"
      | std.string.Email,

    active
      | Bool
      | doc "Account status"
      | default = true,
  },
}

Configuration

Environment Variables

  • VAPORA_SCHEMA_DIR — Schema directory path (default: "schemas")

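In tests (Cargo runs tests with the crate's root as the working directory, so the path is relative to the crate directory, e.g. crates/vapora-shared):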

std::env::set_var("VAPORA_SCHEMA_DIR", "../../schemas");

In production:

export VAPORA_SCHEMA_DIR=/app/schemas
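Resolving the schema directory and a schema's file path can be sketched as follows. `schema_dir` mirrors the fallback in the MCP server snippet, but takes the variable as a parameter so it can be tested without `set_var`. `schema_path` is hypothetical. It assumes the registry maps a name such as tools/my_tool to <dir>/tools/my_tool.ncl, which is inferred from the "Schema file not found" message under Troubleshooting rather than taken from the implementation.

```rust
use std::path::{Path, PathBuf};

// Same fallback as the MCP server snippet: VAPORA_SCHEMA_DIR, else "schemas".
fn schema_dir(env_value: Option<String>) -> PathBuf {
    PathBuf::from(env_value.unwrap_or_else(|| "schemas".to_string()))
}

// Assumed name-to-file mapping (inferred from the "Schema file not found" message).
fn schema_path(dir: &Path, schema_name: &str) -> PathBuf {
    dir.join(format!("{schema_name}.ncl"))
}
```

At startup this would be called as `schema_dir(std::env::var("VAPORA_SCHEMA_DIR").ok())`. Like the original `unwrap_or_else(|_| ...)`, this treats an unset variable and a non-UTF-8 value the same way.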

Performance Characteristics

  • Schema Loading: ~5-10ms (first load, then cached)
  • Validation: ~0.1-0.5ms per request (in-memory)
  • Hot Reload: ~10-20ms (invalidates cache, reloads from disk)

Optimization: SchemaRegistry uses Arc<RwLock<HashMap>> for concurrent reads.

Security Considerations

Timeout Protection

NickelBridge enforces 30s timeout on all CLI operations to prevent:

  • Infinite loops in malicious Nickel code
  • DoS attacks via crafted schemas
  • Resource exhaustion

Input Sanitization

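Contracts narrow what reaches downstream code. They complement, rather than replace, the usual defenses:

  • SQL injection: UUID/Email pattern checks keep free-form strings out of identifier fields (queries should still use bound parameters)
  • XSS: length limits cap text fields but do not neutralize markup, so output encoding is still required
  • Oversized payloads: max length constraints bound the memory and storage each field can consume
  • Type confusion: strict type checking rejects values of the wrong type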

Schema Validation

All schemas must pass nickel typecheck before deployment.
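A pre-deployment check for this rule can be sketched as a small program. It collects every .ncl file under the schema directory and runs `nickel typecheck` on each one. The helper names are hypothetical. The only CLI call is the `nickel typecheck <file>` command shown under Testing Schemas.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

// Recursively collect every .ncl file under `dir` (e.g. schemas/tools, schemas/agents).
fn collect_ncl_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_ncl_files(&path, out)?;
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("ncl") {
            out.push(path);
        }
    }
    Ok(())
}

// Run `nickel typecheck <file>` on each schema and return the files that fail.
// Requires the Nickel CLI on PATH.
fn failing_schemas(files: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut failed = Vec::new();
    for file in files {
        let status = Command::new("nickel").arg("typecheck").arg(file).status()?;
        if !status.success() {
            failed.push(file.clone());
        }
    }
    Ok(failed)
}
```

A CI step could call `collect_ncl_files(Path::new("schemas"), &mut files)` and fail the build if `failing_schemas(&files)?` is non-empty.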

Error Handling

ValidationResult

pub struct ValidationResult {
    pub valid: bool,
    pub errors: Vec<ValidationError>,
    pub validated_data: Option<Value>,
}

pub enum ValidationError {
    MissingField(String),
    TypeMismatch { field: String, expected: String, got: String },
    ContractViolation { field: String, contract: String, value: String },
    InvalidSchema(String),
}
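The integration snippets call `e.to_string()` on each ValidationError, which requires a Display implementation. One possible implementation is sketched below. The message wording is illustrative and will not exactly match the contract-specific messages in the response example. `join_errors` is a hypothetical helper showing how the coordinator's comma-joined message can be built.

```rust
use std::fmt;

#[derive(Debug)]
pub enum ValidationError {
    MissingField(String),
    TypeMismatch { field: String, expected: String, got: String },
    ContractViolation { field: String, contract: String, value: String },
    InvalidSchema(String),
}

// Illustrative messages; the real wording may differ.
impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingField(field) => write!(f, "Field '{field}' is required"),
            Self::TypeMismatch { field, expected, got } => {
                write!(f, "Field '{field}' must be {expected}, got {got}")
            }
            Self::ContractViolation { field, contract, value } => {
                write!(f, "Field '{field}' violates {contract} (value: {value})")
            }
            Self::InvalidSchema(msg) => write!(f, "Invalid schema: {msg}"),
        }
    }
}

impl std::error::Error for ValidationError {}

// Joins messages the way the coordinator builds CoordinatorError::ValidationError.
fn join_errors(errors: &[ValidationError]) -> String {
    errors.iter().map(|e| e.to_string()).collect::<Vec<_>>().join(", ")
}
```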

Error Response Format

{
  "success": false,
  "error": "Validation failed",
  "validation_errors": [
    "Field 'project_id' must match UUID pattern",
    "Field 'title' must be at least 3 characters",
    "Field 'priority' must be between 0 and 100"
  ]
}

Troubleshooting

Schema Not Found

Error: Schema file not found: schemas/tools/my_tool.ncl

Solution: Check that the VAPORA_SCHEMA_DIR environment variable points to the schema directory and that the schema file exists.

Nickel CLI Not Available

Error: Nickel CLI not found in PATH

Solution: Install Nickel CLI:

cargo install nickel-lang-cli

Validation Always Fails

Symptom: All requests are rejected with validation errors

Solution: Check schema syntax with nickel typecheck, and verify that request field names match the schema exactly (in strict mode, any undeclared field is rejected).

Future Enhancements

  • Remote schema loading (HTTP/S3)
  • Schema versioning and migration
  • Custom contract plugins
  • GraphQL schema generation from Nickel
  • OpenAPI spec generation

References

  • Implementation: crates/vapora-shared/src/validation/
  • Schemas: schemas/
  • Tests: crates/vapora-shared/tests/validation_integration.rs
  • MCP Integration: crates/vapora-mcp-server/src/main.rs
  • Agent Integration: crates/vapora-agents/src/coordinator.rs