# Schema Validation Pipeline

Runtime validation system for MCP tools and agent task assignments, built on Nickel contracts.

## Overview

The Schema Validation Pipeline prevents downstream errors by validating inputs before execution. It uses Nickel schemas with contracts to enforce type safety, business rules, and data constraints at runtime.

**Problem Solved:** VAPORA previously assumed inputs were valid, so bad inputs failed somewhere downstream. This caused:

- Invalid UUIDs reaching database queries
- Empty strings bypassing business logic
- Out-of-range priorities corrupting task queues
- Malformed contexts breaking agent execution

**Solution:** Validate all inputs against Nickel schemas before execution.

## Architecture

```text
┌───────────────────────────────────────────────────────────────┐
│                        Client Request                         │
│         (MCP Tool Invocation / Agent Task Assignment)         │
└───────────────────────────────┬───────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      ValidationPipeline                       │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ 1. Load schema from SchemaRegistry (cached)             │  │
│  │ 2. Validate types (String, Number, Array, Object)       │  │
│  │ 3. Check required fields                                │  │
│  │ 4. Apply contracts (NonEmpty, UUID, Range, etc.)        │  │
│  │ 5. Apply default values                                 │  │
│  │ 6. Return ValidationResult (valid + errors + data)      │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────┬───────────────────────────────┘
                                │
                     ┌──────────┴──────────┐
                     │                     │
                     ▼                     ▼
              ┌─────────────┐       ┌─────────────┐
              │   Valid?    │       │  Invalid?   │
              │   Execute   │       │   Reject    │
              │  with data  │       │ with errors │
              └─────────────┘       └─────────────┘
```

## Components

### 1. ValidationPipeline

Core validation engine in `crates/vapora-shared/src/validation/pipeline.rs`.

**Key Methods:**

```rust
pub async fn validate(
    &self,
    schema_name: &str,
    input: &Value,
) -> Result<ValidationResult>
```

**Validation Steps:**

1. Load the compiled schema from the registry
2. Validate field types (String, Number, Bool, Array, Object)
3. Check required fields (reject if missing)
4. Apply contracts (NonEmpty, UUID, Range, Email, etc.)
5. Apply default values for optional fields
6. Return a `ValidationResult` with errors (if any)

**Strict Mode:** Rejects unknown fields not in schema.
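
The steps above, including strict mode, can be sketched as a single pass over the schema's fields. This is a hypothetical, std-only illustration rather than the code in `pipeline.rs`: `Value`, `FieldType`, `Contract`, `Field`, and `validate` are simplified stand-ins (the real pipeline works on `serde_json::Value` and `CompiledSchema`), only two contracts are modeled, and step 1 (the registry lookup) is assumed to have already produced the schema map.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `serde_json::Value` (only the variants used here).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
    Bool(bool),
}

#[derive(Debug, Clone, Copy)]
pub enum FieldType { String, Number, Bool }

/// Two example contracts; a real registry would support more.
#[derive(Debug, Clone)]
pub enum Contract { NonEmpty, Between(f64, f64) }

#[derive(Debug, Clone)]
pub struct Field {
    pub field_type: FieldType,
    pub required: bool,
    pub contracts: Vec<Contract>,
    pub default: Option<Value>,
}

/// Runs steps 2-6 against an already-loaded schema.
/// Returns the validated object (defaults applied) or every error found.
pub fn validate(
    schema: &BTreeMap<String, Field>,
    input: &BTreeMap<String, Value>,
    strict: bool,
) -> Result<BTreeMap<String, Value>, Vec<String>> {
    let mut errors = Vec::new();
    let mut out = BTreeMap::new();

    for (name, field) in schema {
        match input.get(name) {
            Some(value) => {
                // Step 2: type check
                let type_ok = matches!(
                    (field.field_type, value),
                    (FieldType::String, Value::Str(_))
                        | (FieldType::Number, Value::Num(_))
                        | (FieldType::Bool, Value::Bool(_))
                );
                if !type_ok {
                    errors.push(format!("Field '{name}' has the wrong type"));
                    continue;
                }
                // Step 4: contracts (a contract that does not fit the type is skipped)
                for contract in &field.contracts {
                    let ok = match (contract, value) {
                        (Contract::NonEmpty, Value::Str(s)) => !s.is_empty(),
                        (Contract::Between(lo, hi), Value::Num(n)) => *lo <= *n && *n <= *hi,
                        _ => true,
                    };
                    if !ok {
                        errors.push(format!("Field '{name}' violates {contract:?}"));
                    }
                }
                out.insert(name.clone(), value.clone());
            }
            // Step 3: required fields must be present
            None if field.required => errors.push(format!("Missing required field '{name}'")),
            // Step 5: defaults for absent optional fields
            None => {
                if let Some(default) = &field.default {
                    out.insert(name.clone(), default.clone());
                }
            }
        }
    }

    // Strict mode: reject fields the schema does not declare
    if strict {
        for name in input.keys().filter(|k| !schema.contains_key(*k)) {
            errors.push(format!("Unknown field '{name}'"));
        }
    }

    // Step 6: report
    if errors.is_empty() { Ok(out) } else { Err(errors) }
}
```

Collecting every error instead of stopping at the first one lets a client fix all problems in one round trip, which fits the list-shaped `validation_errors` response.
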
### 2. SchemaRegistry

Schema loading and caching in `crates/vapora-shared/src/validation/schema_registry.rs`.

**Features:**

- **Caching:** Compiled schemas are cached in memory (`Arc` + `RwLock`)
- **Hot Reload:** `invalidate(schema_name)` reloads a schema without a restart
- **Schema Sources:** File system, embedded string, or URL (future)

**Schema Structure:**

```rust
use std::collections::HashMap;

use serde_json::Value;

pub struct CompiledSchema {
    pub name: String,
    pub fields: HashMap<String, FieldSchema>,
}

// `FieldType` and `Contract` are defined elsewhere in the validation module
pub struct FieldSchema {
    pub field_type: FieldType,
    pub required: bool,
    pub contracts: Vec<Contract>,
    pub default: Option<Value>,
}
```
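
The caching and hot-reload features can be sketched with `Arc<RwLock<HashMap>>` from the standard library. This is a hypothetical simplification: `SchemaCache`, `get_or_load`, and the `load` closure are illustrative names, the cached value is plain schema text rather than a `CompiledSchema`, and it uses the synchronous `std::sync::RwLock`, whereas the real registry exposes an async `invalidate`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical cache keyed by schema name (e.g. "tools/kanban_create_task").
#[derive(Clone, Default)]
pub struct SchemaCache {
    inner: Arc<RwLock<HashMap<String, Arc<String>>>>,
}

impl SchemaCache {
    /// Returns the cached schema, or loads it with `load` on a cache miss.
    pub fn get_or_load<E>(
        &self,
        name: &str,
        load: impl FnOnce(&str) -> Result<String, E>,
    ) -> Result<Arc<String>, E> {
        // Fast path: any number of readers can hold the read lock at once.
        if let Some(hit) = self.inner.read().unwrap().get(name) {
            return Ok(Arc::clone(hit));
        }
        // Slow path: load without holding a lock, then insert. If two threads
        // miss at the same time, both load and the first insert wins.
        let loaded = Arc::new(load(name)?);
        let mut map = self.inner.write().unwrap();
        Ok(Arc::clone(map.entry(name.to_string()).or_insert(loaded)))
    }

    /// Drops one schema so the next lookup reloads it (hot reload).
    pub fn invalidate(&self, name: &str) {
        self.inner.write().unwrap().remove(name);
    }

    /// Drops every cached schema.
    pub fn invalidate_all(&self) {
        self.inner.write().unwrap().clear();
    }
}
```

Handing out `Arc<String>` clones keeps the read lock short: callers keep using a schema after it has been invalidated, and only later lookups see the reloaded version.
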
### 3. NickelBridge

CLI integration for Nickel operations in `crates/vapora-shared/src/validation/nickel_bridge.rs`.

**Operations:**

- `typecheck(path)` — Typecheck a Nickel file (syntax and types)
- `export(path)` — Export a schema as JSON
- `query(path, field)` — Query a specific field
- `is_available()` — Check whether the Nickel CLI is installed

**Timeout Protection:** 30s default to prevent DoS from malicious Nickel code.
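
One way to enforce such a timeout with only the standard library is to poll the child process and kill it once a deadline passes. This is a hypothetical sketch, not NickelBridge's code: `run_with_timeout` and the 10 ms poll interval are illustrative choices. Because output is read only after the process exits, it suits commands with modest output; a child that fills the pipe buffer would block until the timeout fires.

```rust
use std::io;
use std::process::{Command, Output, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd`, killing it if it has not exited within `timeout`.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Output> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let deadline = Instant::now() + timeout;

    loop {
        if child.try_wait()?.is_some() {
            // Exited: collect the buffered output (the exit status is already recorded).
            return child.wait_with_output();
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the killed process so it does not linger as a zombie
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                format!("command timed out after {timeout:?}"),
            ));
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

For example, an export with the documented 30s limit could be run as `run_with_timeout(Command::new("nickel").args(["export", path]), Duration::from_secs(30))`.
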
## Nickel Schemas

Schemas live in the `schemas/` directory at the workspace root.

### Directory Structure

```text
schemas/
├── tools/                          # MCP tool parameter validation
│   ├── kanban_create_task.ncl
│   ├── kanban_update_task.ncl
│   ├── assign_task_to_agent.ncl
│   ├── get_project_summary.ncl
│   └── get_agent_capabilities.ncl
└── agents/                         # Agent task assignment validation
    └── task_assignment.ncl
```
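
Schema names such as `tools/kanban_create_task` map onto this tree. Because the MCP handler builds the name from the client-supplied `request.tool`, a resolver should refuse names that could escape the schema directory. The sketch below is hypothetical: `schema_path` and the "append `.ncl`" convention are assumptions inferred from the layout above, not the registry's actual code.

```rust
use std::path::{Component, Path, PathBuf};

/// Maps "tools/kanban_create_task" to "<schema_dir>/tools/kanban_create_task.ncl".
/// Returns `None` for empty or absolute names and for names containing `..` or `.`,
/// so a caller-supplied tool name cannot point outside `schema_dir`.
pub fn schema_path(schema_dir: &Path, name: &str) -> Option<PathBuf> {
    let safe = !name.is_empty()
        && Path::new(name)
            .components()
            .all(|c| matches!(c, Component::Normal(_)));
    if !safe {
        return None;
    }
    Some(schema_dir.join(format!("{name}.ncl")))
}
```
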
### Schema Format

```nickel
{
  tool_name = "example_tool",

  parameters = {
    # Required field with contracts
    user_id
      | String
      | doc "User UUID"
      | std.string.NonEmpty
      | std.string.match "^[0-9a-f]{8}-[0-9a-f]{4}-...$",

    # Optional field with default
    priority
      | Number
      | doc "Priority score (0-100)"
      | std.number.between 0 100
      | default = 50,
  },
}
```
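
To enforce annotations like these, something has to map each contract name and its arguments to an executable check. How VAPORA's registry extracts contracts from Nickel is not described here, so the following is a hypothetical sketch: `Contract`, `parse_contract`, `check_str`, and `check_num` are illustrative, only a few of the contracts in the table below are covered, and `between` is assumed to be inclusive (matching "Priority score (0-100)").

```rust
/// Hypothetical executable form of a few contract annotations.
#[derive(Debug, PartialEq)]
pub enum Contract {
    NonEmpty,
    MinLength(usize),
    MaxLength(usize),
    Between(f64, f64), // assumed inclusive on both ends
}

/// Parses annotation text such as "std.number.between 0 100".
/// Returns `None` for unknown contracts or malformed arguments.
pub fn parse_contract(annotation: &str) -> Option<Contract> {
    let mut parts = annotation.split_whitespace();
    let name = parts.next()?;
    let args: Vec<&str> = parts.collect();
    match (name, args.as_slice()) {
        ("std.string.NonEmpty", []) => Some(Contract::NonEmpty),
        ("std.string.length.min", [n]) => n.parse().ok().map(Contract::MinLength),
        ("std.string.length.max", [n]) => n.parse().ok().map(Contract::MaxLength),
        ("std.number.between", [lo, hi]) => {
            Some(Contract::Between(lo.parse().ok()?, hi.parse().ok()?))
        }
        _ => None,
    }
}

impl Contract {
    /// Checks a string value (length counted in Unicode scalar values).
    /// Numeric contracts do not apply to strings.
    pub fn check_str(&self, s: &str) -> bool {
        let len = s.chars().count();
        match self {
            Contract::NonEmpty => !s.is_empty(),
            Contract::MinLength(n) => len >= *n,
            Contract::MaxLength(n) => len <= *n,
            Contract::Between(..) => true,
        }
    }

    /// Checks a numeric value; string contracts do not apply to numbers.
    pub fn check_num(&self, x: f64) -> bool {
        match self {
            Contract::Between(lo, hi) => *lo <= x && x <= *hi,
            _ => true,
        }
    }
}
```

Rejecting unknown contract names at parse time (rather than silently passing them) surfaces schema typos when the schema is loaded instead of letting invalid input through.
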
### Supported Contracts

| Contract | Description | Example |
|----------|-------------|---------|
| `std.string.NonEmpty` | String cannot be empty | Required text fields |
| `std.string.length.min N` | Minimum length | `min 3` for titles |
| `std.string.length.max N` | Maximum length | `max 200` for titles |
| `std.string.match PATTERN` | Regex validation | UUID format |
| `std.number.between A B` | Numeric range | `between 0 100` |
| `std.number.greater_than N` | Minimum value (exclusive) | `> -1` |
| `std.number.less_than N` | Maximum value (exclusive) | `< 1000` |
| `std.enum.TaggedUnion` | Enum validation | `[\| 'low, 'high \|]` |

## Integration Points

### MCP Server

Location: `crates/vapora-mcp-server/src/main.rs`

```rust
// axum types used by the handler (assumes an axum-based server)
use axum::{extract::State, http::StatusCode, response::IntoResponse, Json};
use serde_json::json;
use std::{path::PathBuf, sync::Arc};

// Initialize the validation pipeline at startup
let schema_dir = std::env::var("VAPORA_SCHEMA_DIR")
    .unwrap_or_else(|_| "schemas".to_string());
let registry = Arc::new(SchemaRegistry::new(PathBuf::from(&schema_dir)));
let validation = Arc::new(ValidationPipeline::new(registry));

// Add to AppState
#[derive(Clone)]
struct AppState {
    validation: Arc<ValidationPipeline>,
}

// Validate in the handler before executing the tool
async fn invoke_tool(
    State(state): State<AppState>,
    Json(request): Json<InvokeToolRequest>,
) -> impl IntoResponse {
    let schema_name = format!("tools/{}", request.tool);

    // The handler does not return `Result`, so `?` is not available here;
    // pipeline failures (e.g. a schema that fails to load) become a 500.
    let validation_result = match state
        .validation
        .validate(&schema_name, &request.parameters)
        .await
    {
        Ok(result) => result,
        Err(e) => {
            return (
                StatusCode::INTERNAL_SERVER_ERROR,
                Json(json!({ "success": false, "error": e.to_string() })),
            );
        }
    };

    if !validation_result.valid {
        let errors: Vec<String> = validation_result
            .errors
            .iter()
            .map(|e| e.to_string())
            .collect();
        return (
            StatusCode::BAD_REQUEST,
            Json(json!({
                "success": false,
                "error": "Validation failed",
                "validation_errors": errors,
            })),
        );
    }

    // Execute with validated data (defaults applied)
    let validated_params = validation_result
        .validated_data
        .expect("validated_data is set when validation succeeds");
    // ...
}
```

### Agent Coordinator

Location: `crates/vapora-agents/src/coordinator.rs`

```rust
pub struct AgentCoordinator {
    validation: Arc<ValidationPipeline>,
    // ...
}

impl AgentCoordinator {
    pub async fn assign_task(
        &self,
        role: &str,
        title: String,
        description: String,
        context: String,
        priority: u32,
    ) -> Result<String, CoordinatorError> {
        // Validate inputs
        let input = serde_json::json!({
            "role": role,
            "title": &title,
            "description": &description,
            "context": &context,
            "priority": priority,
        });

        // `?` requires a `From` conversion from the pipeline's error type
        // into `CoordinatorError`
        let validation_result = self
            .validation
            .validate("agents/task_assignment", &input)
            .await?;

        if !validation_result.valid {
            // `errors` is a `Vec<ValidationError>`: convert to strings before joining
            let messages: Vec<String> = validation_result
                .errors
                .iter()
                .map(|e| e.to_string())
                .collect();
            return Err(CoordinatorError::ValidationError(messages.join(", ")));
        }

        // Continue with validated inputs
        // ...
    }
}
```
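
The `?` after `validate(...).await` compiles only if the pipeline's error type converts into `CoordinatorError`. A minimal sketch of that conversion follows; `PipelineError` is a hypothetical stand-in for whatever error `validate` returns, the `Pipeline` variant is assumed, and `check` is an illustrative helper that mirrors the control flow of `assign_task`.

```rust
use std::fmt;

/// Hypothetical stand-in for the error returned by `ValidationPipeline::validate`
/// (e.g. a schema that could not be found or loaded).
#[derive(Debug)]
pub struct PipelineError(pub String);

/// Assumed shape of the coordinator's error type.
#[derive(Debug)]
pub enum CoordinatorError {
    /// Input rejected by the schema (a caller mistake).
    ValidationError(String),
    /// The pipeline itself failed (a server-side problem).
    Pipeline(String),
}

impl From<PipelineError> for CoordinatorError {
    fn from(e: PipelineError) -> Self {
        CoordinatorError::Pipeline(e.0)
    }
}

impl fmt::Display for CoordinatorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CoordinatorError::ValidationError(msg) => write!(f, "validation failed: {msg}"),
            CoordinatorError::Pipeline(msg) => write!(f, "validation pipeline error: {msg}"),
        }
    }
}

impl std::error::Error for CoordinatorError {}

/// With the `From` impl in place, `?` converts `PipelineError` automatically.
pub fn check(result: Result<bool, PipelineError>) -> Result<(), CoordinatorError> {
    let valid = result?;
    if !valid {
        return Err(CoordinatorError::ValidationError("title must not be empty".into()));
    }
    Ok(())
}
```

Keeping rejected input and pipeline failures in separate variants lets callers report the first as a client error and the second as an internal one.
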
## Usage Patterns

### 1. Validating MCP Tool Inputs

```rust
// In an MCP server handler that returns `Result<_, AppError>` (hypothetical
// error type), so `?` can propagate pipeline failures
let validation_result = state
    .validation
    .validate("tools/kanban_create_task", &input)
    .await?;

if !validation_result.valid {
    let errors: Vec<String> = validation_result
        .errors
        .iter()
        .map(|e| e.to_string())
        .collect();

    return Ok((StatusCode::BAD_REQUEST, Json(json!({
        "success": false,
        "error": "Validation failed",
        "validation_errors": errors,
    }))));
}

// Use validated data with defaults applied
let validated_data = validation_result
    .validated_data
    .expect("validated_data is set when validation succeeds");
```

### 2. Validating Agent Task Assignments

```rust
// In AgentCoordinator
let input = serde_json::json!({
    "role": role,
    "title": title,
    "description": description,
    "context": context,
    "priority": priority,
});

let validation_result = self
    .validation
    .validate("agents/task_assignment", &input)
    .await?;

if !validation_result.valid {
    warn!("Validation failed: {:?}", validation_result.errors);
    // Convert each `ValidationError` to a string before joining
    let messages: Vec<String> = validation_result
        .errors
        .iter()
        .map(|e| e.to_string())
        .collect();
    return Err(CoordinatorError::ValidationError(
        format!("Invalid input: {}", messages.join(", "))
    ));
}
```

### 3. Hot Reloading Schemas

```rust
// Invalidate a single schema
registry.invalidate("tools/kanban_create_task").await;

// Invalidate all schemas (useful for a config reload)
registry.invalidate_all().await;
```

## Testing Schemas

### Validate Syntax

```bash
nickel typecheck schemas/tools/kanban_create_task.ncl
```

### Export as JSON

```bash
nickel export schemas/tools/kanban_create_task.ncl
```

### Query Specific Field

```bash
nickel query --field parameters.title schemas/tools/kanban_create_task.ncl
```

## Adding New Schemas

1. Create a `.ncl` file in the appropriate directory (`tools/` or `agents/`)
2. Define `tool_name` or `schema_name`
3. Define `parameters` or `fields` with types and contracts
4. Add `doc` annotations for documentation
5. Test with `nickel typecheck`
6. Restart the services, or hot-reload the schema with `invalidate`

**Example:**

```nickel
# schemas/tools/my_new_tool.ncl
{
  tool_name = "my_new_tool",

  parameters = {
    name
      | String
      | doc "User name (3-50 chars)"
      | std.string.NonEmpty
      | std.string.length.min 3
      | std.string.length.max 50,

    age
      | Number
      | doc "User age (0-120)"
      | std.number.between 0 120,

    email
      | String
      | doc "User email address"
      | std.string.Email,

    active
      | Bool
      | doc "Account status"
      | default = true,
  },
}
```

## Configuration

### Environment Variables

- `VAPORA_SCHEMA_DIR` — Schema directory path (default: `"schemas"`); a relative path is resolved against the process's working directory

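**In tests:**

```rust
// `cargo test` runs from the crate directory, so point back to the workspace root.
// In the Rust 2024 edition `std::env::set_var` is `unsafe` and must be wrapped
// in an `unsafe` block; it also changes the environment for every test thread.
std::env::set_var("VAPORA_SCHEMA_DIR", "../../schemas");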
```
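
Because `set_var` mutates state shared by all test threads, an alternative is to resolve the directory through a function that takes the environment lookup as a parameter, so tests can pass a fake environment. `resolve_schema_dir` and `schema_dir_from_env` are hypothetical helpers, not part of the crate; the `"schemas"` fallback matches the documented default. In a test, `resolve_schema_dir(|_| Some("../../schemas".to_string()))` yields `../../schemas` without touching the process environment.

```rust
use std::path::PathBuf;

/// Resolves the schema directory from an injectable environment lookup,
/// falling back to the documented default of "schemas".
pub fn resolve_schema_dir(lookup: impl Fn(&str) -> Option<String>) -> PathBuf {
    lookup("VAPORA_SCHEMA_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("schemas"))
}

/// Production entry point: reads the real process environment.
pub fn schema_dir_from_env() -> PathBuf {
    resolve_schema_dir(|key| std::env::var(key).ok())
}
```
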
**In production:**

```bash
export VAPORA_SCHEMA_DIR=/app/schemas
```

## Performance Characteristics

- **Schema Loading:** ~5-10ms (first load, then cached)
- **Validation:** ~0.1-0.5ms per request (in-memory)
- **Hot Reload:** ~10-20ms (invalidates the cache, reloads from disk)

**Optimization:** SchemaRegistry uses `Arc<RwLock<HashMap>>`, so concurrent lookups share the read lock instead of blocking each other.

## Security Considerations

### Timeout Protection

NickelBridge enforces a 30s timeout on all CLI operations to guard against:

- Infinite loops in malicious Nickel code
- DoS attacks via crafted schemas
- Resource exhaustion

### Input Sanitization

Contracts reduce the attack surface, but they complement rather than replace downstream defenses:

- SQL injection: UUID/Email format checks reject most payloads in those fields (queries should still be parameterized)
- XSS: length limits on text fields only constrain payload size (output must still be escaped)
- Oversized inputs: max-length constraints bound memory and storage use
- Type confusion: strict type checking rejects values of the wrong type

### Schema Validation

All schemas must pass `nickel typecheck` before deployment.
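
One way to enforce this is a test or CI step that runs `nickel typecheck` on every `.ncl` file under the schema directory. The helpers below are a hypothetical sketch (`find_schema_files` and `typecheck_all` are illustrative names); they shell out to the `nickel typecheck <file>` command shown in "Testing Schemas" and return the files that failed, while a missing `nickel` binary surfaces as an `io::Error` rather than being ignored.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Recursively collects `.ncl` files under `dir`, sorted for stable output.
pub fn find_schema_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            files.extend(find_schema_files(&path)?);
        } else if path.extension().map_or(false, |ext| ext == "ncl") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

/// Runs `nickel typecheck` on each schema and returns the files that failed.
pub fn typecheck_all(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut failed = Vec::new();
    for file in find_schema_files(dir)? {
        let status = Command::new("nickel").arg("typecheck").arg(&file).status()?;
        if !status.success() {
            failed.push(file);
        }
    }
    Ok(failed)
}
```
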
## Error Handling

### ValidationResult

```rust
pub struct ValidationResult {
    pub valid: bool,
    pub errors: Vec<ValidationError>,
    pub validated_data: Option<Value>,
}

pub enum ValidationError {
    MissingField(String),
    TypeMismatch { field: String, expected: String, got: String },
    ContractViolation { field: String, contract: String, value: String },
    InvalidSchema(String),
}
```
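
Both the MCP handler and the coordinator call `e.to_string()` on these errors, which requires a `Display` implementation. A sketch of one, assuming the enum above; it is repeated here with `Debug`, `Clone`, and `PartialEq` derived so the block compiles on its own, and the message wording is illustrative rather than what the crate necessarily prints (the sample response below uses contract-specific messages).

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum ValidationError {
    MissingField(String),
    TypeMismatch { field: String, expected: String, got: String },
    ContractViolation { field: String, contract: String, value: String },
    InvalidSchema(String),
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::MissingField(field) => {
                write!(f, "Field '{field}' is required")
            }
            ValidationError::TypeMismatch { field, expected, got } => {
                write!(f, "Field '{field}' must be {expected}, got {got}")
            }
            ValidationError::ContractViolation { field, contract, value } => {
                write!(f, "Field '{field}' violates {contract} (value: {value})")
            }
            ValidationError::InvalidSchema(msg) => write!(f, "Invalid schema: {msg}"),
        }
    }
}

// Lets `ValidationError` be used with `?` and boxed error types.
impl std::error::Error for ValidationError {}
```
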
### Error Response Format

```json
{
  "success": false,
  "error": "Validation failed",
  "validation_errors": [
    "Field 'project_id' must match UUID pattern",
    "Field 'title' must be at least 3 characters",
    "Field 'priority' must be between 0 and 100"
  ]
}
```

## Troubleshooting

### Schema Not Found

**Error:** `Schema file not found: schemas/tools/my_tool.ncl`

**Solution:** Check the `VAPORA_SCHEMA_DIR` environment variable and make sure the schema file exists.

### Nickel CLI Not Available

**Error:** `Nickel CLI not found in PATH`

**Solution:** Install the Nickel CLI:

```bash
cargo install nickel-lang-cli
```

### Validation Always Fails

**Error:** All requests are rejected with validation errors

**Solution:** Check schema syntax with `nickel typecheck`, verify field names match exactly.
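
A quick way to spot a name mismatch (for example `projectId` sent where the schema declares `project_id`) is to compare the keys the client sends with the fields the schema declares. `diff_fields` is a hypothetical debugging helper, not part of the crate, working on plain lists of names:

```rust
use std::collections::BTreeSet;

/// Splits field names into (unknown, missing): keys that were sent but are not
/// declared, and required fields that were not sent.
pub fn diff_fields<'a>(
    sent: &[&'a str],
    declared: &[&'a str],
    required: &[&'a str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let sent: BTreeSet<&str> = sent.iter().copied().collect();
    let declared: BTreeSet<&str> = declared.iter().copied().collect();
    let unknown = sent.difference(&declared).copied().collect();
    let missing = required.iter().copied().filter(|f| !sent.contains(f)).collect();
    (unknown, missing)
}
```
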
## Future Enhancements

- [ ] Remote schema loading (HTTP/S3)
- [ ] Schema versioning and migration
- [ ] Custom contract plugins
- [ ] GraphQL schema generation from Nickel
- [ ] OpenAPI spec generation

## Related Documentation

- [Nickel Language Documentation](https://nickel-lang.org/)
- [VAPORA Architecture Overview](vapora-architecture.md)
- [Agent Coordination](agent-registry-coordination.md)
- [MCP Protocol Integration](../integrations/mcp-server.md)

## References

- **Implementation:** `crates/vapora-shared/src/validation/`
- **Schemas:** `schemas/`
- **Tests:** `crates/vapora-shared/tests/validation_integration.rs`
- **MCP Integration:** `crates/vapora-mcp-server/src/main.rs`
- **Agent Integration:** `crates/vapora-agents/src/coordinator.rs`