# Storage Backends Guide

This document provides comprehensive guidance on the orchestrator's storage backend options, configuration, and migration between them.

## Overview

The orchestrator supports three storage backends through a pluggable architecture:

1. **Filesystem** - JSON file-based storage (default)
2. **SurrealDB Embedded** - Local database with RocksDB engine
3. **SurrealDB Server** - Remote SurrealDB server connection

All backends implement the same `TaskStorage` trait, ensuring consistent behavior and seamless migration.

## Backend Comparison

| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server |
|---------|------------|-------------------|------------------|
| **Setup Complexity** | Minimal | Low | Medium |
| **External Dependencies** | None | None | SurrealDB Server |
| **Storage Format** | JSON Files | RocksDB | Remote DB |
| **ACID Transactions** | No | Yes | Yes |
| **Authentication/RBAC** | Basic | Advanced | Advanced |
| **Real-time Subscriptions** | No | Yes | Yes |
| **Audit Logging** | Manual | Automatic | Automatic |
| **Metrics Collection** | Basic | Advanced | Advanced |
| **Task Dependencies** | Simple | Graph-based | Graph-based |
| **Horizontal Scaling** | No | No | Yes |
| **Backup/Recovery** | File Copy | Database Backup | Server Backup |
| **Performance** | Good | Excellent | Variable |
| **Memory Usage** | Low | Medium | Low |
| **Disk Usage** | Medium | Optimized | Minimal |

## 1. Filesystem Backend

### Overview

The default storage backend using JSON files for task persistence. Ideal for development and simple deployments.

### Configuration

```bash
# Default configuration
./orchestrator --storage-type filesystem --data-dir ./data

# Custom data directory
./orchestrator --storage-type filesystem --data-dir /var/lib/orchestrator
```

### File Structure

```
data/
└── queue.rkvs/
    ├── tasks/
    │   ├── uuid1.json          # Individual task records
    │   ├── uuid2.json
    │   └── ...
    └── queue/
        ├── uuid1.json          # Queue entries with priority
        ├── uuid2.json
        └── ...
```

### Features

- ✅ **Simple Setup**: No external dependencies
- ✅ **Transparency**: Human-readable JSON files
- ✅ **Backup**: Standard file system tools
- ✅ **Debugging**: Direct file inspection
- ❌ **ACID**: No transaction guarantees
- ❌ **Concurrency**: Basic file locking
- ❌ **Advanced Features**: Limited auth/audit

### Best Use Cases

- Development environments
- Single-instance deployments
- Simple task orchestration
- Environments with strict dependency requirements

## 2. SurrealDB Embedded

### Overview

Local SurrealDB database using RocksDB storage engine. Provides advanced database features without external dependencies.

### Configuration

```bash
# Build with SurrealDB support
cargo build --features surrealdb

# Run with embedded SurrealDB
./orchestrator --storage-type surrealdb-embedded --data-dir ./data
```

### Database Schema

- **tasks**: Main task records with full metadata
- **task_queue**: Priority queue with scheduling info
- **users**: Authentication and RBAC
- **audit_log**: Complete operation history
- **metrics**: Performance and usage statistics
- **task_events**: Real-time event stream

### Features

- ✅ **ACID Transactions**: Reliable data consistency
- ✅ **Advanced Queries**: SQL-like syntax with graph support
- ✅ **Real-time Events**: Live query subscriptions
- ✅ **Built-in Auth**: User management and RBAC
- ✅ **Audit Logging**: Automatic operation tracking
- ✅ **No External Deps**: Self-contained database
- ❌ **Horizontal Scaling**: Single-node only

### Configuration Options

```bash
# Custom database location
./orchestrator --storage-type surrealdb-embedded \
  --data-dir /var/lib/orchestrator/db

# With specific namespace/database
./orchestrator --storage-type surrealdb-embedded \
  --data-dir ./data \
  --surrealdb-namespace production \
  --surrealdb-database orchestrator
```

### Best Use Cases

- Production single-node deployments
- Applications requiring ACID guarantees
- Advanced querying and analytics
- Real-time monitoring requirements
- Audit logging compliance

## 3. SurrealDB Server

### Overview

Remote SurrealDB server connection providing full distributed database capabilities with horizontal scaling.

### Prerequisites

1. **SurrealDB Server**: Running instance accessible via network
2. **Authentication**: Valid credentials for database access
3. **Network**: Reliable connectivity to SurrealDB server

### SurrealDB Server Setup

```bash
# Install SurrealDB
curl -sSf https://install.surrealdb.com | sh

# Start server
surreal start --log trace --user root --pass root memory

# Or with file storage
surreal start --log trace --user root --pass root file:orchestrator.db

# Or with TiKV (distributed)
surreal start --log trace --user root --pass root tikv://localhost:2379
```

### Configuration

```bash
# Basic server connection
./orchestrator --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin \
  --surrealdb-password secret

# Production configuration
./orchestrator --storage-type surrealdb-server \
  --surrealdb-url wss://surreal.production.com:8000 \
  --surrealdb-namespace prod \
  --surrealdb-database orchestrator \
  --surrealdb-username orchestrator-service \
  --surrealdb-password "$SURREALDB_PASSWORD"
```

### Features

- ✅ **Distributed**: Multi-node clustering support
- ✅ **Horizontal Scaling**: Handle massive workloads
- ✅ **Multi-tenancy**: Namespace and database isolation
- ✅ **Real-time Collaboration**: Multiple orchestrator instances
- ✅ **Advanced Security**: Enterprise authentication
- ✅ **High Availability**: Fault-tolerant deployments
- ❌ **Complexity**: Requires server management
- ❌ **Network Dependency**: Requires reliable connectivity

### Best Use Cases

- Distributed production deployments
- Multiple orchestrator instances
- High availability requirements
- Large-scale task orchestration
- Multi-tenant environments

## Migration Between Backends

### Migration Tool

Use the migration script to move data between any backend combination:

```bash
# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration examples
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \
  --source-dir ./data \
  --surrealdb-url ws://localhost:8000 \
  --username admin --password secret

# Validation and dry-run
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --dry-run
```

### Migration Features

- **Data Integrity**: Complete validation before and after migration
- **Progress Tracking**: Real-time progress with throughput metrics
- **Rollback Support**: Automatic rollback on failures
- **Selective Migration**: Filter by task status, date range, etc.
- **Batch Processing**: Configurable batch sizes for performance

### Migration Scenarios

#### Development to Production

```bash
# Migrate from filesystem (dev) to SurrealDB embedded (production)
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./dev-data --target-dir ./prod-data \
  --batch-size 100 --verify
```

#### Scaling Up

```bash
# Migrate from embedded to server for distributed setup
./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \
  --source-dir ./data \
  --surrealdb-url ws://production-surreal:8000 \
  --username orchestrator --password "$PROD_PASSWORD" \
  --namespace production --database main
```

#### Disaster Recovery

```bash
# Migrate from server back to filesystem for emergency backup
./scripts/migrate-storage.nu --from surrealdb-server --to filesystem \
  --surrealdb-url ws://failing-server:8000 \
  --username admin --password "$PASSWORD" \
  --target-dir ./emergency-backup
```

## Performance Considerations

### Filesystem

- **Strengths**: Low memory usage, simple debugging
- **Limitations**: File I/O bottlenecks, no concurrent writes
- **Optimization**: Fast SSD, regular cleanup of old tasks

### SurrealDB Embedded

- **Strengths**: Excellent single-node performance, ACID guarantees
- **Limitations**: Memory usage scales with data size
- **Optimization**: Adequate RAM, SSD storage, regular compaction

### SurrealDB Server

- **Strengths**: Horizontal scaling, shared state
- **Limitations**: Network latency, server dependency
- **Optimization**: Low-latency network, connection pooling, server tuning

## Security Considerations

### Filesystem

- **File Permissions**: Restrict access to data directory
- **Backup Security**: Encrypt backup files
- **Network**: No network exposure

### SurrealDB Embedded

- **File Permissions**: Secure database files
- **Encryption**: Database-level encryption available
- **Access Control**: Built-in user management

### SurrealDB Server

- **Network Security**: Use TLS/WSS connections
- **Authentication**: Strong passwords, regular rotation
- **Authorization**: Role-based access control
- **Audit**: Complete operation logging

## Troubleshooting

### Common Issues

#### Filesystem Backend

```bash
# Permission issues
sudo chown -R $USER:$USER ./data
chmod -R 755 ./data

# Corrupted JSON files
rm ./data/queue.rkvs/tasks/corrupted-file.json
```

#### SurrealDB Embedded

```bash
# Database corruption
rm -rf ./data/orchestrator.db
# Restore from backup or re-initialize

# Permission issues
sudo chown -R $USER:$USER ./data
```

#### SurrealDB Server

```bash
# Connection issues
telnet surreal-server 8000
# Check server status and network connectivity

# Authentication failures
# Verify credentials and user permissions
```

### Debugging Commands

```bash
# List available storage types
./orchestrator --help | grep storage-type

# Validate configuration
./orchestrator --storage-type filesystem --data-dir ./data --dry-run

# Test migration
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded

# Monitor migration progress
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --verbose
```

## Recommendations

### Development

- **Use**: Filesystem backend
- **Rationale**: Simple setup, easy debugging, no external dependencies

### Single-Node Production

- **Use**: SurrealDB Embedded
- **Rationale**: ACID guarantees, advanced features, no external dependencies

### Distributed Production

- **Use**: SurrealDB Server
- **Rationale**: Horizontal scaling, high availability, multi-instance support

### Migration Path

1. **Start**: Filesystem (development)
2. **Scale**: SurrealDB Embedded (single-node production)
3. **Distribute**: SurrealDB Server (multi-node production)

This progressive approach allows teams to start simple and scale as requirements grow, with seamless migration between each stage.
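
### Example: Selecting a Backend per Stage

The recommendations above can be sketched as a small launcher function that maps a deployment stage to the storage flags documented in this guide. This is only an illustration: the stage names (`development`, `single-node`, `distributed`), the `run_orchestrator` function, and the `ORCHESTRATOR_BIN`, `ORCH_DATA_DIR`, and `SURREALDB_*` environment variables are assumptions made for the example, not part of the orchestrator itself.

```bash
#!/bin/sh
# Sketch only: pick storage flags for a deployment stage, then run the orchestrator.
# Assumptions (not from the orchestrator): the stage names, ORCHESTRATOR_BIN,
# ORCH_DATA_DIR, and the SURREALDB_* environment variables.

run_orchestrator() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_orchestrator <development|single-node|distributed> [args...]" >&2
    return 2
  fi
  stage="$1"
  shift  # remaining arguments are passed through to the orchestrator

  case "$stage" in
    development)
      # Filesystem: simplest setup, easy debugging
      set -- --storage-type filesystem --data-dir "${ORCH_DATA_DIR:-./data}" "$@"
      ;;
    single-node)
      # SurrealDB Embedded: ACID guarantees without an external server
      set -- --storage-type surrealdb-embedded --data-dir "${ORCH_DATA_DIR:-./data}" "$@"
      ;;
    distributed)
      # SurrealDB Server: refuse to start without a password in the environment
      if [ -z "${SURREALDB_PASSWORD:-}" ]; then
        echo "SURREALDB_PASSWORD is not set" >&2
        return 1
      fi
      set -- --storage-type surrealdb-server \
        --surrealdb-url "${SURREALDB_URL:-ws://localhost:8000}" \
        --surrealdb-username "${SURREALDB_USERNAME:-admin}" \
        --surrealdb-password "$SURREALDB_PASSWORD" "$@"
      ;;
    *)
      echo "unknown stage: $stage" >&2
      return 1
      ;;
  esac

  # Run the binary with the assembled flags (defaults to ./orchestrator)
  "${ORCHESTRATOR_BIN:-./orchestrator}" "$@"
}
```

After sourcing the function, a call such as `run_orchestrator single-node --dry-run` would pass the embedded-backend flags plus `--dry-run` to the binary. Setting `ORCHESTRATOR_BIN=echo` prints the assembled command line instead of running it, which is a cheap way to check the flags before a real start.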