prvng_platform/crates/orchestrator/docs/STORAGE_BACKENDS.md
Jesús Pérez 09a97ac8f5
2026-01-08 21:32:59 +00:00

# Storage Backends Guide

This guide covers the orchestrator's storage backends: the available options, how to configure each one, and how to migrate data between them.

## Overview

The orchestrator supports three storage backends through a pluggable architecture:

1. **Filesystem**: JSON file-based storage (default)
2. **SurrealDB Embedded**: Local database with RocksDB engine
3. **SurrealDB Server**: Remote SurrealDB server connection

All backends implement the same `TaskStorage` trait, ensuring consistent behavior and seamless migration.

## Backend Comparison

| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server |
|---------|------------|--------------------|------------------|
| Setup Complexity | Minimal | Low | Medium |
| External Dependencies | None | None | SurrealDB Server |
| Storage Format | JSON Files | RocksDB | Remote DB |
| ACID Transactions | No | Yes | Yes |
| Authentication/RBAC | Basic | Advanced | Advanced |
| Real-time Subscriptions | No | Yes | Yes |
| Audit Logging | Manual | Automatic | Automatic |
| Metrics Collection | Basic | Advanced | Advanced |
| Task Dependencies | Simple | Graph-based | Graph-based |
| Horizontal Scaling | No | No | Yes |
| Backup/Recovery | File Copy | Database Backup | Server Backup |
| Performance | Good | Excellent | Variable |
| Memory Usage | Low | Medium | Low |
| Disk Usage | Medium | Optimized | Minimal |

## 1. Filesystem Backend

### Overview

The default storage backend using JSON files for task persistence. Ideal for development and simple deployments.

### Configuration

```bash
# Default configuration
./orchestrator --storage-type filesystem --data-dir ./data

# Custom data directory
./orchestrator --storage-type filesystem --data-dir /var/lib/orchestrator
```

### File Structure

```plaintext
data/
└── queue.rkvs/
    ├── tasks/
    │   ├── uuid1.json    # Individual task records
    │   ├── uuid2.json
    │   └── ...
    └── queue/
        ├── uuid1.json    # Queue entries with priority
        ├── uuid2.json
        └── ...
```

### Features

- ✅ **Simple Setup**: No external dependencies
- ✅ **Transparency**: Human-readable JSON files
- ✅ **Backup**: Standard file system tools
- ✅ **Debugging**: Direct file inspection
- ❌ **ACID**: No transaction guarantees
- ❌ **Concurrency**: Basic file locking
- ❌ **Advanced Features**: Limited auth/audit
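
Because the data directory is made of plain files, a backup can be as simple as archiving it. The following is a minimal sketch, not part of the orchestrator: `backup_data_dir` and the archive naming are hypothetical, and it assumes the orchestrator is stopped or idle while the archive is taken, since this backend offers no transaction guarantees.

```bash
# backup_data_dir DATA_DIR BACKUP_DIR   (hypothetical helper)
# Archives DATA_DIR into BACKUP_DIR/orchestrator-data-<UTC timestamp>.tar.gz
# and prints the path of the archive it created.
backup_data_dir() {
  local data_dir=$1 backup_dir=$2
  local stamp archive
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  archive="$backup_dir/orchestrator-data-$stamp.tar.gz"
  mkdir -p "$backup_dir"
  # -C makes the paths inside the archive relative to the parent of DATA_DIR,
  # so the archive unpacks as a single top-level directory.
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
  printf '%s\n' "$archive"
}
```

For example, `backup_data_dir ./data ./backups` writes a timestamped archive under `./backups` and prints its path.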

### Best Use Cases

- Development environments
- Single-instance deployments
- Simple task orchestration
- Environments with strict dependency requirements

## 2. SurrealDB Embedded

### Overview

Local SurrealDB database using RocksDB storage engine. Provides advanced database features without external dependencies.

### Configuration

```bash
# Build with SurrealDB support
cargo build --features surrealdb

# Run with embedded SurrealDB
./orchestrator --storage-type surrealdb-embedded --data-dir ./data
```

### Database Schema

- **tasks**: Main task records with full metadata
- **task_queue**: Priority queue with scheduling info
- **users**: Authentication and RBAC
- **audit_log**: Complete operation history
- **metrics**: Performance and usage statistics
- **task_events**: Real-time event stream

### Features

- ✅ **ACID Transactions**: Reliable data consistency
- ✅ **Advanced Queries**: SQL-like syntax with graph support
- ✅ **Real-time Events**: Live query subscriptions
- ✅ **Built-in Auth**: User management and RBAC
- ✅ **Audit Logging**: Automatic operation tracking
- ✅ **No External Deps**: Self-contained database
- ❌ **Horizontal Scaling**: Single-node only

### Configuration Options

```bash
# Custom database location
./orchestrator --storage-type surrealdb-embedded \
  --data-dir /var/lib/orchestrator/db

# With specific namespace/database
./orchestrator --storage-type surrealdb-embedded \
  --data-dir ./data \
  --surrealdb-namespace production \
  --surrealdb-database orchestrator
```

### Best Use Cases

- Production single-node deployments
- Applications requiring ACID guarantees
- Advanced querying and analytics
- Real-time monitoring requirements
- Audit logging compliance

## 3. SurrealDB Server

### Overview

Connects to a remote SurrealDB server. When the server runs on a distributed storage layer such as TiKV, this backend supports horizontal scaling and lets multiple orchestrator instances share state.

### Prerequisites

1. **SurrealDB Server**: Running instance accessible via network
2. **Authentication**: Valid credentials for database access
3. **Network**: Reliable connectivity to SurrealDB server

### SurrealDB Server Setup

```bash
# Install SurrealDB
curl -sSf https://install.surrealdb.com | sh

# Start server with in-memory storage (data is lost on restart; root/root is for local testing only)
surreal start --log trace --user root --pass root memory

# Or with file storage (recent SurrealDB releases deprecate the file: prefix in favor of rocksdb:)
surreal start --log trace --user root --pass root file:orchestrator.db

# Or with TiKV (distributed)
surreal start --log trace --user root --pass root tikv://localhost:2379
```

### Configuration

```bash
# Basic server connection
./orchestrator --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin \
  --surrealdb-password secret

# Production configuration
./orchestrator --storage-type surrealdb-server \
  --surrealdb-url wss://surreal.production.com:8000 \
  --surrealdb-namespace prod \
  --surrealdb-database orchestrator \
  --surrealdb-username orchestrator-service \
  --surrealdb-password "$SURREALDB_PASSWORD"
```

### Features

- ✅ **Distributed**: Multi-node clustering support
- ✅ **Horizontal Scaling**: Handle massive workloads
- ✅ **Multi-tenancy**: Namespace and database isolation
- ✅ **Real-time Collaboration**: Multiple orchestrator instances
- ✅ **Advanced Security**: Enterprise authentication
- ✅ **High Availability**: Fault-tolerant deployments
- ❌ **Complexity**: Requires server management
- ❌ **Network Dependency**: Requires reliable connectivity

### Best Use Cases

- Distributed production deployments
- Multiple orchestrator instances
- High availability requirements
- Large-scale task orchestration
- Multi-tenant environments

## Migration Between Backends

### Migration Tool

Use the migration script to move data between any backend combination:

```bash
# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration examples
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \
  --source-dir ./data \
  --surrealdb-url ws://localhost:8000 \
  --username admin --password secret

# Validation and dry-run
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --dry-run
```

### Migration Features

- **Data Integrity**: Complete validation before and after migration
- **Progress Tracking**: Real-time progress with throughput metrics
- **Rollback Support**: Automatic rollback on failures
- **Selective Migration**: Filter by task status, date range, etc.
- **Batch Processing**: Configurable batch sizes for performance
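
The `validate` and `--dry-run` invocations from the Migration Tool examples can be chained so that the real migration only starts once both checks pass. This is a sketch under assumptions: `migrate_checked` and the `MIGRATE_SCRIPT` variable are hypothetical names, the script is assumed to exit non-zero on failure, and the dry run is assumed to accept the same options as the real run.

```bash
# Path to the migration script; override it for non-default checkouts.
MIGRATE_SCRIPT=${MIGRATE_SCRIPT:-./scripts/migrate-storage.nu}

# migrate_checked FROM TO [EXTRA_ARGS...]   (hypothetical wrapper)
# Runs validate, then a dry run, then the real migration, and stops at the
# first step that exits non-zero.
migrate_checked() {
  local from=$1 to=$2
  shift 2
  "$MIGRATE_SCRIPT" validate --from "$from" --to "$to" || return 1
  "$MIGRATE_SCRIPT" --from "$from" --to "$to" "$@" --dry-run || return 1
  "$MIGRATE_SCRIPT" --from "$from" --to "$to" "$@"
}
```

A call such as `migrate_checked filesystem surrealdb-embedded --source-dir ./data --target-dir ./surrealdb-data` then performs all three steps in order.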

### Migration Scenarios

#### Development to Production

```bash
# Migrate from filesystem (dev) to SurrealDB embedded (production)
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./dev-data --target-dir ./prod-data \
  --batch-size 100 --verify
```

#### Scaling Up

```bash
# Migrate from embedded to server for distributed setup
./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \
  --source-dir ./data \
  --surrealdb-url ws://production-surreal:8000 \
  --username orchestrator --password "$PROD_PASSWORD" \
  --namespace production --database main
```

#### Disaster Recovery

```bash
# Migrate from server back to filesystem for emergency backup
./scripts/migrate-storage.nu --from surrealdb-server --to filesystem \
  --surrealdb-url ws://failing-server:8000 \
  --username admin --password "$PASSWORD" \
  --target-dir ./emergency-backup
```

## Performance Considerations

### Filesystem

- **Strengths**: Low memory usage, simple debugging
- **Limitations**: File I/O bottlenecks, no concurrent writes
- **Optimization**: Fast SSD, regular cleanup of old tasks

### SurrealDB Embedded

- **Strengths**: Excellent single-node performance, ACID guarantees
- **Limitations**: Memory usage scales with data size
- **Optimization**: Adequate RAM, SSD storage, regular compaction

### SurrealDB Server

- **Strengths**: Horizontal scaling, shared state
- **Limitations**: Network latency, server dependency
- **Optimization**: Low-latency network, connection pooling, server tuning

## Security Considerations

### Filesystem

- **File Permissions**: Restrict access to data directory
- **Backup Security**: Encrypt backup files
- **Network**: No network exposure
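
One way to restrict access to the data directory is to remove all group and other permissions. The helper name `restrict_data_dir` is hypothetical, and the sketch assumes only the user running the orchestrator needs to read the files; if other local users or services need read access, a looser mode such as `go=rX` would be the alternative.

```bash
# restrict_data_dir DIR   (hypothetical helper)
# Gives the owner full access and removes every group/other permission.
# The capital X grants execute (search) on directories, and on files only
# when they already carry an execute bit, so plain JSON files stay rw-------.
restrict_data_dir() {
  chmod -R u=rwX,go= "$1"
}
```

Running `restrict_data_dir ./data` leaves directories as `drwx------` and regular files as `-rw-------`.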

### SurrealDB Embedded

- **File Permissions**: Secure database files
- **Encryption**: Database-level encryption available
- **Access Control**: Built-in user management

### SurrealDB Server

- **Network Security**: Use TLS/WSS connections
- **Authentication**: Strong passwords, regular rotation
- **Authorization**: Role-based access control
- **Audit**: Complete operation logging

## Troubleshooting

### Common Issues

#### Filesystem Backend

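```bash
# Permission issues
sudo chown -R "$USER":"$USER" ./data
# Capital X adds execute only to directories (and files that are already executable),
# so JSON files are not marked executable
chmod -R u=rwX,go=rX ./data

# Corrupted JSON files
rm ./data/queue.rkvs/tasks/corrupted-file.json
```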
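
Rather than guessing which file is corrupted, the JSON files can be checked in bulk and the unparsable ones moved aside instead of deleted. This is a sketch only: both function names are hypothetical, `python3` is assumed to be installed (any JSON validator could be substituted), and the quarantine directory is assumed to live outside the data directory.

```bash
# json_is_valid FILE: succeeds when FILE parses as JSON (empty files fail).
json_is_valid() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# quarantine_corrupted_json DIR QUARANTINE_DIR   (hypothetical helper)
# Moves every *.json file under DIR that fails to parse into QUARANTINE_DIR,
# keeping its relative path so tasks/<id>.json and queue/<id>.json do not
# collide, and prints the original path of each moved file.
quarantine_corrupted_json() {
  local dir=$1 quarantine=$2 file rel
  find "$dir" -type f -name '*.json' | while IFS= read -r file; do
    if ! json_is_valid "$file"; then
      rel=${file#"$dir"/}
      mkdir -p "$quarantine/$(dirname "$rel")"
      mv -- "$file" "$quarantine/$rel"
      printf '%s\n' "$file"
    fi
  done
}
```

For example, `quarantine_corrupted_json ./data ./data-quarantine` lists the files it moved, which can then be inspected or restored from a backup.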

#### SurrealDB Embedded

```bash
# Database corruption (destructive: copy the directory elsewhere first if you may still need it)
rm -rf ./data/orchestrator.db
# Restore from backup or re-initialize

# Permission issues
sudo chown -R "$USER":"$USER" ./data
```

#### SurrealDB Server

```bash
# Connection issues
telnet surreal-server 8000
# Check server status and network connectivity

# Authentication failures
# Verify credentials and user permissions
```
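
Where `telnet` is not installed, the same reachability check can be scripted with Bash's built-in `/dev/tcp` redirection. The function name `wait_for_port` is hypothetical, and the sketch assumes Bash (other shells do not provide `/dev/tcp`). It only confirms that something accepts TCP connections on the port, not that SurrealDB is healthy or that the credentials are valid.

```bash
# wait_for_port HOST PORT [ATTEMPTS]   (hypothetical helper, Bash only)
# Tries to open a TCP connection to HOST:PORT up to ATTEMPTS times
# (default 10), pausing one second after each failed attempt.
# Returns 0 as soon as a connection succeeds, 1 otherwise.
wait_for_port() {
  local host=$1 port=$2 attempts=${3:-10}
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # Bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # the subshell closes descriptor 3 again when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port surreal-server 8000 5 || echo "server unreachable"` reports a closed port after five attempts. A port that silently drops packets can make each attempt block for much longer than one second.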

### Debugging Commands

```bash
# List available storage types
./orchestrator --help | grep storage-type

# Validate configuration
./orchestrator --storage-type filesystem --data-dir ./data --dry-run

# Test migration
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded

# Monitor migration progress
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --verbose
```

## Recommendations

### Development

- **Use**: Filesystem backend
- **Rationale**: Simple setup, easy debugging, no external dependencies

### Single-Node Production

- **Use**: SurrealDB Embedded
- **Rationale**: ACID guarantees, advanced features, no external dependencies

### Distributed Production

- **Use**: SurrealDB Server
- **Rationale**: Horizontal scaling, high availability, multi-instance support

### Migration Path

1. **Start**: Filesystem (development)
2. **Scale**: SurrealDB Embedded (single-node production)
3. **Distribute**: SurrealDB Server (multi-node production)

This progressive approach allows teams to start simple and scale as requirements grow, with seamless migration between each stage.