From af85d60ddd6ab4f0c347ba4d31a0173547f31956 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jesu=CC=81s=20Pe=CC=81rez?=
Date: Wed, 14 Jan 2026 05:24:19 +0000
Subject: [PATCH] chore: fix fences errors

---
 .typedialog/README.md                         | 372 ++++-
 config/README.md                              | 110 ++-
 config/examples/README.md                     | 202 ++++-
 crates/control-center-ui/README.md            | 369 ++++-
 crates/control-center-ui/REFERENCE.md         |  34 +-
 crates/control-center-ui/auth-system.md       | 382 ++++-
 .../upstream-dependency-issue.md              | 146 +++-
 crates/control-center/README.md               | 372 ++++-
 .../docs/security-considerations.md           | 711 +++++++++++++++-
 crates/control-center/src/kms/README.md       | 454 ++++++++++-
 crates/control-center/web/README.md           | 181 ++++-
 crates/extension-registry/API.md              | 587 +++++++++++++-
 crates/extension-registry/README.md           | 636 ++++++++++++++-
 crates/mcp-server/README.md                   | 136 +++-
 crates/orchestrator/README.md                 | 516 +++++++++++-
 crates/orchestrator/docs/dns-integration.md   | 222 ++++-
 crates/orchestrator/docs/extension-loading.md | 377 ++++++++-
 crates/orchestrator/docs/oci-integration.md   | 418 +++++++++-
 .../docs/service-orchestration.md             | 468 ++++++++++-
 .../orchestrator/docs/ssh-key-management.md   | 526 +++++++++++-
 crates/orchestrator/docs/storage-backends.md  | 386 ++++++++-
 crates/orchestrator/wrks/readme-testing.md    | 393 ++++++++-
 crates/vault-service/README.md                | 468 ++++++++++-
 docs/deployment/deployment-guide.md           | 758 +++++++++++++++++-
 docs/deployment/guide.md                      | 469 ++++++++++-
 docs/deployment/known-issues.md               |  98 ++-
 docs/guides/quick-start.md                    | 283 ++++++-
 27 files changed, 10047 insertions(+), 27 deletions(-)

diff --git a/.typedialog/README.md b/.typedialog/README.md
index 0479c8a..6c84987 100644
--- a/.typedialog/README.md
+++ b/.typedialog/README.md
@@ -1 +1,371 @@
-# TypeDialog Integration\n\nTypeDialog enables interactive form-based configuration from Nickel schemas.\n\n## Status\n\n- **TypeDialog Binary**: Not yet installed (planned: `typedialog` command)\n- **TypeDialog Forms**: 
Created and ready (setup wizard, auth login, MFA enrollment)\n- **Bash Wrappers**: Implemented to handle TTY input properly\n- **ForminQuire**: DEPRECATED - Archived to `.coder/archive/forminquire/`\n\n## Directory Structure\n\n```{$detected_lang}\n.typedialog/\n└── provisioning/platform/\n ├── README.md # This file\n ├── forms/ # Form definitions (to be generated)\n │ ├── orchestrator.form.toml\n │ ├── control-center.form.toml\n │ └── ...\n ├── templates/ # Jinja2 templates for schema rendering\n │ └── service-form.template.j2\n ├── schemas/ # Symlink to Nickel schemas\n │ └── platform/schemas/ → ../../../schemas/platform/schemas/\n └── constraints/ # Validation constraints\n └── constraints.toml # Shared validation rules\n```\n\n## How TypeDialog Would Work\n\n### 1. Form Generation from Schemas\n\n```{$detected_lang}\n# Auto-generate form from Nickel schema\ntypedialog generate-form --schema orchestrator.ncl \\n --output forms/orchestrator.form.toml\n```\n\n### 2. Interactive Configuration\n\n```{$detected_lang}\n# Run interactive form\ntypedialog run-form --form forms/orchestrator.form.toml \\n --output orchestrator-configured.ncl\n```\n\n### 3. Validation\n\n```{$detected_lang}\n# Validate user input against schema\ntypedialog validate --form forms/orchestrator.form.toml \\n --data user-config.ncl\n```\n\n## Current Status: TypeDialog Forms Ready\n\nTypeDialog forms have been created and are ready to use:\n\n**Form Locations**:\n- Setup wizard: `provisioning/.typedialog/core/forms/setup-wizard.toml`\n- Auth login: `provisioning/.typedialog/core/forms/auth-login.toml`\n- MFA enrollment: `provisioning/.typedialog/core/forms/mfa-enroll.toml`\n\n**Bash Wrappers** (TTY-safe, handle input properly):\n- `provisioning/core/shlib/setup-wizard-tty.sh`\n- `provisioning/core/shlib/auth-login-tty.sh`\n- `provisioning/core/shlib/mfa-enroll-tty.sh`\n\n**Usage Pattern**:\n1. Bash wrapper calls TypeDialog (handles TTY input)\n2. TypeDialog generates Nickel config file\n3. 
Nushell scripts read the generated config (no input issues)\n\n**Example**:\n\n```{$detected_lang}\n# Run TypeDialog setup wizard\n./provisioning/core/shlib/setup-wizard-tty.sh\n\n# Nushell reads the generated config\nlet config = (open provisioning/.typedialog/core/generated/setup-wizard-result.json | from json)\n```\n\n**Note**: ForminQuire (Jinja2-based forms) has been archived to `provisioning/.coder/archive/forminquire/` and is no longer in use.\n\n## Integration Plan (When TypeDialog Available)\n\n### Step 1: Install TypeDialog\n\n```{$detected_lang}\ncargo install --path /Users/Akasha/Development/typedialog\ntypedialog --version\n```\n\n### Step 2: Generate Forms from Schemas\n\n```{$detected_lang}\n# Batch generate all forms\nfor schema in provisioning/schemas/platform/schemas/*.ncl; do\n service=$(basename $schema .ncl)\n typedialog generate-form \\n --schema $schema \\n --output provisioning/platform/.typedialog/forms/${service}.form.toml\ndone\n```\n\n### Step 3: Create Setup Wizard\n\n```{$detected_lang}\n# Unified setup workflow\nprovisioning setup-platform \\n --mode solo|multiuser|enterprise \\n --provider docker|kubernetes \\n --interactive # Uses TypeDialog forms\n```\n\n### Step 4: Update Platform Setup Script\n\n```{$detected_lang}\n# provisioning/platform/scripts/setup-platform-config.sh\n\nif command -v typedialog &> /dev/null; then\n # TypeDialog is installed - use bash wrapper for proper TTY handling\n ./provisioning/core/shlib/setup-wizard-tty.sh\n\n # Read generated JSON config\n # Nushell scripts can now read the config without input issues\nelse\n # Fallback to basic prompts\n echo "TypeDialog not available. 
Using basic interactive prompts..."\n # Nushell wizard with basic input prompts\n nu -c "use provisioning/core/nulib/lib_provisioning/setup/wizard.nu *; run-setup-wizard"\nfi\n```\n\n## Form Definition Example\n\n```{$detected_lang}\n# provisioning/platform/.typedialog/forms/orchestrator.form.toml\n[metadata]\nname = "Orchestrator Configuration"\ndescription = "Configure the Orchestrator service"\nversion = "1.0.0"\nschema = "orchestrator.ncl"\n\n[fields.mode]\ntype = "enum"\nlabel = "Deployment Mode"\ndescription = "Select deployment mode: solo, multiuser, or enterprise"\noptions = ["solo", "multiuser", "enterprise"]\ndefault = "solo"\nrequired = true\n\n[fields.server.port]\ntype = "number"\nlabel = "Server Port"\ndescription = "HTTP server port (1-65535)"\nmin = 1\nmax = 65535\ndefault = 8080\nrequired = true\n\n[fields.database.host]\ntype = "string"\nlabel = "Database Host"\ndescription = "PostgreSQL host"\ndefault = "localhost"\nrequired = true\n\n[fields.logging.level]\ntype = "enum"\nlabel = "Logging Level"\noptions = ["debug", "info", "warning", "error"]\ndefault = "info"\nrequired = false\n```\n\n## Validation Constraints\n\n```{$detected_lang}\n# provisioning/platform/.typedialog/constraints/constraints.toml\n\n[orchestrator]\nmode = ["solo", "multiuser", "enterprise"]\nport = "range(1, 65535)"\ndatabase_pool_size = "range(1, 100)"\nmemory = "pattern(^\\d+[MG]B$)"\n\n[control-center]\nport = "range(1, 65535)"\nreplicas = "range(1, 10)"\n\n[nginx]\nworker_processes = "range(1, 32)"\nworker_connections = "range(1, 65536)"\n```\n\n## Workflow: Setup to Deployment\n\n```{$detected_lang}\n1. User runs setup command\n ↓\n2. TypeDialog displays form\n ↓\n3. User fills form with validation\n ↓\n4. Form data → Nickel config\n ↓\n5. Nickel config → TOML (via ConfigLoader)\n ↓\n6. Service reads TOML config\n ↓\n7. 
Service starts with configured values\n```\n\n## Benefits of TypeDialog Integration\n\n- ✅ **Type-safe forms** - Generated from Nickel schemas\n- ✅ **Real-time validation** - Enforce constraints as user types\n- ✅ **Progressive disclosure** - Show advanced options only when needed\n- ✅ **Consistent UX** - Same forms across platforms (CLI, Web, TUI)\n- ✅ **Auto-generated** - Forms stay in sync with schemas automatically\n- ✅ **TTY handling** - Bash wrappers solve Nushell input stack issues\n- ✅ **Graceful fallback** - Falls back to basic prompts if TypeDialog unavailable\n\n## Testing TypeDialog Forms\n\n```{$detected_lang}\n# Validate form structure\ntypedialog check-form provisioning/platform/.typedialog/forms/orchestrator.form.toml\n\n# Run form with test data\ntypedialog run-form \\n --form provisioning/platform/.typedialog/forms/orchestrator.form.toml \\n --test-mode # Automated validation\n\n# Generate sample output\ntypedialog generate-sample \\n --form provisioning/platform/.typedialog/forms/orchestrator.form.toml \\n --output /tmp/orchestrator-sample.ncl\n```\n\n## Migration Path\n\n### Phase A: Legacy (DEPRECATED)\n\n```{$detected_lang}\nFormInquire (Jinja2) → Nushell processing → TOML config\nStatus: ARCHIVED to .coder/archive/forminquire/\n```\n\n### Phase B: Current Implementation\n\n```{$detected_lang}\nBash wrapper → TypeDialog (TTY input) → Nickel config → JSON export → Nushell reads JSON\nStatus: IMPLEMENTED with forms ready\n```\n\n### Phase C: TypeDialog Binary Available (Future)\n\n```{$detected_lang}\nTypeDialog binary installed → Full nickel-roundtrip workflow → Auto-sync with schemas\nStatus: PLANNED - awaiting TypeDialog binary release\n```\n\n### Phase D: Unified (Future)\n\n```{$detected_lang}\nConfigLoader discovers config → Service reads → TypeDialog updates UI\n```\n\n## Integration with Infrastructure Schemas\n\nTypeDialog forms work seamlessly with infrastructure schemas:\n\n### Infrastructure Configuration Workflow\n\n**1. 
Define Infrastructure Schemas** (completed)\n- Location: `provisioning/schemas/infrastructure/`\n- 6 schemas: docker-compose, kubernetes, nginx, prometheus, systemd, oci-registry\n- All validated with `nickel typecheck`\n\n**2. Generate Infrastructure Configs** (completed)\n- Script: `provisioning/platform/scripts/generate-infrastructure-configs.nu`\n- Supports: solo, multiuser, enterprise, cicd modes\n- Formats: YAML, JSON, conf, service\n\n**3. Validate Generated Configs** (completed)\n- Script: `provisioning/platform/scripts/validate-infrastructure.nu`\n- Tools: docker-compose config, kubectl apply --dry-run, nginx -t, promtool check\n- Examples: `examples-solo-deployment.ncl`, `examples-enterprise-deployment.ncl`\n\n**4. Interactive Setup with Forms** (TypeDialog ready)\n- Script: `provisioning/platform/scripts/setup-with-forms.sh`\n- Bash wrappers: `provisioning/core/shlib/*-tty.sh` (handle TTY input)\n- Forms ready: setup-wizard, auth-login, mfa-enroll\n- Fallback: Basic Nushell prompts if TypeDialog unavailable\n\n### Current Status: Full Infrastructure Support\n\n| Component | Status | Details |\n| ----------- | -------- | --------- |\n| **Schemas** | ✅ Complete | 6 infrastructure schemas (1,577 lines) |\n| **Examples** | ✅ Complete | 2 deployment examples (solo, enterprise) |\n| **Generation Script** | ✅ Complete | Auto-generates configs for all modes |\n| **Validation Script** | ✅ Complete | Validates Docker, K8s, Nginx, Prometheus |\n| **Setup Wizard** | ✅ Complete | TypeDialog forms + bash wrappers ready |\n| **TypeDialog Integration** | ⏳ Pending | Structure ready, awaiting binary |\n\n### Validated Examples\n\n**Solo Deployment** (`examples-solo-deployment.ncl`):\n- ✅ Type-checks without errors\n- ✅ Exports to 198 lines of JSON\n- ✅ 5 Docker Compose services\n- ✅ Resource limits: 1.0-4.0 CPU, 256M-1024M RAM\n- ✅ Prometheus: 4 scrape jobs\n- ✅ Registry backend: Zot (filesystem)\n\n**Enterprise Deployment** (`examples-enterprise-deployment.ncl`):\n- ✅ 
Type-checks without errors\n- ✅ Exports to 313 lines of JSON\n- ✅ 6 Docker Compose services with HA\n- ✅ Resource limits: 2.0-4.0 CPU, 512M-4096M RAM\n- ✅ Prometheus: 7 scrape jobs with remote storage\n- ✅ Registry backend: Harbor (S3 distributed)\n\n### Test Infrastructure Generation\n\n```{$detected_lang}\n# Export solo infrastructure\nnickel export --format json provisioning/schemas/infrastructure/examples-solo-deployment.ncl > /tmp/solo.json\n\n# Validate JSON\njq . /tmp/solo.json\n\n# Check Docker Compose services\njq '.docker_compose_services | keys' /tmp/solo.json\n\n# Compare resource allocation (solo vs enterprise)\njq '.docker_compose_services.orchestrator.deploy.resources.limits' /tmp/solo.json\njq '.docker_compose_services.orchestrator.deploy.resources.limits' /tmp/enterprise.json\n```\n\n## Next Steps\n\n1. **Infrastructure Setup** (available now):\n - Generate infrastructure configs with automation scripts\n - Validate with format-specific tools\n - Use interactive setup wizard for configuration\n\n2. **When TypeDialog binary becomes available**:\n - Install TypeDialog binary\n - Forms already created and ready to use\n - Bash wrappers handle TTY input (no Nushell stack issues)\n - Full nickel-roundtrip workflow will be enabled\n\n3. **Production Deployment**:\n - Use validated infrastructure configs\n - Deploy with ConfigLoader + infrastructure schemas\n - Monitor via Prometheus (auto-generated from schemas)\n\n---\n\n**Version**: 1.2.0 (TypeDialog Forms + Bash Wrappers)\n**Status**: TypeDialog forms ready with bash wrappers; Awaiting TypeDialog Binary\n**Last Updated**: 2025-01-09\n**ForminQuire Status**: DEPRECATED - Archived to .coder/archive/forminquire/\n**Fallback**: Basic Nushell prompts if TypeDialog unavailable\n**Tested**: Infrastructure examples (solo + enterprise) validated +# TypeDialog Integration + +TypeDialog enables interactive form-based configuration from Nickel schemas. 
+
+## Status
+
+- **TypeDialog Binary**: Not yet installed (planned: `typedialog` command)
+- **TypeDialog Forms**: Created and ready (setup wizard, auth login, MFA enrollment)
+- **Bash Wrappers**: Implemented to handle TTY input properly
+- **ForminQuire**: DEPRECATED - Archived to `.coder/archive/forminquire/`
+
+## Directory Structure
+
+```text
+.typedialog/
+└── provisioning/platform/
+    ├── README.md                      # This file
+    ├── forms/                         # Form definitions (to be generated)
+    │   ├── orchestrator.form.toml
+    │   ├── control-center.form.toml
+    │   └── ...
+    ├── templates/                     # Jinja2 templates for schema rendering
+    │   └── service-form.template.j2
+    ├── schemas/                       # Symlink to Nickel schemas
+    │   └── platform/schemas/ → ../../../schemas/platform/schemas/
+    └── constraints/                   # Validation constraints
+        └── constraints.toml           # Shared validation rules
+```
+
+## How TypeDialog Would Work
+
+### 1. Form Generation from Schemas
+
+```bash
+# Auto-generate form from Nickel schema
+typedialog generate-form --schema orchestrator.ncl \
+  --output forms/orchestrator.form.toml
+```
+
+### 2. Interactive Configuration
+
+```bash
+# Run interactive form
+typedialog run-form --form forms/orchestrator.form.toml \
+  --output orchestrator-configured.ncl
+```
+
+### 3. Validation
+
+```bash
+# Validate user input against schema
+typedialog validate --form forms/orchestrator.form.toml \
+  --data user-config.ncl
+```
+
+## Current Status: TypeDialog Forms Ready
+
+TypeDialog forms have been created and are ready to use:
+
+**Form Locations**:
+- Setup wizard: `provisioning/.typedialog/core/forms/setup-wizard.toml`
+- Auth login: `provisioning/.typedialog/core/forms/auth-login.toml`
+- MFA enrollment: `provisioning/.typedialog/core/forms/mfa-enroll.toml`
+
+**Bash Wrappers** (TTY-safe, handle input properly):
+- `provisioning/core/shlib/setup-wizard-tty.sh`
+- `provisioning/core/shlib/auth-login-tty.sh`
+- `provisioning/core/shlib/mfa-enroll-tty.sh`
+
+**Usage Pattern**:
+1. Bash wrapper calls TypeDialog (handles TTY input)
+2. TypeDialog generates Nickel config file
+3. Nushell scripts read the generated config (no input issues)
+
+**Example**:
+
+```nushell
+# Run TypeDialog setup wizard
+./provisioning/core/shlib/setup-wizard-tty.sh
+
+# Nushell reads the generated config
+let config = (open --raw provisioning/.typedialog/core/generated/setup-wizard-result.json | from json)
+```
+
+**Note**: ForminQuire (Jinja2-based forms) has been archived to `provisioning/.coder/archive/forminquire/` and is no longer in use.
+
+## Integration Plan (When TypeDialog Available)
+
+### Step 1: Install TypeDialog
+
+```bash
+cargo install --path /Users/Akasha/Development/typedialog
+typedialog --version
+```
+
+### Step 2: Generate Forms from Schemas
+
+```bash
+# Batch generate all forms
+for schema in provisioning/schemas/platform/schemas/*.ncl; do
+  service=$(basename "$schema" .ncl)
+  typedialog generate-form \
+    --schema "$schema" \
+    --output "provisioning/platform/.typedialog/forms/${service}.form.toml"
+done
+```
+
+### Step 3: Create Setup Wizard
+
+```bash
+# Unified setup workflow
+provisioning setup-platform \
+  --mode solo|multiuser|enterprise \
+  --provider docker|kubernetes \
+  --interactive    # Uses TypeDialog forms
+```
+
+### Step 4: Update Platform Setup Script
+
+```bash
+# provisioning/platform/scripts/setup-platform-config.sh
+
+if command -v typedialog &> /dev/null; then
+  # TypeDialog is installed - use bash wrapper for proper TTY handling
+  ./provisioning/core/shlib/setup-wizard-tty.sh
+
+  # Read generated JSON config
+  # Nushell scripts can now read the config without input issues
+else
+  # Fallback to basic prompts
+  echo "TypeDialog not available. Using basic interactive prompts..."
+  # Nushell wizard with basic input prompts
+  nu -c "use provisioning/core/nulib/lib_provisioning/setup/wizard.nu *; run-setup-wizard"
+fi
+```
+
+## Form Definition Example
+
+```toml
+# provisioning/platform/.typedialog/forms/orchestrator.form.toml
+[metadata]
+name = "Orchestrator Configuration"
+description = "Configure the Orchestrator service"
+version = "1.0.0"
+schema = "orchestrator.ncl"
+
+[fields.mode]
+type = "enum"
+label = "Deployment Mode"
+description = "Select deployment mode: solo, multiuser, or enterprise"
+options = ["solo", "multiuser", "enterprise"]
+default = "solo"
+required = true
+
+[fields.server.port]
+type = "number"
+label = "Server Port"
+description = "HTTP server port (1-65535)"
+min = 1
+max = 65535
+default = 8080
+required = true
+
+[fields.database.host]
+type = "string"
+label = "Database Host"
+description = "PostgreSQL host"
+default = "localhost"
+required = true
+
+[fields.logging.level]
+type = "enum"
+label = "Logging Level"
+options = ["debug", "info", "warning", "error"]
+default = "info"
+required = false
+```
+
+## Validation Constraints
+
+```toml
+# provisioning/platform/.typedialog/constraints/constraints.toml
+
+[orchestrator]
+mode = ["solo", "multiuser", "enterprise"]
+port = "range(1, 65535)"
+database_pool_size = "range(1, 100)"
+memory = "pattern(^\\d+[MG]B$)"
+
+[control-center]
+port = "range(1, 65535)"
+replicas = "range(1, 10)"
+
+[nginx]
+worker_processes = "range(1, 32)"
+worker_connections = "range(1, 65536)"
+```
+
+## Workflow: Setup to Deployment
+
+```text
+1. User runs setup command
+   ↓
+2. TypeDialog displays form
+   ↓
+3. User fills form with validation
+   ↓
+4. Form data → Nickel config
+   ↓
+5. Nickel config → TOML (via ConfigLoader)
+   ↓
+6. Service reads TOML config
+   ↓
+7. Service starts with configured values
+```
+
+## Benefits of TypeDialog Integration
+
+- ✅ **Type-safe forms** - Generated from Nickel schemas
+- ✅ **Real-time validation** - Enforce constraints as user types
+- ✅ **Progressive disclosure** - Show advanced options only when needed
+- ✅ **Consistent UX** - Same forms across platforms (CLI, Web, TUI)
+- ✅ **Auto-generated** - Forms stay in sync with schemas automatically
+- ✅ **TTY handling** - Bash wrappers solve Nushell input stack issues
+- ✅ **Graceful fallback** - Falls back to basic prompts if TypeDialog unavailable
+
+## Testing TypeDialog Forms
+
+```bash
+# Validate form structure
+typedialog check-form provisioning/platform/.typedialog/forms/orchestrator.form.toml
+
+# Run form with test data
+typedialog run-form \
+  --form provisioning/platform/.typedialog/forms/orchestrator.form.toml \
+  --test-mode    # Automated validation
+
+# Generate sample output
+typedialog generate-sample \
+  --form provisioning/platform/.typedialog/forms/orchestrator.form.toml \
+  --output /tmp/orchestrator-sample.ncl
+```
+
+## Migration Path
+
+### Phase A: Legacy (DEPRECATED)
+
+```text
+FormInquire (Jinja2) → Nushell processing → TOML config
+Status: ARCHIVED to .coder/archive/forminquire/
+```
+
+### Phase B: Current Implementation
+
+```text
+Bash wrapper → TypeDialog (TTY input) → Nickel config → JSON export → Nushell reads JSON
+Status: IMPLEMENTED with forms ready
+```
+
+### Phase C: TypeDialog Binary Available (Future)
+
+```text
+TypeDialog binary installed → Full nickel-roundtrip workflow → Auto-sync with schemas
+Status: PLANNED - awaiting TypeDialog binary release
+```
+
+### Phase D: Unified (Future)
+
+```text
+ConfigLoader discovers config → Service reads → TypeDialog updates UI
+```
+
+## Integration with Infrastructure Schemas
+
+TypeDialog forms work seamlessly with infrastructure schemas:
+
+### Infrastructure Configuration Workflow
+
+**1. Define Infrastructure Schemas** (completed)
+- Location: `provisioning/schemas/infrastructure/`
+- 6 schemas: docker-compose, kubernetes, nginx, prometheus, systemd, oci-registry
+- All validated with `nickel typecheck`
+
+**2. Generate Infrastructure Configs** (completed)
+- Script: `provisioning/platform/scripts/generate-infrastructure-configs.nu`
+- Supports: solo, multiuser, enterprise, cicd modes
+- Formats: YAML, JSON, conf, service
+
+**3. Validate Generated Configs** (completed)
+- Script: `provisioning/platform/scripts/validate-infrastructure.nu`
+- Tools: docker-compose config, kubectl apply --dry-run, nginx -t, promtool check
+- Examples: `examples-solo-deployment.ncl`, `examples-enterprise-deployment.ncl`
+
+**4. Interactive Setup with Forms** (TypeDialog ready)
+- Script: `provisioning/platform/scripts/setup-with-forms.sh`
+- Bash wrappers: `provisioning/core/shlib/*-tty.sh` (handle TTY input)
+- Forms ready: setup-wizard, auth-login, mfa-enroll
+- Fallback: Basic Nushell prompts if TypeDialog unavailable
+
+### Current Status: Full Infrastructure Support
+
+| Component | Status | Details |
+| ----------- | -------- | --------- |
+| **Schemas** | ✅ Complete | 6 infrastructure schemas (1,577 lines) |
+| **Examples** | ✅ Complete | 2 deployment examples (solo, enterprise) |
+| **Generation Script** | ✅ Complete | Auto-generates configs for all modes |
+| **Validation Script** | ✅ Complete | Validates Docker, K8s, Nginx, Prometheus |
+| **Setup Wizard** | ✅ Complete | TypeDialog forms + bash wrappers ready |
+| **TypeDialog Integration** | ⏳ Pending | Structure ready, awaiting binary |
+
+### Validated Examples
+
+**Solo Deployment** (`examples-solo-deployment.ncl`):
+- ✅ Type-checks without errors
+- ✅ Exports to 198 lines of JSON
+- ✅ 5 Docker Compose services
+- ✅ Resource limits: 1.0-4.0 CPU, 256M-1024M RAM
+- ✅ Prometheus: 4 scrape jobs
+- ✅ Registry backend: Zot (filesystem)
+
+**Enterprise Deployment** (`examples-enterprise-deployment.ncl`):
+- ✅ Type-checks without errors
+- ✅ Exports to 313 lines of JSON
+- ✅ 6 Docker Compose services with HA
+- ✅ Resource limits: 2.0-4.0 CPU, 512M-4096M RAM
+- ✅ Prometheus: 7 scrape jobs with remote storage
+- ✅ Registry backend: Harbor (S3 distributed)
+
+### Test Infrastructure Generation
+
+```bash
+# Export solo infrastructure
+nickel export --format json provisioning/schemas/infrastructure/examples-solo-deployment.ncl > /tmp/solo.json
+
+# Validate JSON
+jq . /tmp/solo.json
+
+# Check Docker Compose services
+jq '.docker_compose_services | keys' /tmp/solo.json
+
+# Compare resource allocation (solo vs enterprise)
+jq '.docker_compose_services.orchestrator.deploy.resources.limits' /tmp/solo.json
+jq '.docker_compose_services.orchestrator.deploy.resources.limits' /tmp/enterprise.json
+```
+
+## Next Steps
+
+1. **Infrastructure Setup** (available now):
+   - Generate infrastructure configs with automation scripts
+   - Validate with format-specific tools
+   - Use interactive setup wizard for configuration
+
+2. **When TypeDialog binary becomes available**:
+   - Install TypeDialog binary
+   - Forms already created and ready to use
+   - Bash wrappers handle TTY input (no Nushell stack issues)
+   - Full nickel-roundtrip workflow will be enabled
+
+3. **Production Deployment**:
+   - Use validated infrastructure configs
+   - Deploy with ConfigLoader + infrastructure schemas
+   - Monitor via Prometheus (auto-generated from schemas)
+
+---
+
+**Version**: 1.2.0 (TypeDialog Forms + Bash Wrappers)
+**Status**: TypeDialog forms ready with bash wrappers; Awaiting TypeDialog Binary
+**Last Updated**: 2025-01-09
+**ForminQuire Status**: DEPRECATED - Archived to .coder/archive/forminquire/
+**Fallback**: Basic Nushell prompts if TypeDialog unavailable
+**Tested**: Infrastructure examples (solo + enterprise) validated
\ No newline at end of file
diff --git a/config/README.md b/config/README.md
index 342d850..57a1171 100644
--- a/config/README.md
+++ b/config/README.md
@@ -1 +1,109 @@
-# Platform Service Configuration Files\n\nThis directory contains **16 production-ready TOML configuration files** generated from Nickel schemas\nfor all platform services across all deployment modes.\n\n## Generated Files\n\n**4 Services × 4 Deployment Modes = 16 Configuration Files**\n\n```{$detected_lang}\norchestrator.{solo,multiuser,cicd,enterprise}.toml (2.2 kB each)\ncontrol-center.{solo,multiuser,cicd,enterprise}.toml (3.4 kB each)\nmcp-server.{solo,multiuser,cicd,enterprise}.toml (2.7 kB each)\ninstaller.{solo,multiuser,cicd,enterprise}.toml (2.5 kB each)\n```\n\n**Total**: ~45 KB, all validated and ready for deployment\n\n## Deployment Modes\n\n| Mode | Resources | Database | Use Case | Load |\n| ------ | ----------- | ---------- | ---------- | ------ |\n| **solo** | 2 CPU, 4 GB | Embedded | Development | `ORCHESTRATOR_MODE=solo` |\n| **multiuser** | 4 CPU, 8 GB | PostgreSQL/SurrealDB | Team Staging | `ORCHESTRATOR_MODE=multiuser` |\n| **cicd** | 8 CPU, 16 GB | Ephemeral | CI/CD Pipelines | `ORCHESTRATOR_MODE=cicd` |\n| **enterprise** | 16+ CPU, 32+ GB | SurrealDB HA | Production | `ORCHESTRATOR_MODE=enterprise` |\n\n## Quick Start\n\n### Load a configuration mode\n\n```{$detected_lang}\n# Solo mode (single developer)\nexport
ORCHESTRATOR_MODE=solo\nexport CONTROL_CENTER_MODE=solo\n\n# Multiuser mode (team development)\nexport ORCHESTRATOR_MODE=multiuser\nexport CONTROL_CENTER_MODE=multiuser\n\n# Enterprise mode (production HA)\nexport ORCHESTRATOR_MODE=enterprise\nexport CONTROL_CENTER_MODE=enterprise\n```\n\n### Override individual fields\n\n```{$detected_lang}\nexport ORCHESTRATOR_SERVER_WORKERS=8\nexport ORCHESTRATOR_SERVER_PORT=9090\nexport CONTROL_CENTER_REQUIRE_MFA=true\n```\n\n## Configuration Loading Hierarchy\n\nEach service loads configuration with this priority:\n\n1. **Explicit path** — `{SERVICE}_CONFIG` environment variable\n2. **Mode-specific** — `{SERVICE}_MODE` → `provisioning/platform/config/{service}.{mode}.toml`\n3. **Legacy** — `config.user.toml` (backward compatibility)\n4. **Defaults** — `config.defaults.toml` or built-in\n5. **Field overrides** — `{SERVICE}_*` environment variables\n\n## Docker Compose Integration\n\n```{$detected_lang}\nexport DEPLOYMENT_MODE=multiuser\ndocker-compose -f provisioning/platform/infrastructure/docker/docker-compose.yml up\n```\n\n## Kubernetes Integration\n\n```{$detected_lang}\n# Load enterprise mode configs into K8s\nkubectl create configmap orchestrator-config \\n --from-file=provisioning/platform/config/orchestrator.enterprise.toml\n```\n\n## Validation\n\nVerify all configs parse correctly:\n\n```{$detected_lang}\nfor file in *.toml; do\n nu -c "open '$file'" && echo "✅ $file" || echo "❌ $file"\ndone\n```\n\n## Structure\n\n- **orchestrator.*.toml** — Workflow engine configuration\n- **control-center.*.toml** — Policy/RBAC backend configuration\n- **mcp-server.*.toml** — MCP server configuration\n- **installer.*.toml** — Installation/bootstrap configuration\n\nEach file contains service-specific settings for networking, storage, security, logging, and monitoring.\n\n## Related Documentation\n\n- **Configuration workflow**: `provisioning/.typedialog/provisioning/platform/configuration-workflow.md`\n- **Usage guide**: 
`provisioning/.typedialog/provisioning/platform/usage-guide.md`\n- **Schema definitions**: `provisioning/.typedialog/provisioning/platform/schemas/`\n- **Default values**: `provisioning/.typedialog/provisioning/platform/defaults/`\n\n## Generated By\n\n**Framework**: TypeDialog + Nickel Configuration System\n**Date**: 2026-01-05\n**Status**: ✅ Production Ready
+# Platform Service Configuration Files
+
+This directory contains **16 production-ready TOML configuration files** generated from Nickel schemas
+for all platform services across all deployment modes.
+
+## Generated Files
+
+**4 Services × 4 Deployment Modes = 16 Configuration Files**
+
+```text
+orchestrator.{solo,multiuser,cicd,enterprise}.toml     (2.2 kB each)
+control-center.{solo,multiuser,cicd,enterprise}.toml   (3.4 kB each)
+mcp-server.{solo,multiuser,cicd,enterprise}.toml       (2.7 kB each)
+installer.{solo,multiuser,cicd,enterprise}.toml        (2.5 kB each)
+```
+
+**Total**: ~45 KB, all validated and ready for deployment
+
+## Deployment Modes
+
+| Mode | Resources | Database | Use Case | Load |
+| ------ | ----------- | ---------- | ---------- | ------ |
+| **solo** | 2 CPU, 4 GB | Embedded | Development | `ORCHESTRATOR_MODE=solo` |
+| **multiuser** | 4 CPU, 8 GB | PostgreSQL/SurrealDB | Team Staging | `ORCHESTRATOR_MODE=multiuser` |
+| **cicd** | 8 CPU, 16 GB | Ephemeral | CI/CD Pipelines | `ORCHESTRATOR_MODE=cicd` |
+| **enterprise** | 16+ CPU, 32+ GB | SurrealDB HA | Production | `ORCHESTRATOR_MODE=enterprise` |
+
+## Quick Start
+
+### Load a configuration mode
+
+```bash
+# Solo mode (single developer)
+export ORCHESTRATOR_MODE=solo
+export CONTROL_CENTER_MODE=solo
+
+# Multiuser mode (team development)
+export ORCHESTRATOR_MODE=multiuser
+export CONTROL_CENTER_MODE=multiuser
+
+# Enterprise mode (production HA)
+export ORCHESTRATOR_MODE=enterprise
+export CONTROL_CENTER_MODE=enterprise
+```
+
+### Override individual fields
+
+```bash
+export ORCHESTRATOR_SERVER_WORKERS=8
+export ORCHESTRATOR_SERVER_PORT=9090
+export CONTROL_CENTER_REQUIRE_MFA=true
+```
+
+## Configuration Loading Hierarchy
+
+Each service loads configuration with this priority:
+
+1. **Explicit path** — `{SERVICE}_CONFIG` environment variable
+2. **Mode-specific** — `{SERVICE}_MODE` → `provisioning/platform/config/{service}.{mode}.toml`
+3. **Legacy** — `config.user.toml` (backward compatibility)
+4. **Defaults** — `config.defaults.toml` or built-in
+5. **Field overrides** — `{SERVICE}_*` environment variables
+
+## Docker Compose Integration
+
+```bash
+export DEPLOYMENT_MODE=multiuser
+docker-compose -f provisioning/platform/infrastructure/docker/docker-compose.yml up
+```
+
+## Kubernetes Integration
+
+```bash
+# Load enterprise mode configs into K8s
+kubectl create configmap orchestrator-config \
+  --from-file=provisioning/platform/config/orchestrator.enterprise.toml
+```
+
+## Validation
+
+Verify all configs parse correctly:
+
+```bash
+for file in *.toml; do
+  nu -c "open '$file'" && echo "✅ $file" || echo "❌ $file"
+done
+```
+
+## Structure
+
+- **orchestrator.*.toml** — Workflow engine configuration
+- **control-center.*.toml** — Policy/RBAC backend configuration
+- **mcp-server.*.toml** — MCP server configuration
+- **installer.*.toml** — Installation/bootstrap configuration
+
+Each file contains service-specific settings for networking, storage, security, logging, and monitoring.
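The loading hierarchy described above can be sketched as a small path resolver. This is an illustrative sketch only, using the variable names and file layout documented here; the function name and fall-through behavior are assumptions, not the services' actual implementation (which also applies the step-5 field overrides after the file is loaded):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented precedence for the orchestrator service.
resolve_orchestrator_config() {
  local base="provisioning/platform/config"
  # 1. Explicit path always wins
  if [ -n "${ORCHESTRATOR_CONFIG:-}" ]; then
    echo "$ORCHESTRATOR_CONFIG"
    return
  fi
  # 2. Mode-specific file selected by ORCHESTRATOR_MODE
  if [ -n "${ORCHESTRATOR_MODE:-}" ] && [ -f "$base/orchestrator.${ORCHESTRATOR_MODE}.toml" ]; then
    echo "$base/orchestrator.${ORCHESTRATOR_MODE}.toml"
    return
  fi
  # 3. Legacy user config, then 4. defaults
  if [ -f "config.user.toml" ]; then
    echo "config.user.toml"
  else
    echo "config.defaults.toml"
  fi
}
```

Field-level `{SERVICE}_*` overrides (step 5) apply to individual values after the resolved file is parsed, so they do not participate in path resolution.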
+
+## Related Documentation
+
+- **Configuration workflow**: `provisioning/.typedialog/provisioning/platform/configuration-workflow.md`
+- **Usage guide**: `provisioning/.typedialog/provisioning/platform/usage-guide.md`
+- **Schema definitions**: `provisioning/.typedialog/provisioning/platform/schemas/`
+- **Default values**: `provisioning/.typedialog/provisioning/platform/defaults/`
+
+## Generated By
+
+**Framework**: TypeDialog + Nickel Configuration System
+**Date**: 2026-01-05
+**Status**: ✅ Production Ready
\ No newline at end of file
diff --git a/config/examples/README.md b/config/examples/README.md
index feccc1b..787de36 100644
--- a/config/examples/README.md
+++ b/config/examples/README.md
@@ -1 +1,201 @@
-# Platform Configuration Examples\n\nThis directory contains example Nickel files demonstrating how to generate platform configurations for different deployment modes.\n\n## File Structure\n\n```{$detected_lang}\nexamples/\n├── README.md # This file\n├── orchestrator.solo.example.ncl # Solo deployment (1 CPU, 1GB memory)\n├── orchestrator.multiuser.example.ncl # Multiuser deployment (2 CPU, 2GB memory, HA)\n├── orchestrator.enterprise.example.ncl # Enterprise deployment (4 CPU, 4GB memory, 3 replicas)\n└── control-center.solo.example.ncl # Control Center solo deployment\n```\n\n## Usage\n\nTo generate actual TOML configuration from an example:\n\n```{$detected_lang}\n# Export to TOML (placed in runtime/generated/)\nnickel export --format toml examples/orchestrator.solo.example.ncl > runtime/generated/orchestrator.solo.toml\n\n# Export to JSON for inspection\nnickel export --format json examples/orchestrator.solo.example.ncl | jq .\n\n# Type check example\nnickel typecheck examples/orchestrator.solo.example.ncl\n```\n\n## Key Concepts\n\n### 1. Schemas Reference\nAll examples import from the schema library:\n- `provisioning/schemas/platform/schemas/orchestrator.ncl`\n- `provisioning/schemas/platform/defaults/orchestrator-defaults.ncl`\n\n### 2.
Mode-Based Composition\nEach example uses composition helpers to overlay mode-specific settings:\n\n```{$detected_lang}\nlet helpers = import "../../schemas/platform/common/helpers.ncl" in\nlet defaults = import "../../schemas/platform/defaults/orchestrator-defaults.ncl" in\nlet mode = import "../../schemas/platform/defaults/deployment/solo-defaults.ncl" in\n\nhelpers.compose_config defaults mode {\n # User-specific overrides here\n}\n```\n\n### 3. ConfigLoader Integration\nGenerated TOML files are automatically loaded by Rust services:\n\n```{$detected_lang}\nuse platform_config::OrchestratorConfig;\n\nlet config = OrchestratorConfig::load().expect("Failed to load orchestrator config");\nprintln!("Orchestrator listening on port: {}", config.server.port);\n```\n\n## Mode Reference\n\n| Mode | CPU | Memory | Replicas | Use Case |\n| ------ | ----- | -------- | ---------- | ---------- |\n| **solo** | 1.0 | 1024M | 1 | Development, testing |\n| **multiuser** | 2.0 | 2048M | 2 | Staging, small production |\n| **enterprise** | 4.0 | 4096M | 3+ | Large production deployments |\n| **cicd** | 2.0 | 2048M | 1 | CI/CD pipelines |\n\n## Workflow: Platform Configuration\n\n1. **Choose deployment mode** → select example file (orchestrator.solo.example.ncl, etc.)\n2. **Customize if needed** → modify the example\n3. **Generate config** → `nickel export --format toml`\n4. **Place in runtime/generated/** → ConfigLoader picks it up automatically\n5. 
**Service reads config** → via platform-config crate\n\n## Infrastructure Generation\n\nThese platform configuration examples work together with infrastructure schemas to create complete deployments.\n\n### Complete Infrastructure Stack\n\nBeyond platform configs, you can generate complete infrastructure from schemas:\n\n**Infrastructure Examples**:\n- `provisioning/schemas/infrastructure/examples-solo-deployment.ncl` - Solo infrastructure\n- `provisioning/schemas/infrastructure/examples-enterprise-deployment.ncl` - Enterprise infrastructure\n\n**What Gets Generated**:\n\n```{$detected_lang}\n# Solo deployment infrastructure\nnickel export --format json provisioning/schemas/infrastructure/examples-solo-deployment.ncl\n\n# Exports:\n# - docker_compose_services (5 services)\n# - nginx_config (load balancer setup)\n# - prometheus_config (4 scrape jobs)\n# - oci_registry_config (container registry)\n```\n\n**Integration Pattern**:\n\n```{$detected_lang}\nPlatform Config (Orchestrator, Control Center, etc.)\n ↓ ConfigLoader reads TOML\n ↓ Services start with config\n\nInfrastructure Config (Docker, Nginx, Prometheus, etc.)\n ↓ nickel export → YAML/JSON\n ↓ Deploy with Docker/Kubernetes/Nginx\n```\n\n### Generation and Validation\n\n**Generate all infrastructure configs**:\n\n```{$detected_lang}\nprovisioning/platform/scripts/generate-infrastructure-configs.nu --mode solo --format yaml\nprovisioning/platform/scripts/generate-infrastructure-configs.nu --mode enterprise --format json\n```\n\n**Validate generated configs**:\n\n```{$detected_lang}\nprovisioning/platform/scripts/validate-infrastructure.nu --config-dir /tmp/infra\n\n# Output shows validation results for:\n# - Docker Compose (docker-compose config --quiet)\n# - Kubernetes (kubectl apply --dry-run=client)\n# - Nginx (nginx -t)\n# - Prometheus (promtool check config)\n```\n\n**Interactive setup**:\n\n```{$detected_lang}\nbash provisioning/platform/scripts/setup-with-forms.sh\n# Uses TypeDialog bash wrappers 
(TTY-safe) or basic Nushell prompts as fallback\n```\n\n## Error Handling\n\nIf configuration fails to load:\n\n```{$detected_lang}\n# Validate Nickel syntax\nnickel typecheck examples/orchestrator.solo.example.ncl\n\n# Check TOML validity\ncargo test --package platform-config --test validation\n\n# Verify path resolution\nprovisioning validate-config --check-paths\n```\n\n## Environment Variable Overrides\n\nEven with TOML configs, environment variables take precedence:\n\n```{$detected_lang}\nexport PROVISIONING_MODE=multiuser\nexport ORCHESTRATOR_PORT=9000\nprovisioning orchestrator start # Uses env overrides\n```\n\n## Adding New Configurations\n\nTo add a new service configuration:\n\n1. Create `service-name.mode.example.ncl` in this directory\n2. Import the service schema: `import "../../schemas/platform/schemas/service-name.ncl"`\n3. Compose using helpers: `helpers.compose_config defaults mode {}`\n4. Document in this README\n5. Test with: `nickel typecheck` and `nickel export --format json`\n\n## Platform vs Infrastructure Configuration\n\n**Platform Configuration** (this directory):\n- Service-specific settings (port, database host, logging level)\n- Loaded by ConfigLoader at service startup\n- Format: TOML files in `runtime/generated/`\n- Examples: orchestrator.solo.example.ncl, orchestrator.multiuser.example.ncl\n\n**Infrastructure Configuration** (provisioning/schemas/infrastructure/):\n- Deployment-specific settings (replicas, resources, networking)\n- Generated and validated separately\n- Formats: YAML (Docker/Kubernetes), JSON (registries), conf (Nginx)\n- Examples: examples-solo-deployment.ncl, examples-enterprise-deployment.ncl\n\n**Why Both?**:\n- Platform config: How should Orchestrator behave? (internal settings)\n- Infrastructure config: How should Orchestrator be deployed? 
(external deployment)\n\n---\n\n**Last Updated**: 2025-01-06 (Updated with Infrastructure Integration Guide)\n**ConfigLoader Version**: 2.0.0\n**Nickel Version**: Latest\n**Infrastructure Integration**: Complete with schemas, examples, and validation scripts +# Platform Configuration Examples + +This directory contains example Nickel files demonstrating how to generate platform configurations for different deployment modes. + +## File Structure + +```bash +examples/ +├── README.md # This file +├── orchestrator.solo.example.ncl # Solo deployment (1 CPU, 1GB memory) +├── orchestrator.multiuser.example.ncl # Multiuser deployment (2 CPU, 2GB memory, HA) +├── orchestrator.enterprise.example.ncl # Enterprise deployment (4 CPU, 4GB memory, 3 replicas) +└── control-center.solo.example.ncl # Control Center solo deployment +``` + +## Usage + +To generate actual TOML configuration from an example: + +```bash +# Export to TOML (placed in runtime/generated/) +nickel export --format toml examples/orchestrator.solo.example.ncl > runtime/generated/orchestrator.solo.toml + +# Export to JSON for inspection +nickel export --format json examples/orchestrator.solo.example.ncl | jq . + +# Type check example +nickel typecheck examples/orchestrator.solo.example.ncl +``` + +## Key Concepts + +### 1. Schemas Reference +All examples import from the schema library: +- `provisioning/schemas/platform/schemas/orchestrator.ncl` +- `provisioning/schemas/platform/defaults/orchestrator-defaults.ncl` + +### 2. Mode-Based Composition +Each example uses composition helpers to overlay mode-specific settings: + +```nickel +let helpers = import "../../schemas/platform/common/helpers.ncl" in +let defaults = import "../../schemas/platform/defaults/orchestrator-defaults.ncl" in +let mode = import "../../schemas/platform/defaults/deployment/solo-defaults.ncl" in + +helpers.compose_config defaults mode { + # User-specific overrides here +} +``` + +### 3. 
ConfigLoader Integration +Generated TOML files are automatically loaded by Rust services: + +```rust +use platform_config::OrchestratorConfig; + +let config = OrchestratorConfig::load().expect("Failed to load orchestrator config"); +println!("Orchestrator listening on port: {}", config.server.port); +``` + +## Mode Reference + +| Mode | CPU | Memory | Replicas | Use Case | +| ------ | ----- | -------- | ---------- | ---------- | +| **solo** | 1.0 | 1024M | 1 | Development, testing | +| **multiuser** | 2.0 | 2048M | 2 | Staging, small production | +| **enterprise** | 4.0 | 4096M | 3+ | Large production deployments | +| **cicd** | 2.0 | 2048M | 1 | CI/CD pipelines | + +## Workflow: Platform Configuration + +1. **Choose deployment mode** → select example file (orchestrator.solo.example.ncl, etc.) +2. **Customize if needed** → modify the example +3. **Generate config** → `nickel export --format toml` +4. **Place in runtime/generated/** → ConfigLoader picks it up automatically +5. **Service reads config** → via platform-config crate + +## Infrastructure Generation + +These platform configuration examples work together with infrastructure schemas to create complete deployments. + +### Complete Infrastructure Stack + +Beyond platform configs, you can generate complete infrastructure from schemas: + +**Infrastructure Examples**: +- `provisioning/schemas/infrastructure/examples-solo-deployment.ncl` - Solo infrastructure +- `provisioning/schemas/infrastructure/examples-enterprise-deployment.ncl` - Enterprise infrastructure + +**What Gets Generated**: + +```bash +# Solo deployment infrastructure +nickel export --format json provisioning/schemas/infrastructure/examples-solo-deployment.ncl + +# Exports: +# - docker_compose_services (5 services) +# - nginx_config (load balancer setup) +# - prometheus_config (4 scrape jobs) +# - oci_registry_config (container registry) +``` + +**Integration Pattern**: + +```text +Platform Config (Orchestrator, Control Center, etc.) 
+ ↓ ConfigLoader reads TOML + ↓ Services start with config + +Infrastructure Config (Docker, Nginx, Prometheus, etc.) + ↓ nickel export → YAML/JSON + ↓ Deploy with Docker/Kubernetes/Nginx +``` + +### Generation and Validation + +**Generate all infrastructure configs**: + +```bash +provisioning/platform/scripts/generate-infrastructure-configs.nu --mode solo --format yaml +provisioning/platform/scripts/generate-infrastructure-configs.nu --mode enterprise --format json +``` + +**Validate generated configs**: + +```bash +provisioning/platform/scripts/validate-infrastructure.nu --config-dir /tmp/infra + +# Output shows validation results for: +# - Docker Compose (docker-compose config --quiet) +# - Kubernetes (kubectl apply --dry-run=client) +# - Nginx (nginx -t) +# - Prometheus (promtool check config) +``` + +**Interactive setup**: + +```bash +bash provisioning/platform/scripts/setup-with-forms.sh +# Uses TypeDialog bash wrappers (TTY-safe) or basic Nushell prompts as fallback +``` + +## Error Handling + +If configuration fails to load: + +```bash +# Validate Nickel syntax +nickel typecheck examples/orchestrator.solo.example.ncl + +# Check TOML validity +cargo test --package platform-config --test validation + +# Verify path resolution +provisioning validate-config --check-paths +``` + +## Environment Variable Overrides + +Even with TOML configs, environment variables take precedence: + +```bash +export PROVISIONING_MODE=multiuser +export ORCHESTRATOR_PORT=9000 +provisioning orchestrator start # Uses env overrides +``` + +## Adding New Configurations + +To add a new service configuration: + +1. Create `service-name.mode.example.ncl` in this directory +2. Import the service schema: `import "../../schemas/platform/schemas/service-name.ncl"` +3. Compose using helpers: `helpers.compose_config defaults mode {}` +4. Document in this README +5. 
Test with: `nickel typecheck` and `nickel export --format json` + +## Platform vs Infrastructure Configuration + +**Platform Configuration** (this directory): +- Service-specific settings (port, database host, logging level) +- Loaded by ConfigLoader at service startup +- Format: TOML files in `runtime/generated/` +- Examples: orchestrator.solo.example.ncl, orchestrator.multiuser.example.ncl + +**Infrastructure Configuration** (provisioning/schemas/infrastructure/): +- Deployment-specific settings (replicas, resources, networking) +- Generated and validated separately +- Formats: YAML (Docker/Kubernetes), JSON (registries), conf (Nginx) +- Examples: examples-solo-deployment.ncl, examples-enterprise-deployment.ncl + +**Why Both?**: +- Platform config: How should Orchestrator behave? (internal settings) +- Infrastructure config: How should Orchestrator be deployed? (external deployment) + +--- + +**Last Updated**: 2025-01-06 (Updated with Infrastructure Integration Guide) +**ConfigLoader Version**: 2.0.0 +**Nickel Version**: Latest +**Infrastructure Integration**: Complete with schemas, examples, and validation scripts \ No newline at end of file diff --git a/crates/control-center-ui/README.md b/crates/control-center-ui/README.md index 1f5ffd3..87ceb99 100644 --- a/crates/control-center-ui/README.md +++ b/crates/control-center-ui/README.md @@ -1 +1,368 @@ -# Control Center UI - Audit Log Viewer\n\nA comprehensive React-based audit log viewer for the Cedar Policy Engine with advanced search, real-time streaming,\ncompliance reporting, and visualization capabilities.\n\n## 🚀 Features\n\n### 🔍 Advanced Search & Filtering\n\n- **Multi-dimensional Filters**: Date range, users, actions, resources, severity, compliance frameworks\n- **Real-time Search**: Debounced search with instant results\n- **Saved Searches**: Save and reuse complex filter combinations\n- **Quick Filters**: One-click access to common time ranges and filters\n- **Correlation Search**: Find logs by 
request ID, session ID, or trace correlation\n\n### 📊 High-Performance Data Display\n\n- **Virtual Scrolling**: Handle millions of log entries with smooth scrolling\n- **Infinite Loading**: Automatic pagination with optimized data fetching\n- **Column Sorting**: Sort by any field with persistent state\n- **Bulk Selection**: Select multiple logs for batch operations\n- **Responsive Design**: Works seamlessly on desktop, tablet, and mobile\n\n### 🔴 Real-time Streaming\n\n- **WebSocket Integration**: Live log updates without page refresh\n- **Connection Management**: Automatic reconnection with exponential backoff\n- **Real-time Indicators**: Visual status of live connection\n- **Message Queuing**: Handles high-volume log streams efficiently\n- **Alert Notifications**: Critical events trigger immediate notifications\n\n### 📋 Detailed Log Inspection\n\n- **JSON Viewer**: Syntax-highlighted JSON with collapsible sections\n- **Multi-tab Interface**: Overview, Context, Metadata, Compliance, Raw JSON\n- **Sensitive Data Toggle**: Hide/show sensitive information\n- **Copy Utilities**: One-click copying of IDs, values, and entire records\n- **Deep Linking**: Direct URLs to specific log entries\n\n### 📤 Export & Reporting\n\n- **Multiple Formats**: CSV, JSON, PDF export with customizable fields\n- **Template System**: Pre-built templates for different report types\n- **Batch Export**: Export filtered results or selected logs\n- **Progress Tracking**: Real-time export progress indication\n- **Custom Fields**: Choose exactly which data to include\n\n### 🛡️ Compliance Management\n\n- **Framework Support**: SOC2, HIPAA, PCI DSS, GDPR compliance templates\n- **Report Generation**: Automated compliance reports with evidence\n- **Finding Tracking**: Track violations and remediation status\n- **Attestation Management**: Digital signatures and certifications\n- **Template Library**: Customizable report templates for different frameworks\n\n### 🔗 Log Correlation & Tracing\n\n- 
**Request Tracing**: Follow request flows across services\n- **Session Analysis**: View all activity for a user session\n- **Dependency Mapping**: Understand log relationships and causality\n- **Timeline Views**: Chronological visualization of related events\n\n### 📈 Visualization & Analytics\n\n- **Dashboard Metrics**: Real-time statistics and KPIs\n- **Timeline Charts**: Visual representation of log patterns\n- **Geographic Distribution**: Location-based log analysis\n- **Severity Trends**: Track security event patterns over time\n- **User Activity**: Monitor user behavior and access patterns\n\n## 🛠 Technology Stack\n\n### Frontend Framework\n\n- **React 18.3.1**: Modern React with hooks and concurrent features\n- **TypeScript 5.5.4**: Type-safe development with advanced types\n- **Vite 5.4.1**: Lightning-fast build tool and dev server\n\n### UI Components & Styling\n\n- **TailwindCSS 3.4.9**: Utility-first CSS framework\n- **DaisyUI 4.4.19**: Beautiful component library built on Tailwind\n- **Framer Motion 11.3.24**: Smooth animations and transitions\n- **Lucide React 0.427.0**: Beautiful, customizable icons\n\n### Data Management\n\n- **TanStack Query 5.51.23**: Powerful data fetching and caching\n- **TanStack Table 8.20.1**: Headless table utilities for complex data\n- **TanStack Virtual 3.8.4**: Virtual scrolling for performance\n- **Zustand 4.5.4**: Lightweight state management\n\n### Forms & Validation\n\n- **React Hook Form 7.52.2**: Performant forms with minimal re-renders\n- **React Select 5.8.0**: Flexible select components with search\n\n### Real-time & Networking\n\n- **Native WebSocket API**: Direct WebSocket integration\n- **Custom Hooks**: Reusable WebSocket management with reconnection\n\n### Export & Reporting\n\n- **jsPDF 2.5.1**: Client-side PDF generation\n- **jsPDF AutoTable 3.8.2**: Table formatting for PDF reports\n- **Native Blob API**: File download and export functionality\n\n### Date & Time\n\n- **date-fns 3.6.0**: Modern date utility 
library with tree shaking\n\n## 📁 Project Structure\n\n```{$detected_lang}\nsrc/\n├── components/audit/ # Audit log components\n│ ├── AuditLogViewer.tsx # Main viewer component\n│ ├── SearchFilters.tsx # Advanced search interface\n│ ├── VirtualizedLogTable.tsx # High-performance table\n│ ├── LogDetailModal.tsx # Detailed log inspection\n│ ├── ExportModal.tsx # Export functionality\n│ ├── ComplianceReportGenerator.tsx # Compliance reports\n│ └── RealTimeIndicator.tsx # WebSocket status\n├── hooks/ # Custom React hooks\n│ └── useWebSocket.ts # WebSocket management\n├── services/ # API integration\n│ └── api.ts # Audit API client\n├── types/ # TypeScript definitions\n│ └── audit.ts # Audit-specific types\n├── utils/ # Utility functions\n├── store/ # State management\n└── styles/ # CSS and styling\n```\n\n## 🔧 Setup and Development\n\n### Prerequisites\n\n- **Node.js 18+** and **npm 9+**\n- **Control Center backend** running on `http://localhost:8080`\n\n### Installation\n\n```{$detected_lang}\n# Clone the repository\ngit clone \ncd control-center-ui\n\n# Install dependencies\nnpm install\n\n# Start development server\nnpm run dev\n```\n\nThe application will be available at `http://localhost:3000`\n\n### Building for Production\n\n```{$detected_lang}\n# Type check\nnpm run type-check\n\n# Build for production\nnpm run build\n\n# Preview production build\nnpm run preview\n```\n\n## 🌐 API Integration\n\nThe UI integrates with the Control Center backend and expects the following endpoints:\n\n- `GET /audit/logs` - Fetch audit logs with filtering and pagination\n- `GET /audit/logs/{id}` - Get specific log entry details\n- `POST /audit/search` - Advanced search functionality\n- `GET /audit/saved-searches` - Manage saved search queries\n- `POST /audit/export` - Export logs in various formats (CSV, JSON, PDF)\n- `GET /compliance/reports` - Compliance report management\n- `POST /compliance/reports/generate` - Generate compliance reports\n- `WS /audit/stream` - Real-time log 
streaming via WebSocket\n- `GET /health` - Health check endpoint\n\n### WebSocket Integration\n\nReal-time log streaming is implemented using WebSocket connections:\n\n```{$detected_lang}\nimport { useWebSocket } from './hooks/useWebSocket';\n\nconst { isConnected, lastMessage } = useWebSocket({\n url: 'ws://localhost:8080/ws/audit',\n onNewAuditLog: (log) => {\n // Handle new log entry in real-time\n updateLogsList(log);\n }\n});\n```\n\n## ✅ Features Implemented\n\n### Core Audit Log Viewer System\n\n- ✅ **Advanced Search Filters**: Multi-dimensional filtering with date range, users, actions, resources, severity, compliance frameworks\n- ✅ **Virtual Scrolling Component**: High-performance rendering capable of handling millions of log entries\n- ✅ **Real-time Log Streaming**: WebSocket integration with automatic reconnection and live status indicators\n- ✅ **Detailed Log Modal**: Multi-tab interface with JSON syntax highlighting, sensitive data toggle, and copy utilities\n- ✅ **Export Functionality**: Support for CSV, JSON, and PDF formats with customizable fields and templates\n- ✅ **Saved Search Queries**: User preference system for saving and reusing complex search combinations\n\n### Compliance & Security Features\n\n- ✅ **Compliance Report Generator**: Automated report generation with SOC2, HIPAA, PCI DSS, and GDPR templates\n- ✅ **Violation Tracking**: Remediation workflow system with task management and progress tracking\n- ✅ **Timeline Visualization**: Chronological visualization of audit trails with correlation mapping\n- ✅ **Request ID Correlation**: Cross-service request tracing and session analysis\n- ✅ **Attestation Management**: Digital signature system for compliance certifications\n- ✅ **Log Retention Management**: Archival policies and retention period management\n\n### Performance & User Experience\n\n- ✅ **Dashboard Analytics**: Real-time metrics including success rates, critical events, and compliance scores\n- ✅ **Responsive Design**: 
Mobile-first design that works across all device sizes\n- ✅ **Loading States**: Comprehensive loading indicators and skeleton screens\n- ✅ **Error Handling**: Robust error boundaries with user-friendly error messages\n- ✅ **Keyboard Shortcuts**: Accessibility features and keyboard navigation support\n\n## 🎨 Styling and Theming\n\n### TailwindCSS Configuration\n\nThe application uses a comprehensive TailwindCSS setup with:\n\n- **DaisyUI Components**: Pre-built, accessible UI components\n- **Custom Color Palette**: Primary, secondary, success, warning, error themes\n- **Custom Animations**: Smooth transitions and loading states\n- **Dark/Light Themes**: Automatic theme switching with system preference detection\n- **Responsive Grid System**: Mobile-first responsive design\n\n### Component Design System\n\n- **Consistent Spacing**: Standardized margin and padding scales\n- **Typography Scale**: Hierarchical text sizing and weights\n- **Icon System**: Comprehensive icon library with consistent styling\n- **Form Controls**: Validated, accessible form components\n- **Data Visualization**: Charts and metrics with consistent styling\n\n## 📱 Performance Optimization\n\n### Virtual Scrolling\n\n- Renders only visible rows for optimal performance\n- Handles datasets with millions of entries smoothly\n- Maintains smooth scrolling with momentum preservation\n- Automatic cleanup of off-screen elements\n\n### Efficient Data Fetching\n\n- Infinite queries with intelligent pagination\n- Aggressive caching with TanStack Query\n- Optimistic updates for better user experience\n- Background refetching for fresh data\n\n### Bundle Optimization\n\n- Code splitting by route and feature\n- Tree shaking for minimal bundle size\n- Lazy loading of heavy components\n- Optimized production builds\n\n## 🔒 Security Considerations\n\n### Data Protection\n\n- Sensitive data masking in UI components\n- Secure WebSocket connections (WSS in production)\n- Content Security Policy headers for XSS 
protection\n- Input sanitization for search queries\n\n### API Security\n\n- JWT token authentication support (when implemented)\n- Request rate limiting awareness\n- Secure file downloads with proper headers\n- CORS configuration for cross-origin requests\n\n## 🚀 Deployment\n\n### Docker Deployment\n\n```{$detected_lang}\nFROM node:18-alpine as builder\nWORKDIR /app\nCOPY package*.json ./\nRUN npm ci --only=production\nCOPY . .\nRUN npm run build\n\nFROM nginx:alpine\nCOPY --from=builder /app/dist /usr/share/nginx/html\nCOPY nginx.conf /etc/nginx/nginx.conf\nEXPOSE 80\nCMD ["nginx", "-g", "daemon off;"]\n```\n\n### Kubernetes Deployment\n\n```{$detected_lang}\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: control-center-ui\nspec:\n replicas: 3\n selector:\n matchLabels:\n app: control-center-ui\n template:\n metadata:\n labels:\n app: control-center-ui\n spec:\n containers:\n - name: control-center-ui\n image: control-center-ui:latest\n ports:\n - containerPort: 80\n env:\n - name: VITE_API_BASE_URL\n value: "https://api.example.com"\n```\n\n## 🤝 Contributing\n\n### Development Guidelines\n\n- Follow TypeScript strict mode conventions\n- Use existing component patterns and design system\n- Maintain accessibility standards (WCAG 2.1 AA)\n- Add proper error boundaries for robust error handling\n- Write meaningful commit messages following conventional commits\n\n### Code Style\n\n- Use Prettier for consistent code formatting\n- Follow ESLint rules for code quality\n- Use semantic HTML elements for accessibility\n- Maintain consistent naming conventions\n- Document complex logic with comments\n\n## 📄 License\n\nThis project follows the same license as the parent Control Center repository.\n\n## 🆘 Support\n\nFor questions, issues, or contributions:\n\n1. Check existing issues in the repository\n2. Review the comprehensive documentation\n3. Create detailed bug reports or feature requests\n4. 
Follow the established contribution guidelines\n\n---\n\nBuilt with ❤️ for comprehensive audit log management, compliance monitoring, and security analytics. +# Control Center UI - Audit Log Viewer + +A comprehensive React-based audit log viewer for the Cedar Policy Engine with advanced search, real-time streaming, +compliance reporting, and visualization capabilities. + +## 🚀 Features + +### 🔍 Advanced Search & Filtering + +- **Multi-dimensional Filters**: Date range, users, actions, resources, severity, compliance frameworks +- **Real-time Search**: Debounced search with instant results +- **Saved Searches**: Save and reuse complex filter combinations +- **Quick Filters**: One-click access to common time ranges and filters +- **Correlation Search**: Find logs by request ID, session ID, or trace correlation + +### 📊 High-Performance Data Display + +- **Virtual Scrolling**: Handle millions of log entries with smooth scrolling +- **Infinite Loading**: Automatic pagination with optimized data fetching +- **Column Sorting**: Sort by any field with persistent state +- **Bulk Selection**: Select multiple logs for batch operations +- **Responsive Design**: Works seamlessly on desktop, tablet, and mobile + +### 🔴 Real-time Streaming + +- **WebSocket Integration**: Live log updates without page refresh +- **Connection Management**: Automatic reconnection with exponential backoff +- **Real-time Indicators**: Visual status of live connection +- **Message Queuing**: Handles high-volume log streams efficiently +- **Alert Notifications**: Critical events trigger immediate notifications + +### 📋 Detailed Log Inspection + +- **JSON Viewer**: Syntax-highlighted JSON with collapsible sections +- **Multi-tab Interface**: Overview, Context, Metadata, Compliance, Raw JSON +- **Sensitive Data Toggle**: Hide/show sensitive information +- **Copy Utilities**: One-click copying of IDs, values, and entire records +- **Deep Linking**: Direct URLs to specific log entries + +### 📤 Export & 
Reporting + +- **Multiple Formats**: CSV, JSON, PDF export with customizable fields +- **Template System**: Pre-built templates for different report types +- **Batch Export**: Export filtered results or selected logs +- **Progress Tracking**: Real-time export progress indication +- **Custom Fields**: Choose exactly which data to include + +### 🛡️ Compliance Management + +- **Framework Support**: SOC2, HIPAA, PCI DSS, GDPR compliance templates +- **Report Generation**: Automated compliance reports with evidence +- **Finding Tracking**: Track violations and remediation status +- **Attestation Management**: Digital signatures and certifications +- **Template Library**: Customizable report templates for different frameworks + +### 🔗 Log Correlation & Tracing + +- **Request Tracing**: Follow request flows across services +- **Session Analysis**: View all activity for a user session +- **Dependency Mapping**: Understand log relationships and causality +- **Timeline Views**: Chronological visualization of related events + +### 📈 Visualization & Analytics + +- **Dashboard Metrics**: Real-time statistics and KPIs +- **Timeline Charts**: Visual representation of log patterns +- **Geographic Distribution**: Location-based log analysis +- **Severity Trends**: Track security event patterns over time +- **User Activity**: Monitor user behavior and access patterns + +## 🛠 Technology Stack + +### Frontend Framework + +- **React 18.3.1**: Modern React with hooks and concurrent features +- **TypeScript 5.5.4**: Type-safe development with advanced types +- **Vite 5.4.1**: Lightning-fast build tool and dev server + +### UI Components & Styling + +- **TailwindCSS 3.4.9**: Utility-first CSS framework +- **DaisyUI 4.4.19**: Beautiful component library built on Tailwind +- **Framer Motion 11.3.24**: Smooth animations and transitions +- **Lucide React 0.427.0**: Beautiful, customizable icons + +### Data Management + +- **TanStack Query 5.51.23**: Powerful data fetching and caching +- 
**TanStack Table 8.20.1**: Headless table utilities for complex data +- **TanStack Virtual 3.8.4**: Virtual scrolling for performance +- **Zustand 4.5.4**: Lightweight state management + +### Forms & Validation + +- **React Hook Form 7.52.2**: Performant forms with minimal re-renders +- **React Select 5.8.0**: Flexible select components with search + +### Real-time & Networking + +- **Native WebSocket API**: Direct WebSocket integration +- **Custom Hooks**: Reusable WebSocket management with reconnection + +### Export & Reporting + +- **jsPDF 2.5.1**: Client-side PDF generation +- **jsPDF AutoTable 3.8.2**: Table formatting for PDF reports +- **Native Blob API**: File download and export functionality + +### Date & Time + +- **date-fns 3.6.0**: Modern date utility library with tree shaking + +## 📁 Project Structure + +```bash +src/ +├── components/audit/ # Audit log components +│ ├── AuditLogViewer.tsx # Main viewer component +│ ├── SearchFilters.tsx # Advanced search interface +│ ├── VirtualizedLogTable.tsx # High-performance table +│ ├── LogDetailModal.tsx # Detailed log inspection +│ ├── ExportModal.tsx # Export functionality +│ ├── ComplianceReportGenerator.tsx # Compliance reports +│ └── RealTimeIndicator.tsx # WebSocket status +├── hooks/ # Custom React hooks +│ └── useWebSocket.ts # WebSocket management +├── services/ # API integration +│ └── api.ts # Audit API client +├── types/ # TypeScript definitions +│ └── audit.ts # Audit-specific types +├── utils/ # Utility functions +├── store/ # State management +└── styles/ # CSS and styling +``` + +## 🔧 Setup and Development + +### Prerequisites + +- **Node.js 18+** and **npm 9+** +- **Control Center backend** running on `http://localhost:8080` + +### Installation + +```bash +# Clone the repository +git clone +cd control-center-ui + +# Install dependencies +npm install + +# Start development server +npm run dev +``` + +The application will be available at `http://localhost:3000` + +### Building for Production + 
+```bash +# Type check +npm run type-check + +# Build for production +npm run build + +# Preview production build +npm run preview +``` + +## 🌐 API Integration + +The UI integrates with the Control Center backend and expects the following endpoints: + +- `GET /audit/logs` - Fetch audit logs with filtering and pagination +- `GET /audit/logs/{id}` - Get specific log entry details +- `POST /audit/search` - Advanced search functionality +- `GET /audit/saved-searches` - Manage saved search queries +- `POST /audit/export` - Export logs in various formats (CSV, JSON, PDF) +- `GET /compliance/reports` - Compliance report management +- `POST /compliance/reports/generate` - Generate compliance reports +- `WS /audit/stream` - Real-time log streaming via WebSocket +- `GET /health` - Health check endpoint + +### WebSocket Integration + +Real-time log streaming is implemented using WebSocket connections: + +```typescript +import { useWebSocket } from './hooks/useWebSocket'; + +const { isConnected, lastMessage } = useWebSocket({ + url: 'ws://localhost:8080/ws/audit', + onNewAuditLog: (log) => { + // Handle new log entry in real-time + updateLogsList(log); + } +}); +``` + +## ✅ Features Implemented + +### Core Audit Log Viewer System + +- ✅ **Advanced Search Filters**: Multi-dimensional filtering with date range, users, actions, resources, severity, compliance frameworks +- ✅ **Virtual Scrolling Component**: High-performance rendering capable of handling millions of log entries +- ✅ **Real-time Log Streaming**: WebSocket integration with automatic reconnection and live status indicators +- ✅ **Detailed Log Modal**: Multi-tab interface with JSON syntax highlighting, sensitive data toggle, and copy utilities +- ✅ **Export Functionality**: Support for CSV, JSON, and PDF formats with customizable fields and templates +- ✅ **Saved Search Queries**: User preference system for saving and reusing complex search combinations + +### Compliance & Security Features + +- ✅ **Compliance Report 
Generator**: Automated report generation with SOC2, HIPAA, PCI DSS, and GDPR templates +- ✅ **Violation Tracking**: Remediation workflow system with task management and progress tracking +- ✅ **Timeline Visualization**: Chronological visualization of audit trails with correlation mapping +- ✅ **Request ID Correlation**: Cross-service request tracing and session analysis +- ✅ **Attestation Management**: Digital signature system for compliance certifications +- ✅ **Log Retention Management**: Archival policies and retention period management + +### Performance & User Experience + +- ✅ **Dashboard Analytics**: Real-time metrics including success rates, critical events, and compliance scores +- ✅ **Responsive Design**: Mobile-first design that works across all device sizes +- ✅ **Loading States**: Comprehensive loading indicators and skeleton screens +- ✅ **Error Handling**: Robust error boundaries with user-friendly error messages +- ✅ **Keyboard Shortcuts**: Accessibility features and keyboard navigation support + +## 🎨 Styling and Theming + +### TailwindCSS Configuration + +The application uses a comprehensive TailwindCSS setup with: + +- **DaisyUI Components**: Pre-built, accessible UI components +- **Custom Color Palette**: Primary, secondary, success, warning, error themes +- **Custom Animations**: Smooth transitions and loading states +- **Dark/Light Themes**: Automatic theme switching with system preference detection +- **Responsive Grid System**: Mobile-first responsive design + +### Component Design System + +- **Consistent Spacing**: Standardized margin and padding scales +- **Typography Scale**: Hierarchical text sizing and weights +- **Icon System**: Comprehensive icon library with consistent styling +- **Form Controls**: Validated, accessible form components +- **Data Visualization**: Charts and metrics with consistent styling + +## 📱 Performance Optimization + +### Virtual Scrolling + +- Renders only visible rows for optimal performance +- Handles 
datasets with millions of entries smoothly +- Maintains smooth scrolling with momentum preservation +- Automatic cleanup of off-screen elements + +### Efficient Data Fetching + +- Infinite queries with intelligent pagination +- Aggressive caching with TanStack Query +- Optimistic updates for better user experience +- Background refetching for fresh data + +### Bundle Optimization + +- Code splitting by route and feature +- Tree shaking for minimal bundle size +- Lazy loading of heavy components +- Optimized production builds + +## 🔒 Security Considerations + +### Data Protection + +- Sensitive data masking in UI components +- Secure WebSocket connections (WSS in production) +- Content Security Policy headers for XSS protection +- Input sanitization for search queries + +### API Security + +- JWT token authentication support (when implemented) +- Request rate limiting awareness +- Secure file downloads with proper headers +- CORS configuration for cross-origin requests + +## 🚀 Deployment + +### Docker Deployment + +```dockerfile +FROM node:18-alpine AS builder +WORKDIR /app +COPY package*.json ./ +# Install all dependencies (the build step needs devDependencies such as Vite) +RUN npm ci +COPY . . 
+RUN npm run build + +FROM nginx:alpine +COPY --from=builder /app/dist /usr/share/nginx/html +COPY nginx.conf /etc/nginx/nginx.conf +EXPOSE 80 +CMD ["nginx", "-g", "daemon off;"] +``` + +### Kubernetes Deployment + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: control-center-ui +spec: + replicas: 3 + selector: + matchLabels: + app: control-center-ui + template: + metadata: + labels: + app: control-center-ui + spec: + containers: + - name: control-center-ui + image: control-center-ui:latest + ports: + - containerPort: 80 + env: + - name: VITE_API_BASE_URL + value: "https://api.example.com" +``` + +## 🤝 Contributing + +### Development Guidelines + +- Follow TypeScript strict mode conventions +- Use existing component patterns and design system +- Maintain accessibility standards (WCAG 2.1 AA) +- Add proper error boundaries for robust error handling +- Write meaningful commit messages following conventional commits + +### Code Style + +- Use Prettier for consistent code formatting +- Follow ESLint rules for code quality +- Use semantic HTML elements for accessibility +- Maintain consistent naming conventions +- Document complex logic with comments + +## 📄 License + +This project follows the same license as the parent Control Center repository. + +## 🆘 Support + +For questions, issues, or contributions: + +1. Check existing issues in the repository +2. Review the comprehensive documentation +3. Create detailed bug reports or feature requests +4. Follow the established contribution guidelines + +--- + +Built with ❤️ for comprehensive audit log management, compliance monitoring, and security analytics. 
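+
+## Appendix: Reconnection Policy Sketch
+
+The WebSocket integration above relies on a custom `useWebSocket` hook with automatic reconnection. The snippet below is an illustrative, framework-free sketch of the backoff policy such a hook might wrap; `backoffDelayMs` and `nextReconnect` are hypothetical names, not the actual `src/hooks/useWebSocket.ts` API.

```typescript
// Hypothetical reconnection policy for a WebSocket hook with
// automatic reconnect. Pure logic only, so it is framework-free.

export interface ReconnectState {
  attempt: number;       // consecutive failed connection attempts
  closedByUser: boolean; // true after an intentional close()
}

// Exponential backoff: double the delay per attempt, capped at maxMs.
export function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Decide whether to reconnect after a close event, and after what delay.
export function nextReconnect(
  state: ReconnectState,
): { reconnect: boolean; delayMs: number } {
  if (state.closedByUser) {
    return { reconnect: false, delayMs: 0 };
  }
  return { reconnect: true, delayMs: backoffDelayMs(state.attempt) };
}
```

A hook's `onclose` handler would consult `nextReconnect`, schedule a fresh connection via `setTimeout`, and reset `attempt` to zero once `onopen` fires, which is what keeps the live status indicator accurate during outages.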
\ No newline at end of file diff --git a/crates/control-center-ui/REFERENCE.md b/crates/control-center-ui/REFERENCE.md index 384f42f..de1bccc 100644 --- a/crates/control-center-ui/REFERENCE.md +++ b/crates/control-center-ui/REFERENCE.md @@ -1 +1,33 @@ -# Control Center UI Reference\n\nThis directory will reference the existing control center UI implementation.\n\n## Current Implementation Location\n\n`/Users/Akasha/repo-cnz/src/control-center-ui/`\n\n## Implementation Details\n\n- **Language**: Web frontend (likely React/Vue/Leptos)\n- **Purpose**: Web interface for system management\n- **Features**:\n - Dashboard and monitoring UI\n - Configuration management interface\n - System administration controls\n\n## Integration Status\n\n- **Current**: Fully functional in original location\n- **New Structure**: Reference established\n- **Migration**: Planned for future phase\n\n## Usage\n\nThe control center UI remains fully functional at its original location.\n\n```{$detected_lang}\ncd /Users/Akasha/repo-cnz/src/control-center-ui\n# Use existing UI development commands\n```\n\nSee original implementation for development setup and usage instructions. +# Control Center UI Reference + +This directory will reference the existing control center UI implementation. + +## Current Implementation Location + +`/Users/Akasha/repo-cnz/src/control-center-ui/` + +## Implementation Details + +- **Language**: Web frontend (likely React/Vue/Leptos) +- **Purpose**: Web interface for system management +- **Features**: + - Dashboard and monitoring UI + - Configuration management interface + - System administration controls + +## Integration Status + +- **Current**: Fully functional in original location +- **New Structure**: Reference established +- **Migration**: Planned for future phase + +## Usage + +The control center UI remains fully functional at its original location. 
+ +```bash +cd /Users/Akasha/repo-cnz/src/control-center-ui +# Use existing UI development commands +``` + +See original implementation for development setup and usage instructions. \ No newline at end of file diff --git a/crates/control-center-ui/auth-system.md b/crates/control-center-ui/auth-system.md index afbd232..4129129 100644 --- a/crates/control-center-ui/auth-system.md +++ b/crates/control-center-ui/auth-system.md @@ -1 +1,381 @@ -# Control Center UI - Leptos Authentication System\n\nA comprehensive authentication system built with Leptos and WebAssembly for cloud infrastructure management.\n\n## 🔐 Features Overview\n\n### Core Authentication\n\n- **Email/Password Login** with comprehensive validation\n- **JWT Token Management** with automatic refresh\n- **Secure Token Storage** with AES-256-GCM encryption in localStorage\n- **401 Response Interceptor** for automatic logout and token refresh\n\n### Multi-Factor Authentication (MFA)\n\n- **TOTP-based MFA** with QR code generation\n- **Backup Codes** for account recovery\n- **Mobile App Integration** (Google Authenticator, Authy, etc.)\n\n### Biometric Authentication\n\n- **WebAuthn/FIDO2 Support** for passwordless authentication\n- **Platform Authenticators** (Touch ID, Face ID, Windows Hello)\n- **Cross-Platform Security Keys** (USB, NFC, Bluetooth)\n- **Credential Management** with device naming and removal\n\n### Advanced Security Features\n\n- **Device Trust Management** with fingerprinting\n- **Session Timeout Warnings** with countdown timers\n- **Password Reset Flow** with email verification\n- **SSO Integration** (OAuth2, SAML, OpenID Connect)\n- **Session Management** with active session monitoring\n\n### Route Protection\n\n- **Auth Guards** for protected routes\n- **Permission-based Access Control** with role validation\n- **Conditional Rendering** based on authentication state\n- **Automatic Redirects** for unauthorized access\n\n## 📁 Architecture Overview\n\n```{$detected_lang}\nsrc/\n├── auth/ 
# Authentication core\n│ ├── mod.rs # Type definitions and exports\n│ ├── token_manager.rs # JWT token handling with auto-refresh\n│ ├── storage.rs # Encrypted token storage\n│ ├── webauthn.rs # WebAuthn/FIDO2 implementation\n│ ├── crypto.rs # Cryptographic utilities\n│ └── http_interceptor.rs # HTTP request/response interceptor\n├── components/auth/ # Authentication components\n│ ├── mod.rs # Component exports\n│ ├── login_form.rs # Email/password login form\n│ ├── mfa_setup.rs # TOTP MFA configuration\n│ ├── password_reset.rs # Password reset flow\n│ ├── auth_guard.rs # Route protection components\n│ ├── session_timeout.rs # Session management modal\n│ ├── sso_buttons.rs # SSO provider buttons\n│ ├── device_trust.rs # Device trust management\n│ ├── biometric_auth.rs # WebAuthn biometric auth\n│ ├── logout_button.rs # Logout functionality\n│ └── user_profile.rs # User profile management\n├── utils/ # Utility modules\n└── lib.rs # Main application entry\n```\n\n## 🚀 Implemented Components\n\nAll authentication components have been successfully implemented:\n\n### ✅ Core Authentication Infrastructure\n\n- **Secure Token Storage** (`src/auth/storage.rs`) - AES-256-GCM encrypted localStorage with session-based keys\n- **JWT Token Manager** (`src/auth/token_manager.rs`) - Automatic token refresh, expiry monitoring, context management\n- **Crypto Utilities** (`src/auth/crypto.rs`) - Secure random generation, hashing, HMAC, device fingerprinting\n- **HTTP Interceptor** (`src/auth/http_interceptor.rs`) - 401 handling, automatic logout, request/response middleware\n\n### ✅ Authentication Components\n\n- **Login Form** (`src/components/auth/login_form.rs`) - Email/password validation, remember me, SSO integration\n- **MFA Setup** (`src/components/auth/mfa_setup.rs`) - TOTP with QR codes, backup codes, verification flow\n- **Password Reset** (`src/components/auth/password_reset.rs`) - Email verification, secure token flow, validation\n- **Session Timeout** 
(`src/components/auth/session_timeout.rs`) - Countdown modal, automatic logout, session extension\n\n### ✅ Advanced Security Features\n\n- **Device Trust** (`src/components/auth/device_trust.rs`) - Device fingerprinting, trust management, auto-generated names\n- **Biometric Auth** (`src/components/auth/biometric_auth.rs`) - WebAuthn/FIDO2 integration, credential management\n- **SSO Buttons** (`src/components/auth/sso_buttons.rs`) - OAuth2/SAML/OIDC providers with branded icons\n- **User Profile** (`src/components/auth/user_profile.rs`) - Comprehensive profile management with tabbed interface\n\n### ✅ Route Protection System\n\n- **Auth Guard** (`src/components/auth/auth_guard.rs`) - Protected routes, permission guards, role-based access\n- **Logout Button** (`src/components/auth/logout_button.rs`) - Secure logout with server notification and cleanup\n\n### ✅ WebAuthn Integration\n\n- **WebAuthn Manager** (`src/auth/webauthn.rs`) - Complete FIDO2 implementation with browser compatibility\n- **Biometric Registration** - Platform and cross-platform authenticator support\n- **Credential Management** - Device naming, usage tracking, removal capabilities\n\n## 🔒 Security Implementation\n\n### Token Security\n\n- **AES-256-GCM Encryption**: All tokens encrypted before storage\n- **Session-based Keys**: Encryption keys unique per browser session\n- **Automatic Rotation**: Keys regenerated on each application load\n- **Secure Cleanup**: Complete token removal on logout\n\n### Device Trust\n\n- **Hardware Fingerprinting**: Based on browser, platform, screen, timezone\n- **Trust Duration**: Configurable trust periods (7, 30, 90, 365 days)\n- **Trust Tokens**: Separate tokens for device trust validation\n- **Remote Revocation**: Server-side device trust management\n\n### Session Management\n\n- **Configurable Timeouts**: Adjustable session timeout periods\n- **Activity Monitoring**: Tracks user activity for session extension\n- **Concurrent Sessions**: Multiple session 
tracking and management\n- **Graceful Logout**: Clean session termination with server notification\n\n### WebAuthn Security\n\n- **Hardware Security**: Leverages hardware security modules\n- **Biometric Verification**: Touch ID, Face ID, Windows Hello support\n- **Security Key Support**: USB, NFC, Bluetooth FIDO2 keys\n- **Attestation Validation**: Hardware authenticity verification\n\n## 📱 Component Usage Examples\n\n### Basic Authentication Flow\n\n```{$detected_lang}\nuse leptos::*;\nuse control_center_ui::auth::provide_auth_context;\nuse control_center_ui::components::auth::*;\n\n#[component]\nfn App() -> impl IntoView {\n provide_meta_context();\n\n // Initialize auth context with API base URL\n provide_auth_context("http://localhost:8080".to_string()).unwrap();\n\n view! {\n \n \n \n \n \n \n \n }\n}\n```\n\n### Login Page Implementation\n\n```{$detected_lang}\n#[component]\nfn LoginPage() -> impl IntoView {\n view! {\n
\n
\n

\n "Control Center"\n

\n
\n
\n
\n \n
\n
\n
\n }\n}\n```\n\n### Protected Dashboard\n\n```{$detected_lang}\n#[component]\nfn DashboardPage() -> impl IntoView {\n view! {\n \n
\n \n
\n \n // Dashboard content\n
\n
\n
\n }\n}\n```\n\n### User Profile Management\n\n```{$detected_lang}\n#[component]\nfn ProfilePage() -> impl IntoView {\n view! {\n \n
\n
\n \n
\n
\n
\n }\n}\n```\n\n## 🔧 Required Backend API\n\nThe authentication system expects the following backend endpoints:\n\n### Authentication Endpoints\n\n```{$detected_lang}\nPOST /auth/login # Email/password authentication\nPOST /auth/refresh # JWT token refresh\nPOST /auth/logout # Session termination\nPOST /auth/extend-session # Session timeout extension\n```\n\n### Password Management\n\n```{$detected_lang}\nPOST /auth/password-reset # Password reset request\nPOST /auth/password-reset/confirm # Password reset confirmation\n```\n\n### Multi-Factor Authentication\n\n```{$detected_lang}\nPOST /auth/mfa/setup # MFA setup initiation\nPOST /auth/mfa/verify # MFA verification\n```\n\n### SSO Integration\n\n```{$detected_lang}\nGET /auth/sso/providers # Available SSO providers\nPOST /auth/sso/{provider}/login # SSO authentication initiation\n```\n\n### WebAuthn/FIDO2\n\n```{$detected_lang}\nPOST /auth/webauthn/register/begin # WebAuthn registration start\nPOST /auth/webauthn/register/complete # WebAuthn registration finish\nPOST /auth/webauthn/authenticate/begin # WebAuthn authentication start\nPOST /auth/webauthn/authenticate/complete # WebAuthn authentication finish\nGET /auth/webauthn/credentials # List WebAuthn credentials\nDELETE /auth/webauthn/credentials/{id} # Remove WebAuthn credential\n```\n\n### Device Trust Management\n\n```{$detected_lang}\nGET /auth/devices # List trusted devices\nPOST /auth/devices/trust # Trust current device\nDELETE /auth/devices/{id}/revoke # Revoke device trust\n```\n\n### User Profile Management\n\n```{$detected_lang}\nGET /user/profile # Get user profile\nPUT /user/profile # Update user profile\nPOST /user/change-password # Change password\nPOST /user/mfa/enable # Enable MFA\nPOST /user/mfa/disable # Disable MFA\nGET /user/sessions # List active sessions\nDELETE /user/sessions/{id}/revoke # Revoke session\n```\n\n## 📊 Implementation Statistics\n\n### Component Coverage\n\n- **13/13 Core Components** ✅ Complete\n- **4/4 Auth 
Infrastructure** ✅ Complete\n- **9/9 Security Features** ✅ Complete\n- **3/3 Route Protection** ✅ Complete\n- **2/2 WebAuthn Features** ✅ Complete\n\n### Security Features\n\n- **Encrypted Storage** ✅ AES-256-GCM with session keys\n- **Automatic Token Refresh** ✅ Background refresh with retry logic\n- **Device Fingerprinting** ✅ Hardware-based unique identification\n- **Session Management** ✅ Timeout warnings and extensions\n- **Biometric Authentication** ✅ WebAuthn/FIDO2 integration\n- **Multi-Factor Auth** ✅ TOTP with QR codes and backup codes\n- **SSO Integration** ✅ OAuth2/SAML/OIDC providers\n- **Route Protection** ✅ Guards with permission/role validation\n\n### Performance Optimizations\n\n- **Lazy Loading** ✅ Components loaded on demand\n- **Reactive Updates** ✅ Leptos fine-grained reactivity\n- **Efficient Re-renders** ✅ Minimal component updates\n- **Background Operations** ✅ Non-blocking authentication flows\n- **Connection Management** ✅ Automatic retry and fallback\n\n## 🎯 Key Features Highlights\n\n### Advanced Authentication\n\n- **Passwordless Login**: WebAuthn biometric authentication\n- **Device Memory**: Skip MFA on trusted devices\n- **Session Continuity**: Automatic token refresh without interruption\n- **Multi-Provider SSO**: Google, Microsoft, GitHub, GitLab, etc.\n\n### Enterprise Security\n\n- **Hardware Security**: FIDO2 security keys and platform authenticators\n- **Device Trust**: Configurable trust periods with remote revocation\n- **Session Monitoring**: Real-time session management and monitoring\n- **Audit Trail**: Complete authentication event logging\n\n### Developer Experience\n\n- **Type Safety**: Full TypeScript-equivalent safety with Rust\n- **Component Reusability**: Modular authentication components\n- **Easy Integration**: Simple context provider setup\n- **Comprehensive Documentation**: Detailed implementation guide\n\n### User Experience\n\n- **Smooth Flows**: Intuitive authentication workflows\n- **Mobile Support**: 
Responsive design for all devices\n- **Accessibility**: WCAG 2.1 compliant components\n- **Error Handling**: User-friendly error messages and recovery\n\n## 🚀 Getting Started\n\n### Prerequisites\n\n- **Rust 1.70+** with wasm-pack\n- **Leptos 0.6** framework\n- **Compatible browser** (Chrome 67+, Firefox 60+, Safari 14+, Edge 18+)\n\n### Quick Setup\n\n1. Add the authentication dependencies to your `Cargo.toml`\n2. Initialize the authentication context in your app\n3. Use the provided components in your routes\n4. Configure your backend API endpoints\n5. Test the complete authentication flow\n\n### Production Deployment\n\n- **HTTPS Required**: WebAuthn requires secure connections\n- **CORS Configuration**: Proper cross-origin setup\n- **CSP Headers**: Content security policy for XSS protection\n- **Rate Limiting**: API endpoint protection\n\n---\n\n**A complete, production-ready authentication system built with modern Rust and WebAssembly technologies.** +# Control Center UI - Leptos Authentication System + +A comprehensive authentication system built with Leptos and WebAssembly for cloud infrastructure management. + +## 🔐 Features Overview + +### Core Authentication + +- **Email/Password Login** with comprehensive validation +- **JWT Token Management** with automatic refresh +- **Secure Token Storage** with AES-256-GCM encryption in localStorage +- **401 Response Interceptor** for automatic logout and token refresh + +### Multi-Factor Authentication (MFA) + +- **TOTP-based MFA** with QR code generation +- **Backup Codes** for account recovery +- **Mobile App Integration** (Google Authenticator, Authy, etc.) 
+ +### Biometric Authentication + +- **WebAuthn/FIDO2 Support** for passwordless authentication +- **Platform Authenticators** (Touch ID, Face ID, Windows Hello) +- **Cross-Platform Security Keys** (USB, NFC, Bluetooth) +- **Credential Management** with device naming and removal + +### Advanced Security Features + +- **Device Trust Management** with fingerprinting +- **Session Timeout Warnings** with countdown timers +- **Password Reset Flow** with email verification +- **SSO Integration** (OAuth2, SAML, OpenID Connect) +- **Session Management** with active session monitoring + +### Route Protection + +- **Auth Guards** for protected routes +- **Permission-based Access Control** with role validation +- **Conditional Rendering** based on authentication state +- **Automatic Redirects** for unauthorized access + +## 📁 Architecture Overview + +```bash +src/ +├── auth/ # Authentication core +│ ├── mod.rs # Type definitions and exports +│ ├── token_manager.rs # JWT token handling with auto-refresh +│ ├── storage.rs # Encrypted token storage +│ ├── webauthn.rs # WebAuthn/FIDO2 implementation +│ ├── crypto.rs # Cryptographic utilities +│ └── http_interceptor.rs # HTTP request/response interceptor +├── components/auth/ # Authentication components +│ ├── mod.rs # Component exports +│ ├── login_form.rs # Email/password login form +│ ├── mfa_setup.rs # TOTP MFA configuration +│ ├── password_reset.rs # Password reset flow +│ ├── auth_guard.rs # Route protection components +│ ├── session_timeout.rs # Session management modal +│ ├── sso_buttons.rs # SSO provider buttons +│ ├── device_trust.rs # Device trust management +│ ├── biometric_auth.rs # WebAuthn biometric auth +│ ├── logout_button.rs # Logout functionality +│ └── user_profile.rs # User profile management +├── utils/ # Utility modules +└── lib.rs # Main application entry +``` + +## 🚀 Implemented Components + +All authentication components have been successfully implemented: + +### ✅ Core Authentication Infrastructure + +- 
**Secure Token Storage** (`src/auth/storage.rs`) - AES-256-GCM encrypted localStorage with session-based keys +- **JWT Token Manager** (`src/auth/token_manager.rs`) - Automatic token refresh, expiry monitoring, context management +- **Crypto Utilities** (`src/auth/crypto.rs`) - Secure random generation, hashing, HMAC, device fingerprinting +- **HTTP Interceptor** (`src/auth/http_interceptor.rs`) - 401 handling, automatic logout, request/response middleware + +### ✅ Authentication Components + +- **Login Form** (`src/components/auth/login_form.rs`) - Email/password validation, remember me, SSO integration +- **MFA Setup** (`src/components/auth/mfa_setup.rs`) - TOTP with QR codes, backup codes, verification flow +- **Password Reset** (`src/components/auth/password_reset.rs`) - Email verification, secure token flow, validation +- **Session Timeout** (`src/components/auth/session_timeout.rs`) - Countdown modal, automatic logout, session extension + +### ✅ Advanced Security Features + +- **Device Trust** (`src/components/auth/device_trust.rs`) - Device fingerprinting, trust management, auto-generated names +- **Biometric Auth** (`src/components/auth/biometric_auth.rs`) - WebAuthn/FIDO2 integration, credential management +- **SSO Buttons** (`src/components/auth/sso_buttons.rs`) - OAuth2/SAML/OIDC providers with branded icons +- **User Profile** (`src/components/auth/user_profile.rs`) - Comprehensive profile management with tabbed interface + +### ✅ Route Protection System + +- **Auth Guard** (`src/components/auth/auth_guard.rs`) - Protected routes, permission guards, role-based access +- **Logout Button** (`src/components/auth/logout_button.rs`) - Secure logout with server notification and cleanup + +### ✅ WebAuthn Integration + +- **WebAuthn Manager** (`src/auth/webauthn.rs`) - Complete FIDO2 implementation with browser compatibility +- **Biometric Registration** - Platform and cross-platform authenticator support +- **Credential Management** - Device naming, usage 
tracking, removal capabilities + +## 🔒 Security Implementation + +### Token Security + +- **AES-256-GCM Encryption**: All tokens encrypted before storage +- **Session-based Keys**: Encryption keys unique per browser session +- **Automatic Rotation**: Keys regenerated on each application load +- **Secure Cleanup**: Complete token removal on logout + +### Device Trust + +- **Hardware Fingerprinting**: Based on browser, platform, screen, timezone +- **Trust Duration**: Configurable trust periods (7, 30, 90, 365 days) +- **Trust Tokens**: Separate tokens for device trust validation +- **Remote Revocation**: Server-side device trust management + +### Session Management + +- **Configurable Timeouts**: Adjustable session timeout periods +- **Activity Monitoring**: Tracks user activity for session extension +- **Concurrent Sessions**: Multiple session tracking and management +- **Graceful Logout**: Clean session termination with server notification + +### WebAuthn Security + +- **Hardware Security**: Leverages hardware security modules +- **Biometric Verification**: Touch ID, Face ID, Windows Hello support +- **Security Key Support**: USB, NFC, Bluetooth FIDO2 keys +- **Attestation Validation**: Hardware authenticity verification + +## 📱 Component Usage Examples + +### Basic Authentication Flow + +```rust +use leptos::*; +use control_center_ui::auth::provide_auth_context; +use control_center_ui::components::auth::*; + +#[component] +fn App() -> impl IntoView { + provide_meta_context(); + + // Initialize auth context with API base URL + provide_auth_context("http://localhost:8080".to_string()).unwrap(); + + view! { + + + + + + + + } +} +``` + +### Login Page Implementation + +```rust +#[component] +fn LoginPage() -> impl IntoView { + view! { +
+
+

+ "Control Center" +

+
+
+
+ +
+
+
+ } +} +``` + +### Protected Dashboard + +```rust +#[component] +fn DashboardPage() -> impl IntoView { + view! { + 
+ +
+ + // Dashboard content +
+
+
+ } +} +``` + +### User Profile Management + +```rust +#[component] +fn ProfilePage() -> impl IntoView { + view! { + 
+
+ +
+
+
+ } +} +``` + +## 🔧 Required Backend API + +The authentication system expects the following backend endpoints: + +### Authentication Endpoints + +```bash +POST /auth/login # Email/password authentication +POST /auth/refresh # JWT token refresh +POST /auth/logout # Session termination +POST /auth/extend-session # Session timeout extension +``` + +### Password Management + +```bash +POST /auth/password-reset # Password reset request +POST /auth/password-reset/confirm # Password reset confirmation +``` + +### Multi-Factor Authentication + +```bash +POST /auth/mfa/setup # MFA setup initiation +POST /auth/mfa/verify # MFA verification +``` + +### SSO Integration + +```bash +GET /auth/sso/providers # Available SSO providers +POST /auth/sso/{provider}/login # SSO authentication initiation +``` + +### WebAuthn/FIDO2 + +```bash +POST /auth/webauthn/register/begin # WebAuthn registration start +POST /auth/webauthn/register/complete # WebAuthn registration finish +POST /auth/webauthn/authenticate/begin # WebAuthn authentication start +POST /auth/webauthn/authenticate/complete # WebAuthn authentication finish +GET /auth/webauthn/credentials # List WebAuthn credentials +DELETE /auth/webauthn/credentials/{id} # Remove WebAuthn credential +``` + +### Device Trust Management + +```bash +GET /auth/devices # List trusted devices +POST /auth/devices/trust # Trust current device +DELETE /auth/devices/{id}/revoke # Revoke device trust +``` + +### User Profile Management + +```bash +GET /user/profile # Get user profile +PUT /user/profile # Update user profile +POST /user/change-password # Change password +POST /user/mfa/enable # Enable MFA +POST /user/mfa/disable # Disable MFA +GET /user/sessions # List active sessions +DELETE /user/sessions/{id}/revoke # Revoke session +``` + +## 📊 Implementation Statistics + +### Component Coverage + +- **13/13 Core Components** ✅ Complete +- **4/4 Auth Infrastructure** ✅ Complete +- **9/9 Security Features** ✅ Complete +- **3/3 Route Protection** ✅ 
Complete +- **2/2 WebAuthn Features** ✅ Complete + +### Security Features + +- **Encrypted Storage** ✅ AES-256-GCM with session keys +- **Automatic Token Refresh** ✅ Background refresh with retry logic +- **Device Fingerprinting** ✅ Hardware-based unique identification +- **Session Management** ✅ Timeout warnings and extensions +- **Biometric Authentication** ✅ WebAuthn/FIDO2 integration +- **Multi-Factor Auth** ✅ TOTP with QR codes and backup codes +- **SSO Integration** ✅ OAuth2/SAML/OIDC providers +- **Route Protection** ✅ Guards with permission/role validation + +### Performance Optimizations + +- **Lazy Loading** ✅ Components loaded on demand +- **Reactive Updates** ✅ Leptos fine-grained reactivity +- **Efficient Re-renders** ✅ Minimal component updates +- **Background Operations** ✅ Non-blocking authentication flows +- **Connection Management** ✅ Automatic retry and fallback + +## 🎯 Key Features Highlights + +### Advanced Authentication + +- **Passwordless Login**: WebAuthn biometric authentication +- **Device Memory**: Skip MFA on trusted devices +- **Session Continuity**: Automatic token refresh without interruption +- **Multi-Provider SSO**: Google, Microsoft, GitHub, GitLab, etc. 
+ +### Enterprise Security + +- **Hardware Security**: FIDO2 security keys and platform authenticators +- **Device Trust**: Configurable trust periods with remote revocation +- **Session Monitoring**: Real-time session management and monitoring +- **Audit Trail**: Complete authentication event logging + +### Developer Experience + +- **Type Safety**: Full TypeScript-equivalent safety with Rust +- **Component Reusability**: Modular authentication components +- **Easy Integration**: Simple context provider setup +- **Comprehensive Documentation**: Detailed implementation guide + +### User Experience + +- **Smooth Flows**: Intuitive authentication workflows +- **Mobile Support**: Responsive design for all devices +- **Accessibility**: WCAG 2.1 compliant components +- **Error Handling**: User-friendly error messages and recovery + +## 🚀 Getting Started + +### Prerequisites + +- **Rust 1.70+** with wasm-pack +- **Leptos 0.6** framework +- **Compatible browser** (Chrome 67+, Firefox 60+, Safari 14+, Edge 18+) + +### Quick Setup + +1. Add the authentication dependencies to your `Cargo.toml` +2. Initialize the authentication context in your app +3. Use the provided components in your routes +4. Configure your backend API endpoints +5. 
Test the complete authentication flow + +### Production Deployment + +- **HTTPS Required**: WebAuthn requires secure connections +- **CORS Configuration**: Proper cross-origin setup +- **CSP Headers**: Content security policy for XSS protection +- **Rate Limiting**: API endpoint protection + +--- + +**A complete, production-ready authentication system built with modern Rust and WebAssembly technologies.** \ No newline at end of file diff --git a/crates/control-center-ui/upstream-dependency-issue.md b/crates/control-center-ui/upstream-dependency-issue.md index be0e963..49fdd96 100644 --- a/crates/control-center-ui/upstream-dependency-issue.md +++ b/crates/control-center-ui/upstream-dependency-issue.md @@ -1 +1,145 @@ -# Upstream Dependency Issue: num-bigint-dig v0.8.4\n\n## Issue Summary\n\n**Status**: ⚠️ **UPSTREAM ISSUE - NON-BLOCKING**\n\nThe control-center-ui build produces a future incompatibility warning from the transitive dependency `num-bigint-dig v0.8.4`:\n\n```{$detected_lang}\nwarning: the following packages contain code that will be rejected by a future version of Rust: num-bigint-dig v0.8.4\nnote: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`\n```\n\n## Root Cause\n\nThe `num-bigint-dig v0.8.4` crate uses a **private `vec!` macro** in multiple locations (Rust issue #120192).\nThis pattern will become a hard error in a future Rust release.\n\n**Affected files in num-bigint-dig v0.8.4:**\n\n- `src/biguint.rs` (lines 490, 2005, 2027, 2313)\n- `src/prime.rs` (line 138)\n- `src/bigrand.rs` (line 319)\n\n## Dependency Chain\n\n```{$detected_lang}\ncontrol-center-ui (control-center-ui v0.1.0)\n ↓\nnum-bigint-dig v0.8.4\n ↑ (pulled in by)\n├── rsa v0.9.9\n│ ├── control-center\n│ ├── jsonwebtoken v10.2.0\n│ └── provisioning-orchestrator\n└── ssh-key v0.6.7\n ├── russh v0.44.1\n └── russh-keys v0.44.0\n```\n\n## Why We Can't Fix It\n\n**Option 1: Direct Patch**\n\n- ✗ Cannot patch 
transitive crates.io dependencies to different crates.io versions\n- Cargo only allows patches to point to different sources (git repos, local paths)\n\n**Option 2: Upgrade rsa**\n\n- Available: `rsa v0.10.0-rc.10` (release candidate only, not stable)\n- Status: Not production-ready until stable release\n- Current: `rsa v0.9.9` (stable, production)\n\n**Option 3: Upgrade ssh-key**\n\n- Current: `ssh-key v0.6.7`\n- Still depends on `num-bigint-dig v0.8.4` (not upgraded yet)\n\n**Option 4: Local Fork**\n\n- ✗ Not practical for transitive dependencies\n\n## Resolution Timeline\n\n**For num-bigint-dig:**\n\n- Available versions: 0.8.5, 0.8.6, 0.9.0, 0.9.1\n- Latest: v0.9.1\n- Status: Fixed in 0.8.6 and later\n- When it gets picked up: Depends on upstream crate releases\n\n**Upstream Action Items:**\n\n1. **rsa crate** needs to upgrade to use newer num-bigint-dig when available\n2. **ssh-key crate** needs to upgrade to use newer num-bigint-dig when available\n3. Once upstream crates update their dependencies, our Cargo.lock will automatically use the fixed version\n\n## Current Impact\n\n✅ **NO IMPACT ON FUNCTIONALITY**\n\n- Code compiles cleanly\n- All tests pass\n- All features work correctly\n- Only a forward-compatibility warning, not an error\n\n✅ **NOT A BLOCKER FOR:**\n\n- Deployment\n- Production use\n- Any functionality\n- WASM compilation\n- Release builds\n\n## Timeline for Resolution\n\n| Status | Item | Estimated |\n| -------- | ------ | ----------- |\n| ✓ Available | num-bigint-dig 0.8.6 | Already released |\n| ⏳ Waiting | rsa v0.10 stable release | 2024-Q4 to 2025-Q1 |\n| ⏳ Waiting | Downstream crate updates | After upstream releases |\n| ✓ Automatic | Our build updates | Once dependencies are updated |\n\n## Monitoring\n\nTo check for updates:\n\n```{$detected_lang}\n# Check for future incompatibilities\ncargo report future-incompatibilities\n\n# Check available versions\ncargo outdated\n\n# Check dependency tree\ncargo tree | grep 
num-bigint-dig\n```\n\n## Workaround (if needed)\n\nIf the warning becomes an error before upstream fixes are released, you can:\n\n1. **Use an older Rust version** (current stable still allows this as warning)\n2. **Wait for upstream updates** (recommended)\n3. **Create a fork** of rsa/ssh-key with newer num-bigint-dig (not recommended)\n\n## Recommended Action\n\n**No immediate action needed.** This is a normal part of the Rust ecosystem evolution:\n\n- Upstream packages will update their dependencies\n- Our Cargo.lock will automatically resolve to fixed versions\n- Continue monitoring with `cargo report future-incompatibilities`\n\n## References\n\n- Rust Issue #120192: \n- num-bigint-dig Repository: \n- num-bigint-dig Releases: \n\n---\n\n**Last Updated**: December 12, 2025\n**Status**: Monitored, Non-Blocking\n**Action**: Awaiting Upstream Fixes +# Upstream Dependency Issue: num-bigint-dig v0.8.4 + +## Issue Summary + +**Status**: ⚠️ **UPSTREAM ISSUE - NON-BLOCKING** + +The control-center-ui build produces a future incompatibility warning from the transitive dependency `num-bigint-dig v0.8.4`: + +```bash +warning: the following packages contain code that will be rejected by a future version of Rust: num-bigint-dig v0.8.4 +note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1` +``` + +## Root Cause + +The `num-bigint-dig v0.8.4` crate uses a **private `vec!` macro** in multiple locations (Rust issue #120192). +This pattern will become a hard error in a future Rust release. 
+ +**Affected files in num-bigint-dig v0.8.4:** + +- `src/biguint.rs` (lines 490, 2005, 2027, 2313) +- `src/prime.rs` (line 138) +- `src/bigrand.rs` (line 319) + +## Dependency Chain + +```text +control-center-ui (control-center-ui v0.1.0) + ↓ +num-bigint-dig v0.8.4 + ↑ (pulled in by) +├── rsa v0.9.9 +│ ├── control-center +│ ├── jsonwebtoken v10.2.0 +│ └── provisioning-orchestrator +└── ssh-key v0.6.7 + ├── russh v0.44.1 + └── russh-keys v0.44.0 +``` + +## Why We Can't Fix It + +**Option 1: Direct Patch** + +- ✗ Cannot patch transitive crates.io dependencies to different crates.io versions +- Cargo only allows patches to point to different sources (git repos, local paths) + +**Option 2: Upgrade rsa** + +- Available: `rsa v0.10.0-rc.10` (release candidate only, not stable) +- Status: Not production-ready until stable release +- Current: `rsa v0.9.9` (stable, production) + +**Option 3: Upgrade ssh-key** + +- Current: `ssh-key v0.6.7` +- Still depends on `num-bigint-dig v0.8.4` (not upgraded yet) + +**Option 4: Local Fork** + +- ✗ Not practical for transitive dependencies + +## Resolution Timeline + +**For num-bigint-dig:** + +- Available versions: 0.8.5, 0.8.6, 0.9.0, 0.9.1 +- Latest: v0.9.1 +- Status: Fixed in 0.8.6 and later +- When it gets picked up: Depends on upstream crate releases + +**Upstream Action Items:** + +1. **rsa crate** needs to upgrade to use newer num-bigint-dig when available +2. **ssh-key crate** needs to upgrade to use newer num-bigint-dig when available +3.
Once upstream crates update their dependencies, our Cargo.lock will automatically use the fixed version + +## Current Impact + +✅ **NO IMPACT ON FUNCTIONALITY** + +- Code compiles cleanly +- All tests pass +- All features work correctly +- Only a forward-compatibility warning, not an error + +✅ **NOT A BLOCKER FOR:** + +- Deployment +- Production use +- Any functionality +- WASM compilation +- Release builds + +## Timeline for Resolution + +| Status | Item | Estimated | +| -------- | ------ | ----------- | +| ✓ Available | num-bigint-dig 0.8.6 | Already released | +| ⏳ Waiting | rsa v0.10 stable release | 2024-Q4 to 2025-Q1 | +| ⏳ Waiting | Downstream crate updates | After upstream releases | +| ✓ Automatic | Our build updates | Once dependencies are updated | + +## Monitoring + +To check for updates: + +```bash +# Check for future incompatibilities +cargo report future-incompatibilities + +# Check available versions +cargo outdated + +# Check dependency tree +cargo tree | grep num-bigint-dig +``` + +## Workaround (if needed) + +If the warning becomes an error before upstream fixes are released, you can: + +1. **Use an older Rust version** (current stable still allows this as warning) +2. **Wait for upstream updates** (recommended) +3. 
**Create a fork** of rsa/ssh-key with newer num-bigint-dig (not recommended) + +## Recommended Action + +**No immediate action needed.** This is a normal part of the Rust ecosystem evolution: + +- Upstream packages will update their dependencies +- Our Cargo.lock will automatically resolve to fixed versions +- Continue monitoring with `cargo report future-incompatibilities` + +## References + +- Rust Issue #120192: +- num-bigint-dig Repository: +- num-bigint-dig Releases: + +--- + +**Last Updated**: December 12, 2025 +**Status**: Monitored, Non-Blocking +**Action**: Awaiting Upstream Fixes \ No newline at end of file diff --git a/crates/control-center/README.md b/crates/control-center/README.md index 2ee9ba7..65a44ec 100644 --- a/crates/control-center/README.md +++ b/crates/control-center/README.md @@ -1 +1,371 @@ -# Control Center - Cedar Policy Engine\n\nA comprehensive Cedar policy engine implementation with advanced security features, compliance checking, and anomaly detection.\n\n## Features\n\n### 🔐 Cedar Policy Engine\n\n- **Policy Evaluation**: High-performance policy evaluation with context injection\n- **Versioning**: Complete policy versioning with rollback capabilities\n- **Templates**: Configuration-driven policy templates with variable substitution\n- **Validation**: Comprehensive policy validation with syntax and semantic checking\n\n### 🛡️ Security & Authentication\n\n- **JWT Authentication**: Secure token-based authentication\n- **Multi-Factor Authentication**: MFA support for sensitive operations\n- **Role-Based Access Control**: Flexible RBAC with policy integration\n- **Session Management**: Secure session handling with timeouts\n\n### 📊 Compliance Framework\n\n- **SOC2 Type II**: Complete SOC2 compliance validation\n- **HIPAA**: Healthcare data protection compliance\n- **Audit Trail**: Comprehensive audit logging and reporting\n- **Impact Analysis**: Policy change impact assessment\n\n### 🔍 Anomaly Detection\n\n- **Statistical Analysis**: 
Multiple statistical methods (Z-Score, IQR, Isolation Forest)\n- **Real-time Detection**: Continuous monitoring of policy evaluations\n- **Alert Management**: Configurable alerting through multiple channels\n- **Baseline Learning**: Adaptive baseline calculation for improved accuracy\n\n### 🗄️ Storage & Persistence\n\n- **SurrealDB Integration**: High-performance graph database backend\n- **Policy Storage**: Versioned policy storage with metadata\n- **Metrics Storage**: Policy evaluation metrics and analytics\n- **Compliance Records**: Complete compliance audit trails\n\n## Quick Start\n\n### 1. Installation\n\n```{$detected_lang}\ncd src/control-center\ncargo build --release\n```\n\n### 2. Configuration\n\nCopy the example configuration:\n\n```{$detected_lang}\ncp config.toml.example config.toml\n```\n\nEdit `config.toml` for your environment:\n\n```{$detected_lang}\n[database]\nurl = "surreal://localhost:8000" # Your SurrealDB instance\nusername = "root"\npassword = "your-password"\n\n[auth]\njwt_secret = "your-super-secret-key"\nrequire_mfa = true\n\n[compliance.soc2]\nenabled = true\n\n[anomaly]\nenabled = true\ndetection_threshold = 2.5\n```\n\n### 3. Start the Server\n\n```{$detected_lang}\n./target/release/control-center server --port 8080\n```\n\n### 4. 
Test Policy Evaluation\n\n```{$detected_lang}\ncurl -X POST http://localhost:8080/policies/evaluate \\n -H "Content-Type: application/json" \\n -d '{\n "principal": {"id": "user123", "roles": ["Developer"]},\n "action": {"id": "access"},\n "resource": {"id": "sensitive-db", "classification": "confidential"},\n "context": {"mfa_enabled": true, "location": "US"}\n }'\n```\n\n## Policy Examples\n\n### Multi-Factor Authentication Policy\n\n```{$detected_lang}\n// Require MFA for sensitive resources\npermit(\n principal,\n action == Action::"access",\n resource\n) when {\n resource has classification &&\n resource.classification in ["sensitive", "confidential"] &&\n principal has mfa_enabled &&\n principal.mfa_enabled == true\n};\n```\n\n### Production Approval Policy\n\n```{$detected_lang}\n// Require approval for production operations\npermit(\n principal,\n action in [Action::"deploy", Action::"modify", Action::"delete"],\n resource\n) when {\n resource has environment &&\n resource.environment == "production" &&\n principal has approval &&\n principal.approval.approved_by in ["ProductionAdmin", "SRE"]\n};\n```\n\n### Geographic Restrictions\n\n```{$detected_lang}\n// Allow access only from approved countries\npermit(\n principal,\n action,\n resource\n) when {\n context has geo &&\n context.geo has country &&\n context.geo.country in ["US", "CA", "GB", "DE"]\n};\n```\n\n## CLI Commands\n\n### Policy Management\n\n```{$detected_lang}\n# Validate policies\ncontrol-center policy validate policies/\n\n# Test policy with test data\ncontrol-center policy test policies/mfa.cedar tests/data/mfa_test.json\n\n# Analyze policy impact\ncontrol-center policy impact policies/new_policy.cedar\n```\n\n### Compliance Checking\n\n```{$detected_lang}\n# Check SOC2 compliance\ncontrol-center compliance soc2\n\n# Check HIPAA compliance\ncontrol-center compliance hipaa\n\n# Generate compliance report\ncontrol-center compliance report --format html\n```\n\n## API Endpoints\n\n### Policy 
Evaluation\n\n- `POST /policies/evaluate` - Evaluate policy decision\n- `GET /policies` - List all policies\n- `POST /policies` - Create new policy\n- `PUT /policies/{id}` - Update policy\n- `DELETE /policies/{id}` - Delete policy\n\n### Policy Versions\n\n- `GET /policies/{id}/versions` - List policy versions\n- `GET /policies/{id}/versions/{version}` - Get specific version\n- `POST /policies/{id}/rollback/{version}` - Rollback to version\n\n### Compliance\n\n- `GET /compliance/soc2` - SOC2 compliance check\n- `GET /compliance/hipaa` - HIPAA compliance check\n- `GET /compliance/report` - Generate compliance report\n\n### Anomaly Detection\n\n- `GET /anomalies` - List detected anomalies\n- `GET /anomalies/{id}` - Get anomaly details\n- `POST /anomalies/detect` - Trigger anomaly detection\n\n## Testing\n\n### Run Unit Tests\n\n```{$detected_lang}\ncargo test\n```\n\n### Run Integration Tests\n\n```{$detected_lang}\ncargo test --test integration_tests\n```\n\n### Run Policy Tests\n\n```{$detected_lang}\ncargo test --test policy_tests\n```\n\n### Run Compliance Tests\n\n```{$detected_lang}\ncargo test --test compliance_tests\n```\n\n## Architecture\n\n### Core Components\n\n1. **Policy Engine** (`src/policies/engine.rs`)\n - Cedar policy evaluation\n - Context injection\n - Caching and optimization\n\n2. **Storage Layer** (`src/storage/`)\n - SurrealDB integration\n - Policy versioning\n - Metrics storage\n\n3. **Compliance Framework** (`src/compliance/`)\n - SOC2 checker\n - HIPAA validator\n - Report generation\n\n4. **Anomaly Detection** (`src/anomaly/`)\n - Statistical analysis\n - Real-time monitoring\n - Alert management\n\n5. 
**Authentication** (`src/auth.rs`)\n - JWT token management\n - Password hashing\n - Session handling\n\n### Configuration-Driven Design\n\nThe system follows PAP (Project Architecture Principles) with:\n\n- **No hardcoded values**: All behavior controlled via configuration\n- **Dynamic loading**: Policies and rules loaded from configuration\n- **Template-based**: Policy generation through templates\n- **Environment-aware**: Different configs for dev/test/prod\n\n### Security Features\n\n- **Audit Logging**: All policy evaluations logged\n- **Encryption**: Data encrypted at rest and in transit\n- **Rate Limiting**: Protection against abuse\n- **Input Validation**: Comprehensive validation of all inputs\n- **Error Handling**: Secure error handling without information leakage\n\n## Production Deployment\n\n### Docker\n\n```{$detected_lang}\nFROM rust:1.75 as builder\nWORKDIR /app\nCOPY . .\nRUN cargo build --release\n\nFROM debian:bookworm-slim\nRUN apt-get update && apt-get install -y ca-certificates\nCOPY --from=builder /app/target/release/control-center /usr/local/bin/\nEXPOSE 8080\nCMD ["control-center", "server"]\n```\n\n### Kubernetes\n\n```{$detected_lang}\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: control-center\nspec:\n replicas: 3\n selector:\n matchLabels:\n app: control-center\n template:\n metadata:\n labels:\n app: control-center\n spec:\n containers:\n - name: control-center\n image: control-center:latest\n ports:\n - containerPort: 8080\n env:\n - name: DATABASE_URL\n value: "surreal://surrealdb:8000"\n```\n\n### Environment Variables\n\n```{$detected_lang}\n# Override config values with environment variables\nexport CONTROL_CENTER_SERVER_PORT=8080\nexport CONTROL_CENTER_DATABASE_URL="surreal://prod-db:8000"\nexport CONTROL_CENTER_AUTH_JWT_SECRET="production-secret"\nexport CONTROL_CENTER_COMPLIANCE_SOC2_ENABLED=true\n```\n\n## Monitoring & Observability\n\n### Metrics\n\n- Policy evaluation latency\n- Policy decision distribution\n- 
Anomaly detection rates\n- Compliance scores\n\n### Logging\n\n```{$detected_lang}\n// Structured logging with tracing\ntracing::info!(\n policy_id = %policy.id,\n principal = %context.principal.id,\n decision = ?result.decision,\n duration_ms = evaluation_time,\n "Policy evaluation completed"\n);\n```\n\n### Health Checks\n\n```{$detected_lang}\ncurl http://localhost:8080/health\n```\n\n## Contributing\n\n1. Follow the PAP principles documented in the codebase\n2. Add tests for new features\n3. Update documentation\n4. Ensure compliance checks pass\n5. Add appropriate logging and monitoring\n\n## License\n\nThis project follows the licensing specified in the parent repository.\n\n## Support\n\nFor questions and support, refer to the project documentation or create an issue in the repository. +# Control Center - Cedar Policy Engine + +A comprehensive Cedar policy engine implementation with advanced security features, compliance checking, and anomaly detection. + +## Features + +### 🔐 Cedar Policy Engine + +- **Policy Evaluation**: High-performance policy evaluation with context injection +- **Versioning**: Complete policy versioning with rollback capabilities +- **Templates**: Configuration-driven policy templates with variable substitution +- **Validation**: Comprehensive policy validation with syntax and semantic checking + +### 🛡️ Security & Authentication + +- **JWT Authentication**: Secure token-based authentication +- **Multi-Factor Authentication**: MFA support for sensitive operations +- **Role-Based Access Control**: Flexible RBAC with policy integration +- **Session Management**: Secure session handling with timeouts + +### 📊 Compliance Framework + +- **SOC2 Type II**: Complete SOC2 compliance validation +- **HIPAA**: Healthcare data protection compliance +- **Audit Trail**: Comprehensive audit logging and reporting +- **Impact Analysis**: Policy change impact assessment + +### 🔍 Anomaly Detection + +- **Statistical Analysis**: Multiple statistical 
methods (Z-Score, IQR, Isolation Forest) +- **Real-time Detection**: Continuous monitoring of policy evaluations +- **Alert Management**: Configurable alerting through multiple channels +- **Baseline Learning**: Adaptive baseline calculation for improved accuracy + +### 🗄️ Storage & Persistence + +- **SurrealDB Integration**: High-performance graph database backend +- **Policy Storage**: Versioned policy storage with metadata +- **Metrics Storage**: Policy evaluation metrics and analytics +- **Compliance Records**: Complete compliance audit trails + +## Quick Start + +### 1. Installation + +```bash +cd src/control-center +cargo build --release +``` + +### 2. Configuration + +Copy the example configuration: + +```bash +cp config.toml.example config.toml +``` + +Edit `config.toml` for your environment: + +```toml +[database] +url = "surreal://localhost:8000" # Your SurrealDB instance +username = "root" +password = "your-password" + +[auth] +jwt_secret = "your-super-secret-key" +require_mfa = true + +[compliance.soc2] +enabled = true + +[anomaly] +enabled = true +detection_threshold = 2.5 +``` + +### 3. Start the Server + +```bash +./target/release/control-center server --port 8080 +``` + +### 4.
Test Policy Evaluation + +```bash +curl -X POST http://localhost:8080/policies/evaluate \ + -H "Content-Type: application/json" \ + -d '{ + "principal": {"id": "user123", "roles": ["Developer"]}, + "action": {"id": "access"}, + "resource": {"id": "sensitive-db", "classification": "confidential"}, + "context": {"mfa_enabled": true, "location": "US"} + }' +``` + +## Policy Examples + +### Multi-Factor Authentication Policy + +```cedar +// Require MFA for sensitive resources +permit( + principal, + action == Action::"access", + resource +) when { + resource has classification && + resource.classification in ["sensitive", "confidential"] && + principal has mfa_enabled && + principal.mfa_enabled == true +}; +``` + +### Production Approval Policy + +```cedar +// Require approval for production operations +permit( + principal, + action in [Action::"deploy", Action::"modify", Action::"delete"], + resource +) when { + resource has environment && + resource.environment == "production" && + principal has approval && + principal.approval.approved_by in ["ProductionAdmin", "SRE"] +}; +``` + +### Geographic Restrictions + +```cedar +// Allow access only from approved countries +permit( + principal, + action, + resource +) when { + context has geo && + context.geo has country && + context.geo.country in ["US", "CA", "GB", "DE"] +}; +``` + +## CLI Commands + +### Policy Management + +```bash +# Validate policies +control-center policy validate policies/ + +# Test policy with test data +control-center policy test policies/mfa.cedar tests/data/mfa_test.json + +# Analyze policy impact +control-center policy impact policies/new_policy.cedar +``` + +### Compliance Checking + +```bash +# Check SOC2 compliance +control-center compliance soc2 + +# Check HIPAA compliance +control-center compliance hipaa + +# Generate compliance report +control-center compliance report --format html +``` + +## API Endpoints + +### Policy Evaluation + +- `POST /policies/evaluate` - Evaluate policy decision +- `GET
/policies` - List all policies +- `POST /policies` - Create new policy +- `PUT /policies/{id}` - Update policy +- `DELETE /policies/{id}` - Delete policy + +### Policy Versions + +- `GET /policies/{id}/versions` - List policy versions +- `GET /policies/{id}/versions/{version}` - Get specific version +- `POST /policies/{id}/rollback/{version}` - Rollback to version + +### Compliance + +- `GET /compliance/soc2` - SOC2 compliance check +- `GET /compliance/hipaa` - HIPAA compliance check +- `GET /compliance/report` - Generate compliance report + +### Anomaly Detection + +- `GET /anomalies` - List detected anomalies +- `GET /anomalies/{id}` - Get anomaly details +- `POST /anomalies/detect` - Trigger anomaly detection + +## Testing + +### Run Unit Tests + +```bash +cargo test +``` + +### Run Integration Tests + +```bash +cargo test --test integration_tests +``` + +### Run Policy Tests + +```bash +cargo test --test policy_tests +``` + +### Run Compliance Tests + +```bash +cargo test --test compliance_tests +``` + +## Architecture + +### Core Components + +1. **Policy Engine** (`src/policies/engine.rs`) + - Cedar policy evaluation + - Context injection + - Caching and optimization + +2. **Storage Layer** (`src/storage/`) + - SurrealDB integration + - Policy versioning + - Metrics storage + +3. **Compliance Framework** (`src/compliance/`) + - SOC2 checker + - HIPAA validator + - Report generation + +4. **Anomaly Detection** (`src/anomaly/`) + - Statistical analysis + - Real-time monitoring + - Alert management + +5. 
**Authentication** (`src/auth.rs`) + - JWT token management + - Password hashing + - Session handling + +### Configuration-Driven Design + +The system follows PAP (Project Architecture Principles) with: + +- **No hardcoded values**: All behavior controlled via configuration +- **Dynamic loading**: Policies and rules loaded from configuration +- **Template-based**: Policy generation through templates +- **Environment-aware**: Different configs for dev/test/prod + +### Security Features + +- **Audit Logging**: All policy evaluations logged +- **Encryption**: Data encrypted at rest and in transit +- **Rate Limiting**: Protection against abuse +- **Input Validation**: Comprehensive validation of all inputs +- **Error Handling**: Secure error handling without information leakage + +## Production Deployment + +### Docker + +```dockerfile +FROM rust:1.75 as builder +WORKDIR /app +COPY . . +RUN cargo build --release + +FROM debian:bookworm-slim +RUN apt-get update && apt-get install -y ca-certificates +COPY --from=builder /app/target/release/control-center /usr/local/bin/ +EXPOSE 8080 +CMD ["control-center", "server"] +``` + +### Kubernetes + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: control-center +spec: + replicas: 3 + selector: + matchLabels: + app: control-center + template: + metadata: + labels: + app: control-center + spec: + containers: + - name: control-center + image: control-center:latest + ports: + - containerPort: 8080 + env: + - name: DATABASE_URL + value: "surreal://surrealdb:8000" +``` + +### Environment Variables + +```bash +# Override config values with environment variables +export CONTROL_CENTER_SERVER_PORT=8080 +export CONTROL_CENTER_DATABASE_URL="surreal://prod-db:8000" +export CONTROL_CENTER_AUTH_JWT_SECRET="production-secret" +export CONTROL_CENTER_COMPLIANCE_SOC2_ENABLED=true +``` + +## Monitoring & Observability + +### Metrics + +- Policy evaluation latency +- Policy decision distribution +- Anomaly detection rates +- Compliance
scores + +### Logging + +```rust +// Structured logging with tracing +tracing::info!( + policy_id = %policy.id, + principal = %context.principal.id, + decision = ?result.decision, + duration_ms = evaluation_time, + "Policy evaluation completed" +); +``` + +### Health Checks + +```bash +curl http://localhost:8080/health +``` + +## Contributing + +1. Follow the PAP principles documented in the codebase +2. Add tests for new features +3. Update documentation +4. Ensure compliance checks pass +5. Add appropriate logging and monitoring + +## License + +This project follows the licensing specified in the parent repository. + +## Support + +For questions and support, refer to the project documentation or create an issue in the repository. \ No newline at end of file diff --git a/crates/control-center/docs/security-considerations.md b/crates/control-center/docs/security-considerations.md index 4a417b7..9f2d468 100644 --- a/crates/control-center/docs/security-considerations.md +++ b/crates/control-center/docs/security-considerations.md @@ -1 +1,710 @@ -# Security Considerations for Control Center Enhancements\n\n## Overview\n\nThis document outlines the security architecture and considerations for the control-center enhancements,\nincluding KMS SSH key management, mode-based RBAC, and platform service monitoring.\n\n## 1.
SSH Key Management Security\n\n### 1.1 Key Storage Security\n\n**Implementation**:\n\n- Private keys encrypted at rest using AES-256-GCM in KMS\n- Public keys stored in plaintext (as they are meant to be public)\n- Private key material never exposed in API responses\n- Key IDs used as references, not actual keys\n\n**Threat Mitigation**:\n\n- ✅ **Data at Rest**: All private keys encrypted with master encryption key\n- ✅ **Key Exposure**: Private keys only decrypted in memory when needed\n- ✅ **Key Leakage**: Zeroization of key material after use\n- ✅ **Unauthorized Access**: KMS access controlled by RBAC\n\n**Best Practices**:\n\n```{$detected_lang}\n// Good: Using key ID reference\nlet key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;\n\n// Bad: Never do this - exposing private key in logs\ntracing::info!("Stored key: {}", private_key); // DON'T DO THIS\n```\n\n### 1.2 Key Rotation Security\n\n**Implementation**:\n\n- Configurable rotation intervals (default 90 days)\n- Grace period for old key usage (default 7 days)\n- Automatic rotation scheduling (if enabled)\n- Manual rotation support with immediate effect\n\n**Threat Mitigation**:\n\n- ✅ **Key Compromise**: Regular rotation limits exposure window\n- ✅ **Stale Keys**: Automated detection of keys due for rotation\n- ✅ **Rotation Failures**: Graceful degradation with error logging\n\n**Rotation Policy**:\n\n```{$detected_lang}\n[kms.ssh_keys]\nrotation_enabled = true\nrotation_interval_days = 90 # Enterprise: 30, Dev: 180\ngrace_period_days = 7 # Time to update deployed keys\nauto_rotate = false # Manual approval recommended\n```\n\n### 1.3 Audit Logging\n\n**Logged Events**:\n\n- SSH key creation (who, when, purpose)\n- SSH key retrieval (who accessed, when)\n- SSH key rotation (old key ID, new key ID)\n- SSH key deletion (who deleted, when)\n- Failed access attempts\n\n**Audit Entry Structure**:\n\n```{$detected_lang}\npub struct SshKeyAuditEntry {\n pub timestamp: 
DateTime,\n pub key_id: String,\n pub action: SshKeyAction,\n pub user: Option, // User who performed action\n pub ip_address: Option, // Source IP\n pub success: bool,\n pub error_message: Option,\n}\n```\n\n**Threat Mitigation**:\n\n- ✅ **Unauthorized Access**: Full audit trail for forensics\n- ✅ **Insider Threats**: User attribution for all actions\n- ✅ **Compliance**: GDPR/SOC2 audit log requirements met\n\n**Audit Log Retention**:\n\n- In-memory: Last 10,000 entries\n- Persistent: SurrealDB with 1-year retention\n- Compliance mode: 7-year retention (configurable)\n\n### 1.4 Key Fingerprinting\n\n**Implementation**:\n\n```{$detected_lang}\nfn calculate_fingerprint(public_key: &[u8]) -> Result {\n use sha2::{Sha256, Digest};\n let mut hasher = Sha256::new();\n hasher.update(public_key);\n let result = hasher.finalize();\n Ok(format!("SHA256:{}", base64::encode(&result[..16])))\n}\n```\n\n**Security Benefits**:\n\n- Verify key integrity\n- Detect key tampering\n- Match deployed keys to KMS records\n\n## 2. 
RBAC Security\n\n### 2.1 Execution Modes\n\n**Security Model by Mode**:\n\n| Mode | Security Level | Use Case | Audit Required |\n| ------ | --------------- | ---------- | ---------------- |\n| Solo | Low | Single developer | No |\n| MultiUser | Medium | Small teams | Optional |\n| CICD | Medium | Automation | Yes |\n| Enterprise | High | Production | Mandatory |\n\n**Mode-Specific Security**:\n\n#### Solo Mode\n\n```{$detected_lang}\n// Solo mode: All users are admin\n// Security: Trust-based, no RBAC checks\nif mode == ExecutionMode::Solo {\n return true; // Allow all operations\n}\n```\n\n**Risks**:\n\n- No access control\n- No audit trail\n- Single point of failure\n\n**Mitigations**:\n\n- Only for development environments\n- Network isolation required\n- Regular backups\n\n#### MultiUser Mode\n\n```{$detected_lang}\n// Multi-user: Role-based access control\nlet permissions = rbac_manager.get_user_permissions(&user).await;\nif !permissions.contains(&required_permission) {\n return Err(RbacError::PermissionDenied);\n}\n```\n\n**Security Features**:\n\n- Role-based permissions\n- Optional audit logging\n- Session management\n\n#### CICD Mode\n\n```{$detected_lang}\n// CICD: Service account focused\n// All actions logged for automation tracking\nif mode == ExecutionMode::CICD {\n audit_log.log_automation_action(service_account, action).await;\n}\n```\n\n**Security Features**:\n\n- Service account isolation\n- Mandatory audit logging\n- Token-based authentication\n- Short-lived credentials\n\n#### Enterprise Mode\n\n```{$detected_lang}\n// Enterprise: Full security\n// - Mandatory audit logging\n// - Stricter session timeouts\n// - Compliance reports\nif mode == ExecutionMode::Enterprise {\n audit_log.log_with_compliance(user, action, compliance_tags).await;\n}\n```\n\n**Security Features**:\n\n- Full RBAC enforcement\n- Comprehensive audit logging\n- Compliance reporting\n- Role assignment approval workflow\n\n### 2.2 Permission System\n\n**Permission 
Levels**:\n\n```{$detected_lang}\nRole::Admin => 100 // Full access\nRole::Operator => 80 // Deploy & manage\nRole::Developer => 60 // Read + dev deploy\nRole::ServiceAccount => 50 // Automation\nRole::Auditor => 40 // Read + audit\nRole::Viewer => 20 // Read-only\n```\n\n**Action Security Levels**:\n\n```{$detected_lang}\nAction::Delete => 100 // Destructive, admin only\nAction::Manage => 80 // Service management\nAction::Deploy => 80 // Deploy to production\nAction::Create => 60 // Create resources\nAction::Update => 60 // Modify resources\nAction::Execute => 50 // Execute operations\nAction::Audit => 40 // View audit logs\nAction::Read => 20 // View resources\n```\n\n**Permission Check**:\n\n```{$detected_lang}\npub fn can_perform(&self, required_level: u8) -> bool {\n self.permission_level() >= required_level\n}\n```\n\n**Security Guarantees**:\n\n- ✅ Least privilege by default (Viewer role)\n- ✅ Hierarchical permissions (higher roles include lower)\n- ✅ Explicit deny for unknown resources\n- ✅ No permission escalation without admin\n\n### 2.3 Session Security\n\n**Session Configuration**:\n\n```{$detected_lang}\n[security]\nsession_timeout_minutes = 60 # Solo/MultiUser\nsession_timeout_minutes = 30 # Enterprise\nmax_sessions_per_user = 5\nfailed_login_lockout_attempts = 5\nfailed_login_lockout_duration_minutes = 15\n```\n\n**Session Lifecycle**:\n\n1. User authenticates → JWT token issued\n2. Token includes: user_id, role, issued_at, expires_at\n3. Middleware validates token on each request\n4. Session tracked in Redis/RocksDB\n5. 
Session invalidated on logout or timeout\n\n**Security Features**:\n\n- JWT with RSA-2048 signatures\n- Refresh token rotation\n- Session fixation prevention\n- Concurrent session limits\n\n**Threat Mitigation**:\n\n- ✅ **Session Hijacking**: Short-lived tokens (1 hour)\n- ✅ **Token Replay**: One-time refresh tokens\n- ✅ **Brute Force**: Account lockout after 5 failures\n- ✅ **Session Fixation**: New session ID on login\n\n### 2.4 Middleware Security\n\n**RBAC Middleware Flow**:\n\n```{$detected_lang}\nRequest → Auth Middleware → RBAC Middleware → Handler\n ↓ ↓\n Extract User Check Permission\n from JWT Token (role + resource + action)\n ↓\n Allow / Deny\n```\n\n**Middleware Implementation**:\n\n```{$detected_lang}\npub async fn check_permission(\n State(state): State>,\n resource: Resource,\n action: Action,\n mut req: Request,\n next: Next,\n) -> Result {\n let user = req.extensions()\n .get::()\n .ok_or(RbacError::UserNotFound("No user in request".to_string()))?;\n\n if !state.rbac_manager.check_permission(&user, resource, action).await {\n return Err(RbacError::PermissionDenied);\n }\n\n Ok(next.run(req).await)\n}\n```\n\n**Security Guarantees**:\n\n- ✅ All API endpoints protected by default\n- ✅ Permission checked before handler execution\n- ✅ User context available in handlers\n- ✅ Failed checks logged for audit\n\n## 3. 
Platform Monitoring Security\n\n### 3.1 Service Access Security\n\n**Internal URLs Only**:\n\n```{$detected_lang}\n[platform]\norchestrator_url = "http://localhost:9090" # Not exposed externally\ncoredns_url = "http://localhost:9153"\ngitea_url = "http://localhost:3000"\noci_registry_url = "http://localhost:5000"\n```\n\n**Network Security**:\n\n- All services on localhost or internal network\n- No external exposure of monitoring endpoints\n- Firewall rules to prevent external access\n\n**Threat Mitigation**:\n\n- ✅ **External Scanning**: Services not reachable from internet\n- ✅ **DDoS**: Internal-only access limits attack surface\n- ✅ **Data Exfiltration**: Monitoring data not exposed externally\n\n### 3.2 Health Check Security\n\n**Timeout Protection**:\n\n```{$detected_lang}\nlet client = Client::builder()\n .timeout(std::time::Duration::from_secs(5)) // Prevent hanging\n .build()\n .unwrap();\n```\n\n**Error Handling**:\n\n```{$detected_lang}\n// Never expose internal errors to users\nErr(e) => {\n // Log detailed error internally\n tracing::error!("Health check failed for {}: {}", service, e);\n\n // Return generic error externally\n ServiceStatus {\n status: HealthStatus::Unhealthy,\n error_message: Some("Service unavailable".to_string()), // Generic\n ..\n }\n}\n```\n\n**Threat Mitigation**:\n\n- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion\n- ✅ **Information Disclosure**: Error messages sanitized\n- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits\n\n### 3.3 Service Control Security\n\n**RBAC-Protected Service Control**:\n\n```{$detected_lang}\n// Only Operator or Admin can start/stop services\n#[axum::debug_handler]\npub async fn start_service(\n State(state): State,\n Extension(user): Extension,\n Path(service_type): Path,\n) -> Result {\n // Check permission\n if !rbac_manager.check_permission(\n &user,\n Resource::Service,\n Action::Manage,\n ).await {\n return Err(ApiError::PermissionDenied);\n }\n\n // Start 
service\n service_manager.start_service(&service_type).await?;\n\n // Audit log\n audit_log.log_service_action(user, service_type, "start").await;\n\n Ok(StatusCode::OK)\n}\n```\n\n**Security Guarantees**:\n\n- ✅ Only authorized users can control services\n- ✅ All service actions logged\n- ✅ Graceful degradation on service failure\n\n## 4. Threat Model\n\n### 4.1 High-Risk Threats\n\n#### Threat: SSH Private Key Exposure\n\n**Attack Vector**: Attacker gains access to KMS database\n\n**Mitigations**:\n\n- Private keys encrypted at rest with master key\n- Master key stored in hardware security module (HSM) or KMS\n- Key access audited and rate-limited\n- Zeroization of decrypted keys in memory\n\n**Detection**:\n\n- Audit log monitoring for unusual key access patterns\n- Alerting on bulk key retrievals\n\n#### Threat: Privilege Escalation\n\n**Attack Vector**: Lower-privileged user attempts to gain admin access\n\n**Mitigations**:\n\n- Role assignment requires Admin role\n- Mode switching requires Admin role\n- Middleware enforces permissions on every request\n- No client-side permission checks (server-side only)\n\n**Detection**:\n\n- Failed permission checks logged\n- Alerting on repeated permission denials\n\n#### Threat: Session Hijacking\n\n**Attack Vector**: Attacker steals JWT token\n\n**Mitigations**:\n\n- Short-lived access tokens (1 hour)\n- Refresh token rotation\n- Secure HTTP-only cookies (recommended)\n- IP address binding (optional)\n\n**Detection**:\n\n- Unusual login locations\n- Concurrent sessions from different IPs\n\n### 4.2 Medium-Risk Threats\n\n#### Threat: Service Impersonation\n\n**Attack Vector**: Malicious service pretends to be legitimate platform service\n\n**Mitigations**:\n\n- Service URLs configured in config file (not dynamic)\n- TLS certificate validation (if HTTPS)\n- Service authentication tokens\n\n**Detection**:\n\n- Health check failures\n- Metrics anomalies\n\n#### Threat: Audit Log Tampering\n\n**Attack Vector**: Attacker 
modifies audit logs to hide tracks\n\n**Mitigations**:\n\n- Audit logs write-only\n- Logs stored in tamper-evident database (SurrealDB)\n- Hash chain for log integrity\n- Offsite log backup\n\n**Detection**:\n\n- Hash chain verification\n- Log gap detection\n\n### 4.3 Low-Risk Threats\n\n#### Threat: Information Disclosure via Error Messages\n\n**Attack Vector**: Error messages leak internal information\n\n**Mitigations**:\n\n- Generic error messages for users\n- Detailed errors only in server logs\n- Error message sanitization\n\n**Detection**:\n\n- Code review for error handling\n- Automated scanning for sensitive data in responses\n\n## 5. Compliance Considerations\n\n### 5.1 GDPR Compliance\n\n**Personal Data Handling**:\n\n- User information: username, email, IP addresses\n- Retention: Audit logs kept for required period\n- Right to erasure: User deletion deletes all associated data\n\n**Implementation**:\n\n```{$detected_lang}\n// Delete user and all associated data\npub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {\n // Delete user SSH keys\n for key in self.list_user_ssh_keys(user_id).await? 
{\n self.delete_ssh_key(&key.key_id).await?;\n }\n\n // Anonymize audit logs (retain for compliance, remove PII)\n self.anonymize_user_audit_logs(user_id).await?;\n\n // Delete user record\n self.delete_user_record(user_id).await?;\n\n Ok(())\n}\n```\n\n### 5.2 SOC 2 Compliance\n\n**Security Controls**:\n\n- ✅ Access control (RBAC)\n- ✅ Audit logging (all actions logged)\n- ✅ Encryption at rest (KMS)\n- ✅ Encryption in transit (HTTPS recommended)\n- ✅ Session management (timeout, MFA support)\n\n**Monitoring & Alerting**:\n\n- ✅ Service health monitoring\n- ✅ Failed login tracking\n- ✅ Permission denial alerting\n- ✅ Unusual activity detection\n\n### 5.3 PCI DSS (if applicable)\n\n**Requirements**:\n\n- ✅ Encrypt cardholder data (use KMS for keys)\n- ✅ Maintain access control (RBAC)\n- ✅ Track and monitor access (audit logs)\n- ✅ Regularly test security (integration tests)\n\n## 6. Security Best Practices\n\n### 6.1 Development\n\n**Code Review Checklist**:\n\n- [ ] All API endpoints have RBAC middleware\n- [ ] No hardcoded secrets or keys\n- [ ] Error messages don't leak sensitive info\n- [ ] Audit logging for sensitive operations\n- [ ] Input validation on all user inputs\n- [ ] SQL injection prevention (use parameterized queries)\n- [ ] XSS prevention (escape user inputs)\n\n**Testing**:\n\n- Unit tests for permission checks\n- Integration tests for RBAC enforcement\n- Penetration testing for production deployments\n\n### 6.2 Deployment\n\n**Production Checklist**:\n\n- [ ] Change default admin password\n- [ ] Enable HTTPS with valid certificate\n- [ ] Configure firewall rules (internal services only)\n- [ ] Set appropriate execution mode (Enterprise for production)\n- [ ] Enable audit logging\n- [ ] Configure session timeout (30 minutes for Enterprise)\n- [ ] Enable rate limiting\n- [ ] Set up log monitoring and alerting\n- [ ] Regular security updates\n- [ ] Backup encryption keys\n\n### 6.3 Operations\n\n**Incident Response**:\n\n1. 
**Detection**: Monitor audit logs for anomalies\n2. **Containment**: Revoke compromised credentials\n3. **Eradication**: Rotate affected SSH keys\n4. **Recovery**: Restore from backup if needed\n5. **Lessons Learned**: Update security controls\n\n**Key Rotation Schedule**:\n\n- SSH keys: Every 90 days (Enterprise: 30 days)\n- JWT signing keys: Every 180 days\n- Master encryption key: Every 365 days\n- Service account tokens: Every 30 days\n\n## 7. Security Metrics\n\n### 7.1 Monitoring Metrics\n\n**Authentication**:\n\n- Failed login attempts per user\n- Concurrent sessions per user\n- Session duration (average, p95, p99)\n\n**Authorization**:\n\n- Permission denials per user\n- Permission denials per resource\n- Role assignments per day\n\n**Audit**:\n\n- SSH key accesses per day\n- SSH key rotations per month\n- Audit log retention compliance\n\n**Services**:\n\n- Service health check success rate\n- Service response times (p50, p95, p99)\n- Service dependency failures\n\n### 7.2 Alerting Thresholds\n\n**Critical Alerts**:\n\n- Service health: >3 failures in 5 minutes\n- Failed logins: >10 attempts in 1 minute\n- Permission denials: >50 in 1 minute\n- SSH key bulk retrieval: >10 keys in 1 minute\n\n**Warning Alerts**:\n\n- Service degraded: response time >1 second\n- Session timeout rate: >10% of sessions\n- Audit log storage: >80% capacity\n\n## 8. 
Security Roadmap\n\n### Phase 1 (Completed)\n\n- ✅ SSH key storage with encryption\n- ✅ Mode-based RBAC\n- ✅ Audit logging\n- ✅ Platform monitoring\n\n### Phase 2 (In Progress)\n\n- 📋 API handlers with RBAC enforcement\n- 📋 Integration tests for security\n- 📋 Documentation\n\n### Phase 3 (Future)\n\n- Multi-factor authentication (MFA)\n- Hardware security module (HSM) integration\n- Advanced threat detection (ML-based)\n- Automated security scanning\n- Compliance report generation\n- Security information and event management (SIEM) integration\n\n## References\n\n- **OWASP Top 10**: \n- **NIST Cybersecurity Framework**: \n- **CIS Controls**: \n- **GDPR**: \n- **SOC 2**: \n\n---\n\n**Last Updated**: 2025-10-06\n**Review Cycle**: Quarterly\n**Next Review**: 2026-01-06 +# Security Considerations for Control Center Enhancements + +## Overview + +This document outlines the security architecture and considerations for the control-center enhancements, +including KMS SSH key management, mode-based RBAC, and platform service monitoring. + +## 1. 
SSH Key Management Security
+
+### 1.1 Key Storage Security
+
+**Implementation**:
+
+- Private keys encrypted at rest using AES-256-GCM in KMS
+- Public keys stored in plaintext (as they are meant to be public)
+- Private key material never exposed in API responses
+- Key IDs used as references, not actual keys
+
+**Threat Mitigation**:
+
+- ✅ **Data at Rest**: All private keys encrypted with master encryption key
+- ✅ **Key Exposure**: Private keys only decrypted in memory when needed
+- ✅ **Key Leakage**: Zeroization of key material after use
+- ✅ **Unauthorized Access**: KMS access controlled by RBAC
+
+**Best Practices**:
+
+```rust
+// Good: Using key ID reference
+let key_id = ssh_key_manager.store_ssh_key(name, private, public, purpose, tags).await?;
+
+// Bad: Never do this - exposing private key in logs
+tracing::info!("Stored key: {}", private_key); // DON'T DO THIS
+```
+
+### 1.2 Key Rotation Security
+
+**Implementation**:
+
+- Configurable rotation intervals (default 90 days)
+- Grace period for old key usage (default 7 days)
+- Automatic rotation scheduling (if enabled)
+- Manual rotation support with immediate effect
+
+**Threat Mitigation**:
+
+- ✅ **Key Compromise**: Regular rotation limits exposure window
+- ✅ **Stale Keys**: Automated detection of keys due for rotation
+- ✅ **Rotation Failures**: Graceful degradation with error logging
+
+**Rotation Policy**:
+
+```toml
+[kms.ssh_keys]
+rotation_enabled = true
+rotation_interval_days = 90 # Enterprise: 30, Dev: 180
+grace_period_days = 7 # Time to update deployed keys
+auto_rotate = false # Manual approval recommended
+```
+
+### 1.3 Audit Logging
+
+**Logged Events**:
+
+- SSH key creation (who, when, purpose)
+- SSH key retrieval (who accessed, when)
+- SSH key rotation (old key ID, new key ID)
+- SSH key deletion (who deleted, when)
+- Failed access attempts
+
+**Audit Entry Structure**:
+
+```rust
+pub struct SshKeyAuditEntry {
+    pub timestamp: DateTime<Utc>,
+    pub key_id: String,
+    pub action:
SshKeyAction,
+    pub user: Option<String>, // User who performed action
+    pub ip_address: Option<String>, // Source IP
+    pub success: bool,
+    pub error_message: Option<String>,
+}
+```
+
+**Threat Mitigation**:
+
+- ✅ **Unauthorized Access**: Full audit trail for forensics
+- ✅ **Insider Threats**: User attribution for all actions
+- ✅ **Compliance**: GDPR/SOC2 audit log requirements met
+
+**Audit Log Retention**:
+
+- In-memory: Last 10,000 entries
+- Persistent: SurrealDB with 1-year retention
+- Compliance mode: 7-year retention (configurable)
+
+### 1.4 Key Fingerprinting
+
+**Implementation**:
+
+```rust
+fn calculate_fingerprint(public_key: &[u8]) -> Result<String> {
+    use sha2::{Sha256, Digest};
+    let mut hasher = Sha256::new();
+    hasher.update(public_key);
+    let result = hasher.finalize();
+    Ok(format!("SHA256:{}", base64::encode(&result[..16])))
+}
+```
+
+**Security Benefits**:
+
+- Verify key integrity
+- Detect key tampering
+- Match deployed keys to KMS records
+
+## 2. RBAC Security
+
+### 2.1 Execution Modes
+
+**Security Model by Mode**:
+
+| Mode | Security Level | Use Case | Audit Required |
+| ------ | --------------- | ---------- | ---------------- |
+| Solo | Low | Single developer | No |
+| MultiUser | Medium | Small teams | Optional |
+| CICD | Medium | Automation | Yes |
+| Enterprise | High | Production | Mandatory |
+
+**Mode-Specific Security**:
+
+#### Solo Mode
+
+```rust
+// Solo mode: All users are admin
+// Security: Trust-based, no RBAC checks
+if mode == ExecutionMode::Solo {
+    return true; // Allow all operations
+}
+```
+
+**Risks**:
+
+- No access control
+- No audit trail
+- Single point of failure
+
+**Mitigations**:
+
+- Only for development environments
+- Network isolation required
+- Regular backups
+
+#### MultiUser Mode
+
+```rust
+// Multi-user: Role-based access control
+let permissions = rbac_manager.get_user_permissions(&user).await;
+if !permissions.contains(&required_permission) {
+    return Err(RbacError::PermissionDenied);
+}
+```
+
+**Security
Features**:
+
+- Role-based permissions
+- Optional audit logging
+- Session management
+
+#### CICD Mode
+
+```rust
+// CICD: Service account focused
+// All actions logged for automation tracking
+if mode == ExecutionMode::CICD {
+    audit_log.log_automation_action(service_account, action).await;
+}
+```
+
+**Security Features**:
+
+- Service account isolation
+- Mandatory audit logging
+- Token-based authentication
+- Short-lived credentials
+
+#### Enterprise Mode
+
+```rust
+// Enterprise: Full security
+// - Mandatory audit logging
+// - Stricter session timeouts
+// - Compliance reports
+if mode == ExecutionMode::Enterprise {
+    audit_log.log_with_compliance(user, action, compliance_tags).await;
+}
+```
+
+**Security Features**:
+
+- Full RBAC enforcement
+- Comprehensive audit logging
+- Compliance reporting
+- Role assignment approval workflow
+
+### 2.2 Permission System
+
+**Permission Levels**:
+
+```rust
+Role::Admin => 100 // Full access
+Role::Operator => 80 // Deploy & manage
+Role::Developer => 60 // Read + dev deploy
+Role::ServiceAccount => 50 // Automation
+Role::Auditor => 40 // Read + audit
+Role::Viewer => 20 // Read-only
+```
+
+**Action Security Levels**:
+
+```rust
+Action::Delete => 100 // Destructive, admin only
+Action::Manage => 80 // Service management
+Action::Deploy => 80 // Deploy to production
+Action::Create => 60 // Create resources
+Action::Update => 60 // Modify resources
+Action::Execute => 50 // Execute operations
+Action::Audit => 40 // View audit logs
+Action::Read => 20 // View resources
+```
+
+**Permission Check**:
+
+```rust
+pub fn can_perform(&self, required_level: u8) -> bool {
+    self.permission_level() >= required_level
+}
+```
+
+**Security Guarantees**:
+
+- ✅ Least privilege by default (Viewer role)
+- ✅ Hierarchical permissions (higher roles include lower)
+- ✅ Explicit deny for unknown resources
+- ✅ No permission escalation without admin
+
+### 2.3 Session Security
+
+**Session Configuration**:
+
+```toml
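+# Example values; TOML forbids duplicate keys, so set a single timeout
+# that matches your execution mode (Enterprise deployments use tighter limits).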
+[security]
+session_timeout_minutes = 60 # Solo/MultiUser (Enterprise: 30)
+max_sessions_per_user = 5
+failed_login_lockout_attempts = 5
+failed_login_lockout_duration_minutes = 15
+```
+
+**Session Lifecycle**:
+
+1. User authenticates → JWT token issued
+2. Token includes: user_id, role, issued_at, expires_at
+3. Middleware validates token on each request
+4. Session tracked in Redis/RocksDB
+5. Session invalidated on logout or timeout
+
+**Security Features**:
+
+- JWT with RSA-2048 signatures
+- Refresh token rotation
+- Session fixation prevention
+- Concurrent session limits
+
+**Threat Mitigation**:
+
+- ✅ **Session Hijacking**: Short-lived tokens (1 hour)
+- ✅ **Token Replay**: One-time refresh tokens
+- ✅ **Brute Force**: Account lockout after 5 failures
+- ✅ **Session Fixation**: New session ID on login
+
+### 2.4 Middleware Security
+
+**RBAC Middleware Flow**:
+
+```text
+Request → Auth Middleware → RBAC Middleware → Handler
+                ↓                  ↓
+          Extract User      Check Permission
+          from JWT Token    (role + resource + action)
+                                   ↓
+                              Allow / Deny
+```
+
+**Middleware Implementation**:
+
+```rust
+pub async fn check_permission(
+    State(state): State<Arc<AppState>>, // AppState assumed: shared state holding the RBAC manager
+    resource: Resource,
+    action: Action,
+    mut req: Request,
+    next: Next,
+) -> Result<Response, RbacError> {
+    let user = req.extensions()
+        .get::<User>()
+        .ok_or(RbacError::UserNotFound("No user in request".to_string()))?;
+
+    if !state.rbac_manager.check_permission(user, resource, action).await {
+        return Err(RbacError::PermissionDenied);
+    }
+
+    Ok(next.run(req).await)
+}
+```
+
+**Security Guarantees**:
+
+- ✅ All API endpoints protected by default
+- ✅ Permission checked before handler execution
+- ✅ User context available in handlers
+- ✅ Failed checks logged for audit
+
+## 3.
Platform Monitoring Security
+
+### 3.1 Service Access Security
+
+**Internal URLs Only**:
+
+```toml
+[platform]
+orchestrator_url = "http://localhost:9090" # Not exposed externally
+coredns_url = "http://localhost:9153"
+gitea_url = "http://localhost:3000"
+oci_registry_url = "http://localhost:5000"
+```
+
+**Network Security**:
+
+- All services on localhost or internal network
+- No external exposure of monitoring endpoints
+- Firewall rules to prevent external access
+
+**Threat Mitigation**:
+
+- ✅ **External Scanning**: Services not reachable from internet
+- ✅ **DDoS**: Internal-only access limits attack surface
+- ✅ **Data Exfiltration**: Monitoring data not exposed externally
+
+### 3.2 Health Check Security
+
+**Timeout Protection**:
+
+```rust
+let client = Client::builder()
+    .timeout(std::time::Duration::from_secs(5)) // Prevent hanging
+    .build()
+    .unwrap();
+```
+
+**Error Handling**:
+
+```rust
+// Never expose internal errors to users
+Err(e) => {
+    // Log detailed error internally
+    tracing::error!("Health check failed for {}: {}", service, e);
+
+    // Return generic error externally
+    ServiceStatus {
+        status: HealthStatus::Unhealthy,
+        error_message: Some("Service unavailable".to_string()), // Generic
+        ..
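+        // (remaining ServiceStatus fields elided in this excerpt)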
+    }
+}
+```
+
+**Threat Mitigation**:
+
+- ✅ **Timeout Attacks**: 5-second timeout prevents resource exhaustion
+- ✅ **Information Disclosure**: Error messages sanitized
+- ✅ **Resource Exhaustion**: Parallel checks with concurrency limits
+
+### 3.3 Service Control Security
+
+**RBAC-Protected Service Control**:
+
+```rust
+// Only Operator or Admin can start/stop services
+#[axum::debug_handler]
+pub async fn start_service(
+    State(state): State<Arc<AppState>>, // AppState assumed: shared application state
+    Extension(user): Extension<User>,
+    Path(service_type): Path<String>,
+) -> Result<StatusCode, ApiError> {
+    // Check permission
+    if !state.rbac_manager.check_permission(
+        &user,
+        Resource::Service,
+        Action::Manage,
+    ).await {
+        return Err(ApiError::PermissionDenied);
+    }
+
+    // Start service
+    state.service_manager.start_service(&service_type).await?;
+
+    // Audit log
+    state.audit_log.log_service_action(user, service_type, "start").await;
+
+    Ok(StatusCode::OK)
+}
+```
+
+**Security Guarantees**:
+
+- ✅ Only authorized users can control services
+- ✅ All service actions logged
+- ✅ Graceful degradation on service failure
+
+## 4.
Threat Model + +### 4.1 High-Risk Threats + +#### Threat: SSH Private Key Exposure + +**Attack Vector**: Attacker gains access to KMS database + +**Mitigations**: + +- Private keys encrypted at rest with master key +- Master key stored in hardware security module (HSM) or KMS +- Key access audited and rate-limited +- Zeroization of decrypted keys in memory + +**Detection**: + +- Audit log monitoring for unusual key access patterns +- Alerting on bulk key retrievals + +#### Threat: Privilege Escalation + +**Attack Vector**: Lower-privileged user attempts to gain admin access + +**Mitigations**: + +- Role assignment requires Admin role +- Mode switching requires Admin role +- Middleware enforces permissions on every request +- No client-side permission checks (server-side only) + +**Detection**: + +- Failed permission checks logged +- Alerting on repeated permission denials + +#### Threat: Session Hijacking + +**Attack Vector**: Attacker steals JWT token + +**Mitigations**: + +- Short-lived access tokens (1 hour) +- Refresh token rotation +- Secure HTTP-only cookies (recommended) +- IP address binding (optional) + +**Detection**: + +- Unusual login locations +- Concurrent sessions from different IPs + +### 4.2 Medium-Risk Threats + +#### Threat: Service Impersonation + +**Attack Vector**: Malicious service pretends to be legitimate platform service + +**Mitigations**: + +- Service URLs configured in config file (not dynamic) +- TLS certificate validation (if HTTPS) +- Service authentication tokens + +**Detection**: + +- Health check failures +- Metrics anomalies + +#### Threat: Audit Log Tampering + +**Attack Vector**: Attacker modifies audit logs to hide tracks + +**Mitigations**: + +- Audit logs write-only +- Logs stored in tamper-evident database (SurrealDB) +- Hash chain for log integrity +- Offsite log backup + +**Detection**: + +- Hash chain verification +- Log gap detection + +### 4.3 Low-Risk Threats + +#### Threat: Information Disclosure via Error Messages + 
+**Attack Vector**: Error messages leak internal information
+
+**Mitigations**:
+
+- Generic error messages for users
+- Detailed errors only in server logs
+- Error message sanitization
+
+**Detection**:
+
+- Code review for error handling
+- Automated scanning for sensitive data in responses
+
+## 5. Compliance Considerations
+
+### 5.1 GDPR Compliance
+
+**Personal Data Handling**:
+
+- User information: username, email, IP addresses
+- Retention: Audit logs kept for required period
+- Right to erasure: User deletion deletes all associated data
+
+**Implementation**:
+
+```rust
+// Delete user and all associated data
+pub async fn delete_user(&self, user_id: &str) -> Result<(), RbacError> {
+    // Delete user SSH keys
+    for key in self.list_user_ssh_keys(user_id).await? {
+        self.delete_ssh_key(&key.key_id).await?;
+    }
+
+    // Anonymize audit logs (retain for compliance, remove PII)
+    self.anonymize_user_audit_logs(user_id).await?;
+
+    // Delete user record
+    self.delete_user_record(user_id).await?;
+
+    Ok(())
+}
+```
+
+### 5.2 SOC 2 Compliance
+
+**Security Controls**:
+
+- ✅ Access control (RBAC)
+- ✅ Audit logging (all actions logged)
+- ✅ Encryption at rest (KMS)
+- ✅ Encryption in transit (HTTPS recommended)
+- ✅ Session management (timeout, MFA support)
+
+**Monitoring & Alerting**:
+
+- ✅ Service health monitoring
+- ✅ Failed login tracking
+- ✅ Permission denial alerting
+- ✅ Unusual activity detection
+
+### 5.3 PCI DSS (if applicable)
+
+**Requirements**:
+
+- ✅ Encrypt cardholder data (use KMS for keys)
+- ✅ Maintain access control (RBAC)
+- ✅ Track and monitor access (audit logs)
+- ✅ Regularly test security (integration tests)
+
+## 6.
Security Best Practices + +### 6.1 Development + +**Code Review Checklist**: + +- [ ] All API endpoints have RBAC middleware +- [ ] No hardcoded secrets or keys +- [ ] Error messages don't leak sensitive info +- [ ] Audit logging for sensitive operations +- [ ] Input validation on all user inputs +- [ ] SQL injection prevention (use parameterized queries) +- [ ] XSS prevention (escape user inputs) + +**Testing**: + +- Unit tests for permission checks +- Integration tests for RBAC enforcement +- Penetration testing for production deployments + +### 6.2 Deployment + +**Production Checklist**: + +- [ ] Change default admin password +- [ ] Enable HTTPS with valid certificate +- [ ] Configure firewall rules (internal services only) +- [ ] Set appropriate execution mode (Enterprise for production) +- [ ] Enable audit logging +- [ ] Configure session timeout (30 minutes for Enterprise) +- [ ] Enable rate limiting +- [ ] Set up log monitoring and alerting +- [ ] Regular security updates +- [ ] Backup encryption keys + +### 6.3 Operations + +**Incident Response**: + +1. **Detection**: Monitor audit logs for anomalies +2. **Containment**: Revoke compromised credentials +3. **Eradication**: Rotate affected SSH keys +4. **Recovery**: Restore from backup if needed +5. **Lessons Learned**: Update security controls + +**Key Rotation Schedule**: + +- SSH keys: Every 90 days (Enterprise: 30 days) +- JWT signing keys: Every 180 days +- Master encryption key: Every 365 days +- Service account tokens: Every 30 days + +## 7. 
Security Metrics + +### 7.1 Monitoring Metrics + +**Authentication**: + +- Failed login attempts per user +- Concurrent sessions per user +- Session duration (average, p95, p99) + +**Authorization**: + +- Permission denials per user +- Permission denials per resource +- Role assignments per day + +**Audit**: + +- SSH key accesses per day +- SSH key rotations per month +- Audit log retention compliance + +**Services**: + +- Service health check success rate +- Service response times (p50, p95, p99) +- Service dependency failures + +### 7.2 Alerting Thresholds + +**Critical Alerts**: + +- Service health: >3 failures in 5 minutes +- Failed logins: >10 attempts in 1 minute +- Permission denials: >50 in 1 minute +- SSH key bulk retrieval: >10 keys in 1 minute + +**Warning Alerts**: + +- Service degraded: response time >1 second +- Session timeout rate: >10% of sessions +- Audit log storage: >80% capacity + +## 8. Security Roadmap + +### Phase 1 (Completed) + +- ✅ SSH key storage with encryption +- ✅ Mode-based RBAC +- ✅ Audit logging +- ✅ Platform monitoring + +### Phase 2 (In Progress) + +- 📋 API handlers with RBAC enforcement +- 📋 Integration tests for security +- 📋 Documentation + +### Phase 3 (Future) + +- Multi-factor authentication (MFA) +- Hardware security module (HSM) integration +- Advanced threat detection (ML-based) +- Automated security scanning +- Compliance report generation +- Security information and event management (SIEM) integration + +## References + +- **OWASP Top 10**: +- **NIST Cybersecurity Framework**: +- **CIS Controls**: +- **GDPR**: +- **SOC 2**: + +--- + +**Last Updated**: 2025-10-06 +**Review Cycle**: Quarterly +**Next Review**: 2026-01-06 \ No newline at end of file diff --git a/crates/control-center/src/kms/README.md b/crates/control-center/src/kms/README.md index d8d234e..4e6aba2 100644 --- a/crates/control-center/src/kms/README.md +++ b/crates/control-center/src/kms/README.md @@ -1 +1,453 @@ -# Hybrid Key Management System (KMS)\n\nA 
comprehensive hybrid KMS system built for the control center, supporting local/remote/hybrid modes\nwith intelligent caching, failover, and advanced security features.\n\n## Architecture Overview\n\n### Core Components\n\n1. **KMS Backends**\n - **Local Backend**: SQLite with AES-256-GCM encryption\n - **Remote Backend**: Cosmian KMS client integration\n - **Hybrid Backend**: Intelligent combination of local and remote\n\n2. **Caching System**\n - **Memory Cache**: In-memory LRU cache with TTL\n - **Redis Cache**: Distributed caching with automatic failover\n - **Local File Cache**: Persistent file-based cache for offline scenarios\n\n3. **Security Features**\n - **Encryption**: AES-256-GCM for local storage\n - **Key Derivation**: HKDF-SHA256 for master key derivation\n - **Authentication**: Multiple auth methods (certificate, token, basic, OAuth)\n - **Audit Logging**: Comprehensive audit trail for all operations\n\n4. **Advanced Features**\n - **Credential Management**: Automatic injection for cloud providers\n - **Key Rotation**: Configurable automatic key rotation\n - **HSM Integration**: Hardware Security Module support\n - **Zero-Knowledge Proofs**: For sensitive operations\n - **Migration Tools**: Backend-to-backend key migration\n\n## Configuration\n\n### TOML Configuration Example\n\n```{$detected_lang}\n[kms]\n# Operation mode: local, remote, or hybrid\nmode = "hybrid"\n\n# Local SQLite backend configuration\n[kms.local]\ndatabase_path = "./data/kms.db"\n\n[kms.local.master_key]\nderivation_method = "hkdf"\nsource = "generated"\niterations = 100000\n\n[kms.local.encryption]\ndefault_algorithm = "AES-256-GCM"\nkey_size_bits = 256\nauthenticated = true\n\n[kms.local.backup]\nenabled = true\nbackup_dir = "./backups/kms"\ninterval_hours = 24\nmax_backups = 7\ncompress = true\nencrypt = true\n\n# Remote Cosmian KMS configuration\n[kms.remote]\nserver_url = "https://kms.example.com:9998"\nauth_method = "certificate"\nclient_cert_path = 
"./certs/client.crt"\nclient_key_path = "./certs/client.key"\nca_cert_path = "./certs/ca.crt"\ntimeout_seconds = 30\nverify_ssl = true\n\n[kms.remote.retry]\nmax_attempts = 3\ninitial_delay_ms = 1000\nmax_delay_ms = 30000\nbackoff_multiplier = 2.0\n\n# Cache configuration\n[kms.cache]\nenabled = true\nbackend = "memory"\ndefault_ttl_seconds = 3600\nmax_size_bytes = 104857600 # 100MB\nlocal_dir = "./cache/kms"\n\n# Credential management\n[kms.credentials]\nenabled = true\n\n[kms.credentials.storage]\nstorage_type = "sqlite"\nencryption_key_id = "credential_encryption_key"\ndatabase_path = "./data/credentials.db"\n\n[kms.credentials.providers.aws]\nname = "AWS"\nprovider_type = "aws"\nrefresh_interval_seconds = 3600\nexpiry_warning_seconds = 300\nregions = ["us-east-1", "eu-west-1"]\n\n[kms.credentials.providers.upcloud]\nname = "UpCloud"\nprovider_type = "upcloud"\nrefresh_interval_seconds = 3600\nexpiry_warning_seconds = 300\nregions = ["fi-hel1", "us-nyc1"]\n\n# Key rotation configuration\n[kms.rotation]\nenabled = true\ninterval_seconds = 2592000 # 30 days\nmax_age_seconds = 7776000 # 90 days\nnotice_seconds = 604800 # 7 days\nschedule = "0 2 * * 0" # Every Sunday at 2 AM\n\n# Audit configuration\n[kms.audit]\nenabled = true\nbackend = "file"\nretention_days = 90\nlog_level = "info"\ninclude_data = false\nmax_file_size_mb = 100\nformat = "json"\n\n# HSM configuration\n[kms.hsm]\nenabled = false\nhsm_type = "pkcs11"\npkcs11_library = "/usr/lib/libpkcs11.so"\nslot_id = 0\n\n# Zero-knowledge proof configuration\n[kms.zkp]\nenabled = false\nproof_system = "groth16"\nsetup_params_path = "./zkp/setup.params"\n\n# Security policy\n[kms.security]\nrequire_strong_auth = true\nmin_key_length_bits = 256\nmax_key_age_days = 90\nenable_pfs = true\nallowed_algorithms = ["AES-256-GCM", "ChaCha20Poly1305", "RSA-4096", "ECDSA-P384"]\nblocked_algorithms = ["DES", "3DES", "RC4", "MD5"]\npolicy_enforcement = "strict"\n```\n\n## Usage Examples\n\n### Basic KMS 
Operations\n\n```{$detected_lang}\nuse control_center::kms::{KmsManager, KmsConfig, KeyData, KeyType, KeyAlgorithm, KeyUsage};\n\n// Initialize KMS manager\nlet config = KmsConfig::load_from_file("kms.toml").await?;\nlet mut kms = KmsManager::new(&config).await?;\nkms.initialize().await?;\n\n// Create a new encryption key\nlet key_data = KeyData {\n key_id: "my-encryption-key".to_string(),\n key_type: KeyType::Symmetric,\n algorithm: KeyAlgorithm::Aes256Gcm,\n usage: KeyUsage {\n encrypt: true,\n decrypt: true,\n ..Default::default()\n },\n key_size: 256,\n key_material: SecretBytes::new(generate_random_key(32)),\n metadata: KeyMetadata {\n name: Some("Application Encryption Key".to_string()),\n description: Some("Key for encrypting application data".to_string()),\n owner: Some("app-service".to_string()),\n environment: Some("production".to_string()),\n ..Default::default()\n },\n created_at: Utc::now(),\n last_accessed: None,\n expires_at: Some(Utc::now() + chrono::Duration::days(90)),\n status: KeyStatus::Active,\n tags: HashMap::from([\n ("purpose".to_string(), "encryption".to_string()),\n ("service".to_string(), "app".to_string()),\n ]),\n};\n\n// Store the key\nlet stored_key_id = kms.store_key(key_data).await?;\nprintln!("Key stored with ID: {}", stored_key_id);\n\n// Encrypt data\nlet plaintext = b"sensitive data to encrypt";\nlet context = HashMap::from([\n ("service".to_string(), "app".to_string()),\n ("version".to_string(), "1.0".to_string()),\n]);\nlet ciphertext = kms.encrypt(&stored_key_id, plaintext, Some(context.clone())).await?;\n\n// Decrypt data\nlet decrypted = kms.decrypt(&stored_key_id, &ciphertext, Some(context)).await?;\nassert_eq!(plaintext, decrypted.as_slice());\n\n// Get key information\nif let Some(key_info) = kms.get_key(&stored_key_id).await? 
{\n println!("Key algorithm: {:?}", key_info.algorithm);\n println!("Key status: {:?}", key_info.status);\n println!("Created: {}", key_info.created_at);\n}\n```\n\n### Provider Credential Management\n\n```{$detected_lang}\nuse control_center::kms::{ProviderCredentials, CredentialType};\n\n// Store AWS credentials\nlet aws_creds = ProviderCredentials {\n provider: "aws".to_string(),\n credential_type: CredentialType::AccessKey,\n access_key: "AKIAIOSFODNN7EXAMPLE".to_string(),\n secret_key: SecretBytes::new(b"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY".to_vec()),\n session_token: None,\n region: Some("us-east-1".to_string()),\n config: HashMap::new(),\n expires_at: Some(Utc::now() + chrono::Duration::hours(12)),\n created_at: Utc::now(),\n};\n\nkms.store_provider_credentials("aws", aws_creds).await?;\n\n// Retrieve credentials for automatic injection\nif let Some(creds) = kms.get_provider_credentials("aws").await? {\n println!("AWS Access Key: {}", creds.access_key);\n // Credentials are automatically injected into environment variables\n // or configuration files based on the injection configuration\n}\n```\n\n### Health Monitoring\n\n```{$detected_lang}\n// Check system health\nlet health = kms.health_check().await?;\nprintln!("KMS Health: {}", health.overall);\nprintln!("Backend Status: {}", health.backend.healthy);\nprintln!("Rotation Status: {}", health.rotation.healthy);\nprintln!("Credentials Status: {}", health.credentials.healthy);\n\n// Get cache statistics\nlet cache_stats = kms.cache.stats().await;\nprintln!("Cache hit rate: {:.2}%", cache_stats.hit_rate() * 100.0);\nprintln!("Cache entries: {}", cache_stats.entry_count);\n```\n\n## Integration with Existing System\n\n### Environment Variable Integration\n\nThe KMS system integrates with the existing environment-based configuration:\n\n```{$detected_lang}\n# Set KMS configuration environment\nexport PROVISIONING_KMS_MODE=hybrid\nexport 
PROVISIONING_KMS_LOCAL_DATABASE_PATH=/var/lib/provisioning/kms.db\nexport PROVISIONING_KMS_REMOTE_SERVER_URL=https://kms.example.com:9998\nexport PROVISIONING_KMS_CACHE_ENABLED=true\n```\n\n### TOML Configuration Integration\n\nAdd KMS configuration to your existing `config.defaults.toml`:\n\n```{$detected_lang}\n# Extend existing configuration with KMS section\n[kms]\nmode = "local" # Start with local mode\nlocal.database_path = "{{paths.base}}/data/kms.db"\ncache.enabled = true\ncache.local_dir = "{{paths.base}}/cache/kms"\naudit.enabled = true\n```\n\n### Nushell Integration\n\n```{$detected_lang}\n# Load KMS configuration\ndef kms_config [] {\n let config_path = $"($env.PROVISIONING_BASE)/config.toml"\n open $config_path | get kms\n}\n\n# Check KMS health\ndef kms_health [] {\n http get http://localhost:8080/kms/health | from json\n}\n\n# List available keys\ndef kms_keys [] {\n http get http://localhost:8080/kms/keys | from json\n}\n```\n\n## Security Considerations\n\n### Key Storage Security\n\n1. **Local Storage**: Keys are encrypted using AES-256-GCM with keys derived from a master key using HKDF\n2. **Remote Storage**: Relies on Cosmian KMS security guarantees\n3. **Cache**: Cached keys have TTL and are encrypted in transit\n4. **Memory**: Key material is zeroed on drop using the `zeroize` crate\n\n### Access Control\n\n1. **Authentication**: Multiple authentication methods supported\n2. **Authorization**: Key usage permissions enforced\n3. **Audit**: All operations logged with detailed context\n4. **Network**: TLS encryption for all remote communications\n\n### Compliance Features\n\n1. **Audit Trail**: Comprehensive logging of all KMS operations\n2. **Key Rotation**: Automatic rotation with configurable policies\n3. **Retention**: Configurable key and audit log retention\n4. **Export Controls**: Keys can be marked as non-exportable\n\n## Deployment Guide\n\n### Production Deployment\n\n1. 
**Configuration**:\n\n ```toml\n [kms]\n mode = "hybrid"\n\n [kms.security]\n policy_enforcement = "strict"\n require_strong_auth = true\n\n [kms.audit]\n enabled = true\n backend = "database"\n retention_days = 365\n ```\n\n2. **Infrastructure**:\n - Dedicated KMS database server\n - Redis cluster for caching\n - HSM for high-security keys\n - Backup and disaster recovery\n\n3. **Monitoring**:\n - Health check endpoints\n - Metrics collection\n - Alert on key expiration\n - Audit log monitoring\n\n### Development Setup\n\n1. **Local Development**:\n\n ```toml\n [kms]\n mode = "local"\n\n [kms.security]\n policy_enforcement = "permissive"\n\n [kms.audit]\n enabled = true\n backend = "stdout"\n ```\n\n2. **Testing**:\n\n ```bash\n # Run with test configuration\n PROVISIONING_ENV=test ./control-center\n ```\n\n## Performance Optimization\n\n### Caching Strategy\n\n1. **Multi-level Caching**: Memory → Redis → Local File\n2. **TTL Management**: Configurable per key type\n3. **Cache Warming**: Preload frequently used keys\n4. **Eviction Policies**: LRU with size-based eviction\n\n### Connection Pooling\n\n1. **Database Connections**: Configurable pool size\n2. **HTTP Connections**: Keep-alive and connection reuse\n3. **Batch Operations**: Bulk key operations where supported\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Connection Failures**: Check network connectivity and certificates\n2. **Permission Errors**: Verify file system permissions for local storage\n3. **Cache Misses**: Monitor cache hit rates and adjust TTL\n4. 
**Key Rotation**: Check rotation scheduler logs\n\n### Debug Mode\n\n```{$detected_lang}\n# Enable debug logging\nexport PROVISIONING_DEBUG=true\nexport PROVISIONING_LOG_LEVEL=debug\n\n# Run with verbose output\n./control-center --debug\n```\n\n### Health Checks\n\n```{$detected_lang}\n# Check KMS health\ncurl http://localhost:8080/kms/health\n\n# Check individual components\ncurl http://localhost:8080/kms/health/backend\ncurl http://localhost:8080/kms/health/cache\ncurl http://localhost:8080/kms/health/rotation\n```\n\n## Future Enhancements\n\n### Planned Features\n\n1. **Multi-Region Support**: Cross-region key replication\n2. **Key Versioning**: Multiple versions per key with rollback\n3. **Policy Engine**: Fine-grained access control policies\n4. **Metrics Dashboard**: Web UI for monitoring and management\n5. **Integration APIs**: REST and gRPC APIs for external systems\n\n### Experimental Features\n\n1. **Zero-Knowledge Proofs**: For privacy-preserving operations\n2. **Quantum-Resistant Algorithms**: Post-quantum cryptography support\n3. **Federated KMS**: Multi-organization key sharing\n4. **Blockchain Integration**: Immutable audit trails\n\nThis hybrid KMS system provides a solid foundation for secure key management in the control center architecture,\nwith room for future enhancements and customization based on specific requirements. +# Hybrid Key Management System (KMS) + +A comprehensive hybrid KMS system built for the control center, supporting local/remote/hybrid modes +with intelligent caching, failover, and advanced security features. + +## Architecture Overview + +### Core Components + +1. **KMS Backends** + - **Local Backend**: SQLite with AES-256-GCM encryption + - **Remote Backend**: Cosmian KMS client integration + - **Hybrid Backend**: Intelligent combination of local and remote + +2. 
**Caching System** + - **Memory Cache**: In-memory LRU cache with TTL + - **Redis Cache**: Distributed caching with automatic failover + - **Local File Cache**: Persistent file-based cache for offline scenarios + +3. **Security Features** + - **Encryption**: AES-256-GCM for local storage + - **Key Derivation**: HKDF-SHA256 for master key derivation + - **Authentication**: Multiple auth methods (certificate, token, basic, OAuth) + - **Audit Logging**: Comprehensive audit trail for all operations + +4. **Advanced Features** + - **Credential Management**: Automatic injection for cloud providers + - **Key Rotation**: Configurable automatic key rotation + - **HSM Integration**: Hardware Security Module support + - **Zero-Knowledge Proofs**: For sensitive operations + - **Migration Tools**: Backend-to-backend key migration + +## Configuration + +### TOML Configuration Example + +```toml +[kms] +# Operation mode: local, remote, or hybrid +mode = "hybrid" + +# Local SQLite backend configuration +[kms.local] +database_path = "./data/kms.db" + +[kms.local.master_key] +derivation_method = "hkdf" +source = "generated" +iterations = 100000 + +[kms.local.encryption] +default_algorithm = "AES-256-GCM" +key_size_bits = 256 +authenticated = true + +[kms.local.backup] +enabled = true +backup_dir = "./backups/kms" +interval_hours = 24 +max_backups = 7 +compress = true +encrypt = true + +# Remote Cosmian KMS configuration +[kms.remote] +server_url = "https://kms.example.com:9998" +auth_method = "certificate" +client_cert_path = "./certs/client.crt" +client_key_path = "./certs/client.key" +ca_cert_path = "./certs/ca.crt" +timeout_seconds = 30 +verify_ssl = true + +[kms.remote.retry] +max_attempts = 3 +initial_delay_ms = 1000 +max_delay_ms = 30000 +backoff_multiplier = 2.0 + +# Cache configuration +[kms.cache] +enabled = true +backend = "memory" +default_ttl_seconds = 3600 +max_size_bytes = 104857600 # 100MB +local_dir = "./cache/kms" + +# Credential management +[kms.credentials] 
+enabled = true + +[kms.credentials.storage] +storage_type = "sqlite" +encryption_key_id = "credential_encryption_key" +database_path = "./data/credentials.db" + +[kms.credentials.providers.aws] +name = "AWS" +provider_type = "aws" +refresh_interval_seconds = 3600 +expiry_warning_seconds = 300 +regions = ["us-east-1", "eu-west-1"] + +[kms.credentials.providers.upcloud] +name = "UpCloud" +provider_type = "upcloud" +refresh_interval_seconds = 3600 +expiry_warning_seconds = 300 +regions = ["fi-hel1", "us-nyc1"] + +# Key rotation configuration +[kms.rotation] +enabled = true +interval_seconds = 2592000 # 30 days +max_age_seconds = 7776000 # 90 days +notice_seconds = 604800 # 7 days +schedule = "0 2 * * 0" # Every Sunday at 2 AM + +# Audit configuration +[kms.audit] +enabled = true +backend = "file" +retention_days = 90 +log_level = "info" +include_data = false +max_file_size_mb = 100 +format = "json" + +# HSM configuration +[kms.hsm] +enabled = false +hsm_type = "pkcs11" +pkcs11_library = "/usr/lib/libpkcs11.so" +slot_id = 0 + +# Zero-knowledge proof configuration +[kms.zkp] +enabled = false +proof_system = "groth16" +setup_params_path = "./zkp/setup.params" + +# Security policy +[kms.security] +require_strong_auth = true +min_key_length_bits = 256 +max_key_age_days = 90 +enable_pfs = true +allowed_algorithms = ["AES-256-GCM", "ChaCha20Poly1305", "RSA-4096", "ECDSA-P384"] +blocked_algorithms = ["DES", "3DES", "RC4", "MD5"] +policy_enforcement = "strict" +``` + +## Usage Examples + +### Basic KMS Operations + +```rust +use control_center::kms::{KmsManager, KmsConfig, KeyData, KeyType, KeyAlgorithm, KeyUsage, KeyMetadata, KeyStatus, SecretBytes}; +use std::collections::HashMap; +use chrono::Utc; + +// Initialize KMS manager +let config = KmsConfig::load_from_file("kms.toml").await?; +let mut kms = KmsManager::new(&config).await?; +kms.initialize().await?; + +// Create a new encryption key +let key_data = KeyData { + key_id: "my-encryption-key".to_string(), + key_type: KeyType::Symmetric, + algorithm: KeyAlgorithm::Aes256Gcm, + usage: KeyUsage { + encrypt: 
true, + decrypt: true, + ..Default::default() + }, + key_size: 256, + key_material: SecretBytes::new(generate_random_key(32)), + metadata: KeyMetadata { + name: Some("Application Encryption Key".to_string()), + description: Some("Key for encrypting application data".to_string()), + owner: Some("app-service".to_string()), + environment: Some("production".to_string()), + ..Default::default() + }, + created_at: Utc::now(), + last_accessed: None, + expires_at: Some(Utc::now() + chrono::Duration::days(90)), + status: KeyStatus::Active, + tags: HashMap::from([ + ("purpose".to_string(), "encryption".to_string()), + ("service".to_string(), "app".to_string()), + ]), +}; + +// Store the key +let stored_key_id = kms.store_key(key_data).await?; +println!("Key stored with ID: {}", stored_key_id); + +// Encrypt data +let plaintext = b"sensitive data to encrypt"; +let context = HashMap::from([ + ("service".to_string(), "app".to_string()), + ("version".to_string(), "1.0".to_string()), +]); +let ciphertext = kms.encrypt(&stored_key_id, plaintext, Some(context.clone())).await?; + +// Decrypt data +let decrypted = kms.decrypt(&stored_key_id, &ciphertext, Some(context)).await?; +assert_eq!(plaintext, decrypted.as_slice()); + +// Get key information +if let Some(key_info) = kms.get_key(&stored_key_id).await? 
{ + println!("Key algorithm: {:?}", key_info.algorithm); + println!("Key status: {:?}", key_info.status); + println!("Created: {}", key_info.created_at); +} +``` + +### Provider Credential Management + +```rust +use control_center::kms::{ProviderCredentials, CredentialType}; + +// Store AWS credentials +let aws_creds = ProviderCredentials { + provider: "aws".to_string(), + credential_type: CredentialType::AccessKey, + access_key: "AKIAIOSFODNN7EXAMPLE".to_string(), + secret_key: SecretBytes::new(b"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY".to_vec()), + session_token: None, + region: Some("us-east-1".to_string()), + config: HashMap::new(), + expires_at: Some(Utc::now() + chrono::Duration::hours(12)), + created_at: Utc::now(), +}; + +kms.store_provider_credentials("aws", aws_creds).await?; + +// Retrieve credentials for automatic injection +if let Some(creds) = kms.get_provider_credentials("aws").await? { + println!("AWS Access Key: {}", creds.access_key); + // Credentials are automatically injected into environment variables + // or configuration files based on the injection configuration +} +``` + +### Health Monitoring + +```rust +// Check system health +let health = kms.health_check().await?; +println!("KMS Health: {}", health.overall); +println!("Backend Status: {}", health.backend.healthy); +println!("Rotation Status: {}", health.rotation.healthy); +println!("Credentials Status: {}", health.credentials.healthy); + +// Get cache statistics +let cache_stats = kms.cache.stats().await; +println!("Cache hit rate: {:.2}%", cache_stats.hit_rate() * 100.0); +println!("Cache entries: {}", cache_stats.entry_count); +``` + +## Integration with Existing System + +### Environment Variable Integration + +The KMS system integrates with the existing environment-based configuration: + +```bash +# Set KMS configuration environment +export PROVISIONING_KMS_MODE=hybrid +export PROVISIONING_KMS_LOCAL_DATABASE_PATH=/var/lib/provisioning/kms.db +export 
PROVISIONING_KMS_REMOTE_SERVER_URL=https://kms.example.com:9998 +export PROVISIONING_KMS_CACHE_ENABLED=true +``` + +### TOML Configuration Integration + +Add KMS configuration to your existing `config.defaults.toml`: + +```toml +# Extend existing configuration with KMS section +[kms] +mode = "local" # Start with local mode +local.database_path = "{{paths.base}}/data/kms.db" +cache.enabled = true +cache.local_dir = "{{paths.base}}/cache/kms" +audit.enabled = true +``` + +### Nushell Integration + +```nushell +# Load KMS configuration +def kms_config [] { + let config_path = $"($env.PROVISIONING_BASE)/config.toml" + open $config_path | get kms +} + +# Check KMS health +def kms_health [] { + http get http://localhost:8080/kms/health | from json +} + +# List available keys +def kms_keys [] { + http get http://localhost:8080/kms/keys | from json +} +``` + +## Security Considerations + +### Key Storage Security + +1. **Local Storage**: Keys are encrypted using AES-256-GCM with keys derived from a master key using HKDF +2. **Remote Storage**: Relies on Cosmian KMS security guarantees +3. **Cache**: Cached keys have TTL and are encrypted in transit +4. **Memory**: Key material is zeroed on drop using the `zeroize` crate + +### Access Control + +1. **Authentication**: Multiple authentication methods supported +2. **Authorization**: Key usage permissions enforced +3. **Audit**: All operations logged with detailed context +4. **Network**: TLS encryption for all remote communications + +### Compliance Features + +1. **Audit Trail**: Comprehensive logging of all KMS operations +2. **Key Rotation**: Automatic rotation with configurable policies +3. **Retention**: Configurable key and audit log retention +4. **Export Controls**: Keys can be marked as non-exportable + +## Deployment Guide + +### Production Deployment + +1. 
**Configuration**: + + ```toml + [kms] + mode = "hybrid" + + [kms.security] + policy_enforcement = "strict" + require_strong_auth = true + + [kms.audit] + enabled = true + backend = "database" + retention_days = 365 + ``` + +2. **Infrastructure**: + - Dedicated KMS database server + - Redis cluster for caching + - HSM for high-security keys + - Backup and disaster recovery + +3. **Monitoring**: + - Health check endpoints + - Metrics collection + - Alert on key expiration + - Audit log monitoring + +### Development Setup + +1. **Local Development**: + + ```toml + [kms] + mode = "local" + + [kms.security] + policy_enforcement = "permissive" + + [kms.audit] + enabled = true + backend = "stdout" + ``` + +2. **Testing**: + + ```bash + # Run with test configuration + PROVISIONING_ENV=test ./control-center + ``` + +## Performance Optimization + +### Caching Strategy + +1. **Multi-level Caching**: Memory → Redis → Local File +2. **TTL Management**: Configurable per key type +3. **Cache Warming**: Preload frequently used keys +4. **Eviction Policies**: LRU with size-based eviction + +### Connection Pooling + +1. **Database Connections**: Configurable pool size +2. **HTTP Connections**: Keep-alive and connection reuse +3. **Batch Operations**: Bulk key operations where supported + +## Troubleshooting + +### Common Issues + +1. **Connection Failures**: Check network connectivity and certificates +2. **Permission Errors**: Verify file system permissions for local storage +3. **Cache Misses**: Monitor cache hit rates and adjust TTL +4. 
**Key Rotation**: Check rotation scheduler logs + +### Debug Mode + +```bash +# Enable debug logging +export PROVISIONING_DEBUG=true +export PROVISIONING_LOG_LEVEL=debug + +# Run with verbose output +./control-center --debug +``` + +### Health Checks + +```bash +# Check KMS health +curl http://localhost:8080/kms/health + +# Check individual components +curl http://localhost:8080/kms/health/backend +curl http://localhost:8080/kms/health/cache +curl http://localhost:8080/kms/health/rotation +``` + +## Future Enhancements + +### Planned Features + +1. **Multi-Region Support**: Cross-region key replication +2. **Key Versioning**: Multiple versions per key with rollback +3. **Policy Engine**: Fine-grained access control policies +4. **Metrics Dashboard**: Web UI for monitoring and management +5. **Integration APIs**: REST and gRPC APIs for external systems + +### Experimental Features + +1. **Zero-Knowledge Proofs**: For privacy-preserving operations +2. **Quantum-Resistant Algorithms**: Post-quantum cryptography support +3. **Federated KMS**: Multi-organization key sharing +4. **Blockchain Integration**: Immutable audit trails + +This hybrid KMS system provides a solid foundation for secure key management in the control center architecture, +with room for future enhancements and customization based on specific requirements. 
\ No newline at end of file diff --git a/crates/control-center/web/README.md b/crates/control-center/web/README.md index 287e05d..91ca820 100644 --- a/crates/control-center/web/README.md +++ b/crates/control-center/web/README.md @@ -1 +1,180 @@ -# Control Center Web UI\n\nReact/TypeScript frontend for the Control Center vault secrets management.\n\n## Features\n\n- **Secrets List**: Browse and filter vault secrets\n- **Secret View**: View secret details with show/hide value toggle\n- **Secret Create/Edit**: Create new secrets or update existing ones\n- **Secret History**: View version history and restore previous versions\n- **Copy to Clipboard**: Easy copy functionality for secret values\n- **Responsive Design**: Works on desktop and mobile devices\n\n## Components\n\n### Core Components\n\n- **SecretsManager**: Main orchestrator component\n- **SecretsList**: List view with pagination and filtering\n- **SecretView**: Detailed secret view with metadata\n- **SecretCreate**: Create/edit form for secrets\n- **SecretHistory**: Version history with restore functionality\n\n### API Client\n\n- **secretsApi**: HTTP client for vault secrets endpoints\n- Type-safe request/response handling\n- Error handling with custom error types\n\n## Prerequisites\n\n- Node.js 18+\n- npm or yarn\n- Control Center backend running on \n\n## Installation\n\n```{$detected_lang}\ncd provisioning/platform/control-center/web\nnpm install\n```\n\n## Development\n\n```{$detected_lang}\n# Start development server\nnpm start\n\n# Build for production\nnpm build\n\n# Run tests\nnpm test\n\n# Lint code\nnpm run lint\n\n# Format code\nnpm run format\n```\n\n## Environment Variables\n\nCreate a `.env` file in the web directory:\n\n```{$detected_lang}\nREACT_APP_API_URL=http://localhost:8080\n```\n\n## Usage\n\n### Import and Use\n\n```{$detected_lang}\nimport { SecretsManager } from './components/secrets';\n\nfunction App() {\n return (\n
\n \n
\n );\n}\n```\n\n### API Client\n\n```{$detected_lang}\nimport { secretsApi } from './api/secrets';\n\n// Create a secret\nconst secret = await secretsApi.createSecret({\n path: 'database/prod/password',\n value: 'my-secret-value',\n context: 'production',\n metadata: { description: 'Production DB password' },\n});\n\n// Get a secret\nconst secretData = await secretsApi.getSecret('database/prod/password');\n\n// List secrets\nconst { secrets, total } = await secretsApi.listSecrets({\n prefix: 'database/',\n limit: 50,\n offset: 0,\n});\n\n// Update secret\nawait secretsApi.updateSecret('database/prod/password', {\n value: 'new-secret-value',\n});\n\n// Delete secret\nawait secretsApi.deleteSecret('database/prod/password');\n\n// Get history\nconst history = await secretsApi.getSecretHistory('database/prod/password');\n\n// Restore version\nawait secretsApi.restoreSecretVersion('database/prod/password', 2);\n```\n\n## Architecture\n\n```{$detected_lang}\nSecretsManager (Orchestrator)\n ├── SecretsList (Browse)\n ├── SecretView (Detail)\n ├── SecretCreate (Create/Edit)\n └── SecretHistory (Versions)\n ↓\n secretsApi (HTTP Client)\n ↓\n Control Center Backend API\n ↓\n KMS Service (Encryption)\n ↓\n RustyVault (Storage)\n```\n\n## Security\n\n- **MFA Required**: All secret operations require MFA verification\n- **RBAC**: Role-based access control enforced by backend\n- **Encrypted Storage**: Values encrypted via KMS Service before storage\n- **Audit Trail**: All operations logged for compliance\n- **No Plaintext**: Values never stored unencrypted\n- **Context Encryption**: Optional AAD for additional security\n\n## TypeScript Types\n\nAll components are fully typed. 
See `src/types/secrets.ts` for type definitions:\n\n- `Secret`: Secret metadata\n- `SecretWithValue`: Secret with decrypted value\n- `SecretVersion`: Version information\n- `SecretHistory`: Complete version history\n- `CreateSecretRequest`: Create request payload\n- `UpdateSecretRequest`: Update request payload\n- `ListSecretsQuery`: List query parameters\n- `ApiError`: Error response type\n\n## Styling\n\nCustom CSS in `src/components/secrets/secrets.css`. Modify to match your design system.\n\n## Browser Support\n\n- Chrome/Edge 90+\n- Firefox 88+\n- Safari 14+\n\n## License\n\nSee project root LICENSE file.\n\n## Contributing\n\nSee project root CONTRIBUTING.md for contribution guidelines. +# Control Center Web UI + +React/TypeScript frontend for the Control Center vault secrets management. + +## Features + +- **Secrets List**: Browse and filter vault secrets +- **Secret View**: View secret details with show/hide value toggle +- **Secret Create/Edit**: Create new secrets or update existing ones +- **Secret History**: View version history and restore previous versions +- **Copy to Clipboard**: Easy copy functionality for secret values +- **Responsive Design**: Works on desktop and mobile devices + +## Components + +### Core Components + +- **SecretsManager**: Main orchestrator component +- **SecretsList**: List view with pagination and filtering +- **SecretView**: Detailed secret view with metadata +- **SecretCreate**: Create/edit form for secrets +- **SecretHistory**: Version history with restore functionality + +### API Client + +- **secretsApi**: HTTP client for vault secrets endpoints +- Type-safe request/response handling +- Error handling with custom error types + +## Prerequisites + +- Node.js 18+ +- npm or yarn +- Control Center backend running on + +## Installation + +```bash +cd provisioning/platform/control-center/web +npm install +``` + +## Development + +```bash +# Start development server +npm start + +# Build for production +npm run build + +# Run 
tests +npm test + +# Lint code +npm run lint + +# Format code +npm run format +``` + +## Environment Variables + +Create a `.env` file in the web directory: + +```bash +REACT_APP_API_URL=http://localhost:8080 +``` + +## Usage + +### Import and Use + +```tsx +import { SecretsManager } from './components/secrets'; + +function App() { + return ( +
+    <div> +      <SecretsManager /> +    </div>
+ ); +} +``` + +### API Client + +```typescript +import { secretsApi } from './api/secrets'; + +// Create a secret +const secret = await secretsApi.createSecret({ + path: 'database/prod/password', + value: 'my-secret-value', + context: 'production', + metadata: { description: 'Production DB password' }, +}); + +// Get a secret +const secretData = await secretsApi.getSecret('database/prod/password'); + +// List secrets +const { secrets, total } = await secretsApi.listSecrets({ + prefix: 'database/', + limit: 50, + offset: 0, +}); + +// Update secret +await secretsApi.updateSecret('database/prod/password', { + value: 'new-secret-value', +}); + +// Delete secret +await secretsApi.deleteSecret('database/prod/password'); + +// Get history +const history = await secretsApi.getSecretHistory('database/prod/password'); + +// Restore version +await secretsApi.restoreSecretVersion('database/prod/password', 2); +``` + +## Architecture + +```text +SecretsManager (Orchestrator) + ├── SecretsList (Browse) + ├── SecretView (Detail) + ├── SecretCreate (Create/Edit) + └── SecretHistory (Versions) + ↓ + secretsApi (HTTP Client) + ↓ + Control Center Backend API + ↓ + KMS Service (Encryption) + ↓ + RustyVault (Storage) +``` + +## Security + +- **MFA Required**: All secret operations require MFA verification +- **RBAC**: Role-based access control enforced by backend +- **Encrypted Storage**: Values encrypted via KMS Service before storage +- **Audit Trail**: All operations logged for compliance +- **No Plaintext**: Values never stored unencrypted +- **Context Encryption**: Optional AAD for additional security + +## TypeScript Types + +All components are fully typed. 
See `src/types/secrets.ts` for type definitions: + +- `Secret`: Secret metadata +- `SecretWithValue`: Secret with decrypted value +- `SecretVersion`: Version information +- `SecretHistory`: Complete version history +- `CreateSecretRequest`: Create request payload +- `UpdateSecretRequest`: Update request payload +- `ListSecretsQuery`: List query parameters +- `ApiError`: Error response type + +## Styling + +Custom CSS in `src/components/secrets/secrets.css`. Modify to match your design system. + +## Browser Support + +- Chrome/Edge 90+ +- Firefox 88+ +- Safari 14+ + +## License + +See project root LICENSE file. + +## Contributing + +See project root CONTRIBUTING.md for contribution guidelines. \ No newline at end of file diff --git a/crates/extension-registry/API.md b/crates/extension-registry/API.md index 1807cc2..f7cbbd1 100644 --- a/crates/extension-registry/API.md +++ b/crates/extension-registry/API.md @@ -1 +1,586 @@ -# Extension Registry API Documentation\n\nVersion: 1.0.0\nBase URL: `http://localhost:8082/api/v1`\n\n## Table of Contents\n\n- [Authentication](#authentication)\n- [Extension Endpoints](#extension-endpoints)\n- [System Endpoints](#system-endpoints)\n- [Error Responses](#error-responses)\n- [Data Models](#data-models)\n\n## Authentication\n\nThe Extension Registry API does not require authentication for read operations. 
Backend authentication (Gitea/OCI) is handled server-side via \nconfiguration.\n\n## Extension Endpoints\n\n### List Extensions\n\nRetrieve a list of available extensions with optional filtering and pagination.\n\n**Endpoint**: `GET /extensions`\n\n**Query Parameters**:\n\n| Parameter | Type | Required | Description |\n| ----------- | ------ | ---------- | ------------- |\n| `type` | string | No | Filter by extension type: `provider`, `taskserv`, `cluster` |\n| `source` | string | No | Filter by source: `gitea`, `oci` |\n| `limit` | integer | No | Maximum results (default: 100, max: 1000) |\n| `offset` | integer | No | Pagination offset (default: 0) |\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/extensions?type=provider&limit=10"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n[\n {\n "name": "aws",\n "type": "provider",\n "version": "1.2.0",\n "description": "AWS provider for provisioning infrastructure",\n "author": "provisioning-team",\n "repository": "https://gitea.example.com/org/aws_prov",\n "source": "gitea",\n "published_at": "2025-10-06T12:00:00Z",\n "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",\n "checksum": "sha256:abc123...",\n "size": 1024000,\n "tags": ["cloud", "aws", "infrastructure"]\n },\n {\n "name": "upcloud",\n "type": "provider",\n "version": "2.1.3",\n "description": "UpCloud provider for European cloud infrastructure",\n "author": "provisioning-team",\n "repository": "https://gitea.example.com/org/upcloud_prov",\n "source": "gitea",\n "published_at": "2025-10-05T10:30:00Z",\n "download_url": "https://gitea.example.com/org/upcloud_prov/releases/download/2.1.3/upcloud_prov.tar.gz",\n "size": 890000\n }\n]\n```\n\n---\n\n### Get Extension Metadata\n\nRetrieve detailed metadata for a specific extension.\n\n**Endpoint**: `GET /extensions/{type}/{name}`\n\n**Path Parameters**:\n\n| Parameter | Type | Required | Description |\n| ----------- | 
------ | ---------- | ------------- |\n| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` |\n| `name` | string | Yes | Extension name |\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/extensions/provider/aws"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n{\n "name": "aws",\n "type": "provider",\n "version": "1.2.0",\n "description": "AWS provider for provisioning infrastructure",\n "author": "provisioning-team",\n "repository": "https://gitea.example.com/org/aws_prov",\n "source": "gitea",\n "published_at": "2025-10-06T12:00:00Z",\n "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",\n "checksum": "sha256:abc123...",\n "size": 1024000,\n "tags": ["cloud", "aws", "infrastructure"]\n}\n```\n\n**Error Response** (404 Not Found):\n\n```{$detected_lang}\n{\n "error": "not_found",\n "message": "Extension provider/nonexistent not found"\n}\n```\n\n---\n\n### List Extension Versions\n\nGet all available versions for a specific extension.\n\n**Endpoint**: `GET /extensions/{type}/{name}/versions`\n\n**Path Parameters**:\n\n| Parameter | Type | Required | Description |\n| ----------- | ------ | ---------- | ------------- |\n| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` |\n| `name` | string | Yes | Extension name |\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/versions"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n[\n {\n "version": "1.28.0",\n "published_at": "2025-10-06T12:00:00Z",\n "download_url": "https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.28.0/kubernetes_taskserv.tar.gz",\n "checksum": "sha256:def456...",\n "size": 2048000\n },\n {\n "version": "1.27.5",\n "published_at": "2025-09-15T10:30:00Z",\n "download_url": 
"https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.27.5/kubernetes_taskserv.tar.gz",\n "checksum": "sha256:ghi789...",\n "size": 1980000\n },\n {\n "version": "1.27.4",\n "published_at": "2025-08-20T08:15:00Z",\n "download_url": "https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.27.4/kubernetes_taskserv.tar.gz",\n "size": 1950000\n }\n]\n```\n\n---\n\n### Download Extension\n\nDownload a specific version of an extension.\n\n**Endpoint**: `GET /extensions/{type}/{name}/{version}`\n\n**Path Parameters**:\n\n| Parameter | Type | Required | Description |\n| ----------- | ------ | ---------- | ------------- |\n| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` |\n| `name` | string | Yes | Extension name |\n| `version` | string | Yes | Extension version (e.g., `1.2.0`) |\n\n**Example Request**:\n\n```{$detected_lang}\ncurl -OJ "http://localhost:8082/api/v1/extensions/provider/aws/1.2.0"\n```\n\n**Response**:\n\n- **Content-Type**: `application/octet-stream`\n- **Body**: Binary data (tarball or archive)\n\n**Error Response** (404 Not Found):\n\n```{$detected_lang}\n{\n "error": "not_found",\n "message": "Extension provider/aws version 1.2.0 not found"\n}\n```\n\n---\n\n### Search Extensions\n\nSearch for extensions by name or description.\n\n**Endpoint**: `GET /extensions/search`\n\n**Query Parameters**:\n\n| Parameter | Type | Required | Description |\n| ----------- | ------ | ---------- | ------------- |\n| `q` | string | Yes | Search query (case-insensitive) |\n| `type` | string | No | Filter by extension type |\n| `limit` | integer | No | Maximum results (default: 50, max: 100) |\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/extensions/search?q=kubernetes&type=taskserv&limit=5"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n[\n {\n "name": "kubernetes",\n "type": "taskserv",\n "version": "1.28.0",\n "description": "Kubernetes container orchestration 
platform",\n "author": "provisioning-team",\n "source": "gitea",\n "published_at": "2025-10-06T12:00:00Z"\n },\n {\n "name": "k3s",\n "type": "taskserv",\n "version": "1.27.5",\n "description": "Lightweight Kubernetes distribution",\n "author": "community",\n "source": "oci",\n "published_at": "2025-09-20T14:30:00Z"\n }\n]\n```\n\n---\n\n## System Endpoints\n\n### Health Check\n\nCheck service health and backend status.\n\n**Endpoint**: `GET /health`\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/health"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n{\n "status": "healthy",\n "version": "0.1.0",\n "uptime": 3600,\n "backends": {\n "gitea": {\n "enabled": true,\n "healthy": true\n },\n "oci": {\n "enabled": true,\n "healthy": true,\n "error": null\n }\n }\n}\n```\n\n**Degraded Status** (200 OK):\n\n```{$detected_lang}\n{\n "status": "degraded",\n "version": "0.1.0",\n "uptime": 7200,\n "backends": {\n "gitea": {\n "enabled": true,\n "healthy": false,\n "error": "Connection timeout"\n },\n "oci": {\n "enabled": true,\n "healthy": true\n }\n }\n}\n```\n\n---\n\n### Metrics\n\nGet Prometheus-formatted metrics.\n\n**Endpoint**: `GET /metrics`\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/metrics"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n# HELP http_requests_total Total HTTP requests\n# TYPE http_requests_total counter\nhttp_requests_total 1234\n\n# HELP http_request_duration_seconds HTTP request duration\n# TYPE http_request_duration_seconds histogram\nhttp_request_duration_seconds_bucket{le="0.005"} 100\nhttp_request_duration_seconds_bucket{le="0.01"} 200\nhttp_request_duration_seconds_bucket{le="0.025"} 300\nhttp_request_duration_seconds_sum 50.5\nhttp_request_duration_seconds_count 1234\n\n# HELP cache_hits_total Total cache hits\n# TYPE cache_hits_total counter\ncache_hits_total 987\n\n# HELP cache_misses_total Total cache misses\n# TYPE cache_misses_total 
counter\ncache_misses_total 247\n\n# HELP extensions_total Total extensions\n# TYPE extensions_total gauge\nextensions_total 45\n```\n\n---\n\n### Cache Statistics\n\nGet cache performance statistics.\n\n**Endpoint**: `GET /cache/stats`\n\n**Example Request**:\n\n```{$detected_lang}\ncurl "http://localhost:8082/api/v1/cache/stats"\n```\n\n**Example Response** (200 OK):\n\n```{$detected_lang}\n{\n "list_entries": 45,\n "metadata_entries": 120,\n "version_entries": 80,\n "total_entries": 245\n}\n```\n\n---\n\n## Error Responses\n\nAll error responses follow this format:\n\n```{$detected_lang}\n{\n "error": "error_type",\n "message": "Human-readable error message",\n "details": "Optional additional details"\n}\n```\n\n### HTTP Status Codes\n\n| Status | Description |\n| -------- | ------------- |\n| 200 OK | Request successful |\n| 400 Bad Request | Invalid input (e.g., invalid extension type) |\n| 401 Unauthorized | Authentication failed |\n| 404 Not Found | Resource not found |\n| 429 Too Many Requests | Rate limit exceeded |\n| 500 Internal Server Error | Server error |\n| 503 Service Unavailable | Service temporarily unavailable |\n\n### Error Types\n\n| Error Type | HTTP Status | Description |\n| ------------ | ------------- | ------------- |\n| `not_found` | 404 | Extension or resource not found |\n| `invalid_type` | 400 | Invalid extension type provided |\n| `invalid_version` | 400 | Invalid version format |\n| `auth_error` | 401 | Authentication failed |\n| `rate_limit` | 429 | Too many requests |\n| `config_error` | 500 | Server configuration error |\n| `internal_error` | 500 | Internal server error |\n\n---\n\n## Data Models\n\n### Extension\n\n```{$detected_lang}\ninterface Extension {\n name: string; // Extension name\n type: ExtensionType; // Extension type\n version: string; // Current version (semver)\n description: string; // Description\n author?: string; // Author/organization\n repository?: string; // Source repository URL\n source: ExtensionSource; 
// Backend source\n published_at: string; // ISO 8601 timestamp\n download_url?: string; // Download URL\n checksum?: string; // Checksum (e.g., sha256:...)\n size?: number; // Size in bytes\n tags?: string[]; // Tags\n}\n```\n\n### ExtensionType\n\n```{$detected_lang}\ntype ExtensionType = "provider" | "taskserv" | "cluster";\n```\n\n### ExtensionSource\n\n```{$detected_lang}\ntype ExtensionSource = "gitea" | "oci";\n```\n\n### ExtensionVersion\n\n```{$detected_lang}\ninterface ExtensionVersion {\n version: string; // Version string (semver)\n published_at: string; // ISO 8601 timestamp\n download_url?: string; // Download URL\n checksum?: string; // Checksum\n size?: number; // Size in bytes\n}\n```\n\n### HealthResponse\n\n```{$detected_lang}\ninterface HealthResponse {\n status: string; // "healthy" | "degraded"\n version: string; // Service version\n uptime: number; // Uptime in seconds\n backends: BackendHealth; // Backend health status\n}\n```\n\n### BackendHealth\n\n```{$detected_lang}\ninterface BackendHealth {\n gitea: BackendStatus;\n oci: BackendStatus;\n}\n```\n\n### BackendStatus\n\n```{$detected_lang}\ninterface BackendStatus {\n enabled: boolean; // Backend enabled\n healthy: boolean; // Backend healthy\n error?: string; // Error message if unhealthy\n}\n```\n\n---\n\n## Rate Limiting\n\nCurrently, the API does not enforce rate limiting. This may be added in future versions.\n\nFor high-volume usage, consider:\n\n- Implementing client-side rate limiting\n- Using the cache effectively\n- Batching requests when possible\n\n---\n\n## Caching Behavior\n\nThe service implements LRU caching with TTL:\n\n- **Cache TTL**: Configurable (default: 5 minutes)\n- **Cache Capacity**: Configurable (default: 1000 entries)\n- **Cache Keys**:\n - List: `list:{type}:{source}`\n - Metadata: `{type}/{name}`\n - Versions: `{type}/{name}/versions`\n\nCache headers are not currently exposed. 
Future versions may include:\n\n- `X-Cache-Hit: true/false`\n- `X-Cache-TTL: {seconds}`\n\n---\n\n## Versioning\n\nAPI version is specified in the URL path: `/api/v1/`\n\nMajor version changes will be introduced in new paths (e.g., `/api/v2/`).\n\n---\n\n## Examples\n\n### Complete Workflow\n\n```{$detected_lang}\n# 1. Search for Kubernetes extensions\ncurl "http://localhost:8082/api/v1/extensions/search?q=kubernetes"\n\n# 2. Get extension metadata\ncurl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes"\n\n# 3. List available versions\ncurl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/versions"\n\n# 4. Download specific version\ncurl -OJ "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/1.28.0"\n\n# 5. Verify checksum (if provided)\nsha256sum kubernetes_taskserv.tar.gz\n```\n\n### Pagination\n\n```{$detected_lang}\n# Get first page\ncurl "http://localhost:8082/api/v1/extensions?limit=10&offset=0"\n\n# Get second page\ncurl "http://localhost:8082/api/v1/extensions?limit=10&offset=10"\n\n# Get third page\ncurl "http://localhost:8082/api/v1/extensions?limit=10&offset=20"\n```\n\n### Filtering\n\n```{$detected_lang}\n# Only providers from Gitea\ncurl "http://localhost:8082/api/v1/extensions?type=provider&source=gitea"\n\n# Only taskservs from OCI\ncurl "http://localhost:8082/api/v1/extensions?type=taskserv&source=oci"\n\n# All clusters\ncurl "http://localhost:8082/api/v1/extensions?type=cluster"\n```\n\n---\n\n## Support\n\nFor issues and questions, see the main README or project documentation. +# Extension Registry API Documentation + +Version: 1.0.0 +Base URL: `http://localhost:8082/api/v1` + +## Table of Contents + +- [Authentication](#authentication) +- [Extension Endpoints](#extension-endpoints) +- [System Endpoints](#system-endpoints) +- [Error Responses](#error-responses) +- [Data Models](#data-models) + +## Authentication + +The Extension Registry API does not require authentication for read operations. 
Backend authentication (Gitea/OCI) is handled server-side via
+configuration.
+
+## Extension Endpoints
+
+### List Extensions
+
+Retrieve a list of available extensions with optional filtering and pagination.
+
+**Endpoint**: `GET /extensions`
+
+**Query Parameters**:
+
+| Parameter | Type | Required | Description |
+| ----------- | ------ | ---------- | ------------- |
+| `type` | string | No | Filter by extension type: `provider`, `taskserv`, `cluster` |
+| `source` | string | No | Filter by source: `gitea`, `oci` |
+| `limit` | integer | No | Maximum results (default: 100, max: 1000) |
+| `offset` | integer | No | Pagination offset (default: 0) |
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/extensions?type=provider&limit=10"
+```
+
+**Example Response** (200 OK):
+
+```json
+[
+  {
+    "name": "aws",
+    "type": "provider",
+    "version": "1.2.0",
+    "description": "AWS provider for provisioning infrastructure",
+    "author": "provisioning-team",
+    "repository": "https://gitea.example.com/org/aws_prov",
+    "source": "gitea",
+    "published_at": "2025-10-06T12:00:00Z",
+    "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",
+    "checksum": "sha256:abc123...",
+    "size": 1024000,
+    "tags": ["cloud", "aws", "infrastructure"]
+  },
+  {
+    "name": "upcloud",
+    "type": "provider",
+    "version": "2.1.3",
+    "description": "UpCloud provider for European cloud infrastructure",
+    "author": "provisioning-team",
+    "repository": "https://gitea.example.com/org/upcloud_prov",
+    "source": "gitea",
+    "published_at": "2025-10-05T10:30:00Z",
+    "download_url": "https://gitea.example.com/org/upcloud_prov/releases/download/2.1.3/upcloud_prov.tar.gz",
+    "size": 890000
+  }
+]
+```
+
+---
+
+### Get Extension Metadata
+
+Retrieve detailed metadata for a specific extension.
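The endpoint path can be assembled client-side before issuing the request. A minimal sketch (the helper and base URL constant are illustrative, not part of the API):

```typescript
// Illustrative helper: build the metadata URL for GET /extensions/{type}/{name}.
// The base URL matches the default shown in this document.
const BASE_URL = "http://localhost:8082/api/v1";

type ExtensionType = "provider" | "taskserv" | "cluster";

function metadataUrl(type: ExtensionType, name: string): string {
  // Encode the name in case it ever contains reserved characters.
  return `${BASE_URL}/extensions/${type}/${encodeURIComponent(name)}`;
}

const url = metadataUrl("provider", "aws");
// url: "http://localhost:8082/api/v1/extensions/provider/aws"
```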
+ +**Endpoint**: `GET /extensions/{type}/{name}` + +**Path Parameters**: + +| Parameter | Type | Required | Description | +| ----------- | ------ | ---------- | ------------- | +| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` | +| `name` | string | Yes | Extension name | + +**Example Request**: + +```bash +curl "http://localhost:8082/api/v1/extensions/provider/aws" +``` + +**Example Response** (200 OK): + +```json +{ + "name": "aws", + "type": "provider", + "version": "1.2.0", + "description": "AWS provider for provisioning infrastructure", + "author": "provisioning-team", + "repository": "https://gitea.example.com/org/aws_prov", + "source": "gitea", + "published_at": "2025-10-06T12:00:00Z", + "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz", + "checksum": "sha256:abc123...", + "size": 1024000, + "tags": ["cloud", "aws", "infrastructure"] +} +``` + +**Error Response** (404 Not Found): + +```json +{ + "error": "not_found", + "message": "Extension provider/nonexistent not found" +} +``` + +--- + +### List Extension Versions + +Get all available versions for a specific extension. 
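Clients often want only the newest release from this list. A sketch of picking it by comparing plain `x.y.z` version strings (the helper names are illustrative; pre-release suffixes are not handled):

```typescript
// Minimal semver-style comparison for plain x.y.z version strings.
interface VersionEntry {
  version: string;
  published_at: string;
}

function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Sort ascending and take the last element.
function latestVersion(entries: VersionEntry[]): string | undefined {
  return entries.map((e) => e.version).sort(compareVersions).pop();
}

const latest = latestVersion([
  { version: "1.27.5", published_at: "2025-09-15T10:30:00Z" },
  { version: "1.28.0", published_at: "2025-10-06T12:00:00Z" },
  { version: "1.27.4", published_at: "2025-08-20T08:15:00Z" },
]);
// latest: "1.28.0"
```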
+
+**Endpoint**: `GET /extensions/{type}/{name}/versions`
+
+**Path Parameters**:
+
+| Parameter | Type | Required | Description |
+| ----------- | ------ | ---------- | ------------- |
+| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` |
+| `name` | string | Yes | Extension name |
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/versions"
+```
+
+**Example Response** (200 OK):
+
+```json
+[
+  {
+    "version": "1.28.0",
+    "published_at": "2025-10-06T12:00:00Z",
+    "download_url": "https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.28.0/kubernetes_taskserv.tar.gz",
+    "checksum": "sha256:def456...",
+    "size": 2048000
+  },
+  {
+    "version": "1.27.5",
+    "published_at": "2025-09-15T10:30:00Z",
+    "download_url": "https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.27.5/kubernetes_taskserv.tar.gz",
+    "checksum": "sha256:ghi789...",
+    "size": 1980000
+  },
+  {
+    "version": "1.27.4",
+    "published_at": "2025-08-20T08:15:00Z",
+    "download_url": "https://gitea.example.com/org/kubernetes_taskserv/releases/download/1.27.4/kubernetes_taskserv.tar.gz",
+    "size": 1950000
+  }
+]
+```
+
+---
+
+### Download Extension
+
+Download a specific version of an extension.
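When a `checksum` field is present in the metadata, the downloaded bytes can be verified before use. A sketch using Node's `crypto` (the helper name is illustrative; only the `sha256:<hex>` form shown in this document is handled):

```typescript
import { createHash } from "node:crypto";

// Verify downloaded bytes against a "sha256:<hex>" checksum string.
function verifyChecksum(data: Buffer, checksum: string): boolean {
  const [algo, expected] = checksum.split(":");
  if (algo !== "sha256" || !expected) return false;
  const actual = createHash("sha256").update(data).digest("hex");
  return actual === expected;
}

// Well-known SHA-256 test vector for "abc", standing in for real archive bytes.
const ok = verifyChecksum(
  Buffer.from("abc"),
  "sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
);
// ok: true
```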
+
+**Endpoint**: `GET /extensions/{type}/{name}/{version}`
+
+**Path Parameters**:
+
+| Parameter | Type | Required | Description |
+| ----------- | ------ | ---------- | ------------- |
+| `type` | string | Yes | Extension type: `provider`, `taskserv`, `cluster` |
+| `name` | string | Yes | Extension name |
+| `version` | string | Yes | Extension version (e.g., `1.2.0`) |
+
+**Example Request**:
+
+```bash
+curl -OJ "http://localhost:8082/api/v1/extensions/provider/aws/1.2.0"
+```
+
+**Response**:
+
+- **Content-Type**: `application/octet-stream`
+- **Body**: Binary data (tarball or archive)
+
+**Error Response** (404 Not Found):
+
+```json
+{
+  "error": "not_found",
+  "message": "Extension provider/aws version 1.2.0 not found"
+}
+```
+
+---
+
+### Search Extensions
+
+Search for extensions by name or description.
+
+**Endpoint**: `GET /extensions/search`
+
+**Query Parameters**:
+
+| Parameter | Type | Required | Description |
+| ----------- | ------ | ---------- | ------------- |
+| `q` | string | Yes | Search query (case-insensitive) |
+| `type` | string | No | Filter by extension type |
+| `limit` | integer | No | Maximum results (default: 50, max: 100) |
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/extensions/search?q=kubernetes&type=taskserv&limit=5"
+```
+
+**Example Response** (200 OK):
+
+```json
+[
+  {
+    "name": "kubernetes",
+    "type": "taskserv",
+    "version": "1.28.0",
+    "description": "Kubernetes container orchestration platform",
+    "author": "provisioning-team",
+    "source": "gitea",
+    "published_at": "2025-10-06T12:00:00Z"
+  },
+  {
+    "name": "k3s",
+    "type": "taskserv",
+    "version": "1.27.5",
+    "description": "Lightweight Kubernetes distribution",
+    "author": "community",
+    "source": "oci",
+    "published_at": "2025-09-20T14:30:00Z"
+  }
+]
+```
+
+---
+
+## System Endpoints
+
+### Health Check
+
+Check service health and backend status.
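The overall `status` field tracks the per-backend flags: an enabled but unhealthy backend degrades the service. A sketch of that rule as a client-side check (an assumed reading of the example responses, not a documented guarantee):

```typescript
interface BackendStatus {
  enabled: boolean;
  healthy: boolean;
  error?: string;
}

// "degraded" when any enabled backend reports unhealthy, otherwise "healthy".
function overallStatus(
  backends: Record<string, BackendStatus>
): "healthy" | "degraded" {
  const anyDown = Object.values(backends).some((b) => b.enabled && !b.healthy);
  return anyDown ? "degraded" : "healthy";
}

const status = overallStatus({
  gitea: { enabled: true, healthy: false, error: "Connection timeout" },
  oci: { enabled: true, healthy: true },
});
// status: "degraded"
```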
+
+**Endpoint**: `GET /health`
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/health"
+```
+
+**Example Response** (200 OK):
+
+```json
+{
+  "status": "healthy",
+  "version": "0.1.0",
+  "uptime": 3600,
+  "backends": {
+    "gitea": {
+      "enabled": true,
+      "healthy": true
+    },
+    "oci": {
+      "enabled": true,
+      "healthy": true,
+      "error": null
+    }
+  }
+}
+```
+
+**Degraded Status** (200 OK):
+
+```json
+{
+  "status": "degraded",
+  "version": "0.1.0",
+  "uptime": 7200,
+  "backends": {
+    "gitea": {
+      "enabled": true,
+      "healthy": false,
+      "error": "Connection timeout"
+    },
+    "oci": {
+      "enabled": true,
+      "healthy": true
+    }
+  }
+}
+```
+
+---
+
+### Metrics
+
+Get Prometheus-formatted metrics.
+
+**Endpoint**: `GET /metrics`
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/metrics"
+```
+
+**Example Response** (200 OK):
+
+```text
+# HELP http_requests_total Total HTTP requests
+# TYPE http_requests_total counter
+http_requests_total 1234
+
+# HELP http_request_duration_seconds HTTP request duration
+# TYPE http_request_duration_seconds histogram
+http_request_duration_seconds_bucket{le="0.005"} 100
+http_request_duration_seconds_bucket{le="0.01"} 200
+http_request_duration_seconds_bucket{le="0.025"} 300
+http_request_duration_seconds_sum 50.5
+http_request_duration_seconds_count 1234
+
+# HELP cache_hits_total Total cache hits
+# TYPE cache_hits_total counter
+cache_hits_total 987
+
+# HELP cache_misses_total Total cache misses
+# TYPE cache_misses_total counter
+cache_misses_total 247
+
+# HELP extensions_total Total extensions
+# TYPE extensions_total gauge
+extensions_total 45
+```
+
+---
+
+### Cache Statistics
+
+Get cache performance statistics.
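In the example response, `total_entries` equals the sum of the three per-category counts, which allows a simple client-side sanity check. A sketch (the interface is an assumption mirroring the JSON fields):

```typescript
interface CacheStats {
  list_entries: number;
  metadata_entries: number;
  version_entries: number;
  total_entries: number;
}

// True when the per-category counts add up to the reported total.
function statsConsistent(s: CacheStats): boolean {
  return (
    s.list_entries + s.metadata_entries + s.version_entries === s.total_entries
  );
}

const consistent = statsConsistent({
  list_entries: 45,
  metadata_entries: 120,
  version_entries: 80,
  total_entries: 245,
});
// consistent: true (45 + 120 + 80 = 245)
```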
+
+**Endpoint**: `GET /cache/stats`
+
+**Example Request**:
+
+```bash
+curl "http://localhost:8082/api/v1/cache/stats"
+```
+
+**Example Response** (200 OK):
+
+```json
+{
+  "list_entries": 45,
+  "metadata_entries": 120,
+  "version_entries": 80,
+  "total_entries": 245
+}
+```
+
+---
+
+## Error Responses
+
+All error responses follow this format:
+
+```json
+{
+  "error": "error_type",
+  "message": "Human-readable error message",
+  "details": "Optional additional details"
+}
+```
+
+### HTTP Status Codes
+
+| Status | Description |
+| -------- | ------------- |
+| 200 OK | Request successful |
+| 400 Bad Request | Invalid input (e.g., invalid extension type) |
+| 401 Unauthorized | Authentication failed |
+| 404 Not Found | Resource not found |
+| 429 Too Many Requests | Rate limit exceeded |
+| 500 Internal Server Error | Server error |
+| 503 Service Unavailable | Service temporarily unavailable |
+
+### Error Types
+
+| Error Type | HTTP Status | Description |
+| ------------ | ------------- | ------------- |
+| `not_found` | 404 | Extension or resource not found |
+| `invalid_type` | 400 | Invalid extension type provided |
+| `invalid_version` | 400 | Invalid version format |
+| `auth_error` | 401 | Authentication failed |
+| `rate_limit` | 429 | Too many requests |
+| `config_error` | 500 | Server configuration error |
+| `internal_error` | 500 | Internal server error |
+
+---
+
+## Data Models
+
+### Extension
+
+```typescript
+interface Extension {
+  name: string;            // Extension name
+  type: ExtensionType;     // Extension type
+  version: string;         // Current version (semver)
+  description: string;     // Description
+  author?: string;         // Author/organization
+  repository?: string;     // Source repository URL
+  source: ExtensionSource; // Backend source
+  published_at: string;    // ISO 8601 timestamp
+  download_url?: string;   // Download URL
+  checksum?: string;       // Checksum (e.g., sha256:...)
+  size?: number;           // Size in bytes
+  tags?: string[];         // Tags
+}
+```
+
+### ExtensionType
+
+```typescript
+type ExtensionType = "provider" | "taskserv" | "cluster";
+```
+
+### ExtensionSource
+
+```typescript
+type ExtensionSource = "gitea" | "oci";
+```
+
+### ExtensionVersion
+
+```typescript
+interface ExtensionVersion {
+  version: string;       // Version string (semver)
+  published_at: string;  // ISO 8601 timestamp
+  download_url?: string; // Download URL
+  checksum?: string;     // Checksum
+  size?: number;         // Size in bytes
+}
+```
+
+### HealthResponse
+
+```typescript
+interface HealthResponse {
+  status: string;          // "healthy" | "degraded"
+  version: string;         // Service version
+  uptime: number;          // Uptime in seconds
+  backends: BackendHealth; // Backend health status
+}
+```
+
+### BackendHealth
+
+```typescript
+interface BackendHealth {
+  gitea: BackendStatus;
+  oci: BackendStatus;
+}
+```
+
+### BackendStatus
+
+```typescript
+interface BackendStatus {
+  enabled: boolean; // Backend enabled
+  healthy: boolean; // Backend healthy
+  error?: string;   // Error message if unhealthy
+}
+```
+
+---
+
+## Rate Limiting
+
+Currently, the API does not enforce rate limiting. This may be added in future versions.
+
+For high-volume usage, consider:
+
+- Implementing client-side rate limiting
+- Using the cache effectively
+- Batching requests when possible
+
+---
+
+## Caching Behavior
+
+The service implements LRU caching with TTL:
+
+- **Cache TTL**: Configurable (default: 5 minutes)
+- **Cache Capacity**: Configurable (default: 1000 entries)
+- **Cache Keys**:
+  - List: `list:{type}:{source}`
+  - Metadata: `{type}/{name}`
+  - Versions: `{type}/{name}/versions`
+
+Cache headers are not currently exposed. Future versions may include:
+
+- `X-Cache-Hit: true/false`
+- `X-Cache-TTL: {seconds}`
+
+---
+
+## Versioning
+
+API version is specified in the URL path: `/api/v1/`
+
+Major version changes will be introduced in new paths (e.g., `/api/v2/`).
+
+---
+
+## Examples
+
+### Complete Workflow
+
+```bash
+# 1. Search for Kubernetes extensions
+curl "http://localhost:8082/api/v1/extensions/search?q=kubernetes"
+
+# 2. Get extension metadata
+curl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes"
+
+# 3. List available versions
+curl "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/versions"
+
+# 4. Download specific version
+curl -OJ "http://localhost:8082/api/v1/extensions/taskserv/kubernetes/1.28.0"
+
+# 5. Verify checksum (if provided)
+sha256sum kubernetes_taskserv.tar.gz
+```
+
+### Pagination
+
+```bash
+# Get first page
+curl "http://localhost:8082/api/v1/extensions?limit=10&offset=0"
+
+# Get second page
+curl "http://localhost:8082/api/v1/extensions?limit=10&offset=10"
+
+# Get third page
+curl "http://localhost:8082/api/v1/extensions?limit=10&offset=20"
+```
+
+### Filtering
+
+```bash
+# Only providers from Gitea
+curl "http://localhost:8082/api/v1/extensions?type=provider&source=gitea"
+
+# Only taskservs from OCI
+curl "http://localhost:8082/api/v1/extensions?type=taskserv&source=oci"
+
+# All clusters
+curl "http://localhost:8082/api/v1/extensions?type=cluster"
+```
+
+---
+
+## Support
+
+For issues and questions, see the main README or project documentation.
\ No newline at end of file diff --git a/crates/extension-registry/README.md b/crates/extension-registry/README.md index 465e24f..db7022b 100644 --- a/crates/extension-registry/README.md +++ b/crates/extension-registry/README.md @@ -1 +1,635 @@ -# Extension Registry Service\n\nA high-performance Rust microservice that provides a unified REST API for extension discovery, versioning,\nand download from multiple sources (Gitea releases and OCI registries).\n\n## Features\n\n- **Multi-Backend Support**: Fetch extensions from Gitea releases and OCI registries\n- **Unified REST API**: Single API for all extension operations\n- **Smart Caching**: LRU cache with TTL to reduce backend API calls\n- **Prometheus Metrics**: Built-in metrics for monitoring\n- **Health Monitoring**: Health checks for all backends\n- **Type-Safe**: Strong typing for extension metadata\n- **Async/Await**: High-performance async operations with Tokio\n- **Docker Support**: Production-ready containerization\n\n## Architecture\n\n```{$detected_lang}\n┌─────────────────────────────────────────────────\n────────────┐\n│ Extension Registry API │\n│ (axum) │\n├─────────────────────────────────────────────────\n────────────┤\n│ │\n│ ┌────────────────┐ ┌────────────────┐ \n┌──────────────┐ │\n│ │ Gitea Client │ │ OCI Client │ │ LRU Cache │ │\n│ │ (reqwest) │ │ (reqwest) │ │ (parking) │ │\n│ └────────────────┘ └────────────────┘ \n└──────────────┘ │\n│ │ │ │ │\n└─────────┼────────────────────┼──────────────────\n──┼─────────┘\n │ │ │\n ▼ ▼ ▼\n ┌──────────┐ ┌──────────┐ ┌──────────┐\n │ Gitea │ │ OCI │ │ Memory │\n │ Releases │ │ Registry │ │ │\n └──────────┘ └──────────┘ └──────────┘\n```\n\n## Installation\n\n### Building from Source\n\n```{$detected_lang}\ncd provisioning/platform/extension-registry\ncargo build --release\n```\n\n### Docker Build\n\n```{$detected_lang}\ndocker build -t extension-registry:latest .\n```\n\n### Running with Cargo\n\n```{$detected_lang}\ncargo run -- --config config.toml 
--port 8082\n```\n\n### Running with Docker\n\n```{$detected_lang}\ndocker run -d \\n -p 8082:8082 \\n -v $(pwd)/config.toml:/app/config.toml:ro \\n -v $(pwd)/tokens:/app/tokens:ro \\n extension-registry:latest\n```\n\n## Configuration\n\nCreate a `config.toml` file (see `config.example.toml`):\n\n```{$detected_lang}\n[server]\nhost = "0.0.0.0"\nport = 8082\nworkers = 4\nenable_cors = true\nenable_compression = true\n\n# Gitea backend (optional)\n[gitea]\nurl = "https://gitea.example.com"\norganization = "provisioning-extensions"\ntoken_path = "/path/to/gitea-token.txt"\ntimeout_seconds = 30\nverify_ssl = true\n\n# OCI registry backend (optional)\n[oci]\nregistry = "registry.example.com"\nnamespace = "provisioning"\nauth_token_path = "/path/to/oci-token.txt"\ntimeout_seconds = 30\nverify_ssl = true\n\n# Cache configuration\n[cache]\ncapacity = 1000\nttl_seconds = 300\nenable_metadata_cache = true\nenable_list_cache = true\n```\n\n**Note**: At least one backend (Gitea or OCI) must be configured.\n\n## API Endpoints\n\n### Extension Operations\n\n#### List Extensions\n\n```{$detected_lang}\nGET /api/v1/extensions\n```\n\nQuery parameters:\n\n- `type` (optional): Filter by extension type (`provider`, `taskserv`, `cluster`)\n- `source` (optional): Filter by source (`gitea`, `oci`)\n- `limit` (optional): Maximum results (default: 100)\n- `offset` (optional): Pagination offset (default: 0)\n\nExample:\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/extensions?type=provider&limit=10\n```\n\nResponse:\n\n```{$detected_lang}\n[\n {\n "name": "aws",\n "type": "provider",\n "version": "1.2.0",\n "description": "AWS provider for provisioning",\n "author": "provisioning-team",\n "repository": "https://gitea.example.com/org/aws_prov",\n "source": "gitea",\n "published_at": "2025-10-06T12:00:00Z",\n "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",\n "size": 1024000\n }\n]\n```\n\n#### Get 
Extension\n\n```{$detected_lang}\nGET /api/v1/extensions/{type}/{name}\n```\n\nExample:\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/extensions/provider/aws\n```\n\n#### List Versions\n\n```{$detected_lang}\nGET /api/v1/extensions/{type}/{name}/versions\n```\n\nExample:\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/extensions/provider/aws/versions\n```\n\nResponse:\n\n```{$detected_lang}\n[\n {\n "version": "1.2.0",\n "published_at": "2025-10-06T12:00:00Z",\n "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",\n "size": 1024000\n },\n {\n "version": "1.1.0",\n "published_at": "2025-09-15T10:30:00Z",\n "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.1.0/aws_prov.tar.gz",\n "size": 980000\n }\n]\n```\n\n#### Download Extension\n\n```{$detected_lang}\nGET /api/v1/extensions/{type}/{name}/{version}\n```\n\nExample:\n\n```{$detected_lang}\ncurl -O http://localhost:8082/api/v1/extensions/provider/aws/1.2.0\n```\n\nReturns binary data with `Content-Type: application/octet-stream`.\n\n#### Search Extensions\n\n```{$detected_lang}\nGET /api/v1/extensions/search?q={query}\n```\n\nQuery parameters:\n\n- `q` (required): Search query\n- `type` (optional): Filter by extension type\n- `limit` (optional): Maximum results (default: 50)\n\nExample:\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/extensions/search?q=kubernetes&type=taskserv\n```\n\n### System Endpoints\n\n#### Health Check\n\n```{$detected_lang}\nGET /api/v1/health\n```\n\nExample:\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/health\n```\n\nResponse:\n\n```{$detected_lang}\n{\n "status": "healthy",\n "version": "0.1.0",\n "uptime": 3600,\n "backends": {\n "gitea": {\n "enabled": true,\n "healthy": true\n },\n "oci": {\n "enabled": true,\n "healthy": true\n }\n }\n}\n```\n\n#### Metrics\n\n```{$detected_lang}\nGET /api/v1/metrics\n```\n\nReturns Prometheus-formatted 
metrics:\n\n```{$detected_lang}\n# HELP http_requests_total Total HTTP requests\n# TYPE http_requests_total counter\nhttp_requests_total 1234\n\n# HELP cache_hits_total Total cache hits\n# TYPE cache_hits_total counter\ncache_hits_total 567\n\n# HELP cache_misses_total Total cache misses\n# TYPE cache_misses_total counter\ncache_misses_total 123\n```\n\n#### Cache Statistics\n\n```{$detected_lang}\nGET /api/v1/cache/stats\n```\n\nResponse:\n\n```{$detected_lang}\n{\n "list_entries": 45,\n "metadata_entries": 120,\n "version_entries": 80,\n "total_entries": 245\n}\n```\n\n## Extension Naming Conventions\n\n### Gitea Repositories\n\nExtensions in Gitea follow specific naming patterns:\n\n- **Providers**: `{name}_prov` (e.g., `aws_prov`, `upcloud_prov`)\n- **Task Services**: `{name}_taskserv` (e.g., `kubernetes_taskserv`, `postgres_taskserv`)\n- **Clusters**: `{name}_cluster` (e.g., `buildkit_cluster`, `ci_cluster`)\n\n### OCI Artifacts\n\nExtensions in OCI registries follow these patterns:\n\n- **Providers**: `{namespace}/{name}-provider` (e.g., `provisioning/aws-provider`)\n- **Task Services**: `{namespace}/{name}-taskserv` (e.g., `provisioning/kubernetes-taskserv`)\n- **Clusters**: `{namespace}/{name}-cluster` (e.g., `provisioning/buildkit-cluster`)\n\n## Caching Strategy\n\nThe service implements a multi-level LRU cache with TTL:\n\n1. **List Cache**: Caches extension lists (filtered by type/source)\n2. **Metadata Cache**: Caches individual extension metadata\n3. 
**Version Cache**: Caches version lists per extension\n\nCache behavior:\n\n- **Capacity**: Configurable (default: 1000 entries)\n- **TTL**: Configurable (default: 5 minutes)\n- **Eviction**: LRU (Least Recently Used)\n- **Invalidation**: Automatic on TTL expiration\n\nCache keys:\n\n- List: `list:{type}:{source}`\n- Metadata: `{type}/{name}`\n- Versions: `{type}/{name}/versions`\n\n## Error Handling\n\nThe API uses standard HTTP status codes:\n\n- `200 OK`: Successful operation\n- `400 Bad Request`: Invalid input (e.g., invalid extension type)\n- `401 Unauthorized`: Authentication failed\n- `404 Not Found`: Extension not found\n- `429 Too Many Requests`: Rate limit exceeded\n- `500 Internal Server Error`: Server error\n\nError response format:\n\n```{$detected_lang}\n{\n "error": "not_found",\n "message": "Extension provider/nonexistent not found"\n}\n```\n\n## Metrics and Monitoring\n\n### Prometheus Metrics\n\nAvailable metrics:\n\n- `http_requests_total`: Total HTTP requests\n- `http_request_duration_seconds`: Request duration histogram\n- `cache_hits_total`: Total cache hits\n- `cache_misses_total`: Total cache misses\n- `extensions_total`: Total extensions served\n\n### Health Checks\n\nThe health endpoint checks:\n\n- Service uptime\n- Gitea backend connectivity\n- OCI backend connectivity\n- Overall service status\n\n## Development\n\n### Project Structure\n\n```{$detected_lang}\nextension-registry/\n├── Cargo.toml # Rust dependencies\n├── src/\n│ ├── main.rs # Entry point\n│ ├── lib.rs # Library exports\n│ ├── config.rs # Configuration management\n│ ├── error.rs # Error types\n│ ├── api/\n│ │ ├── handlers.rs # HTTP handlers\n│ │ └── routes.rs # Route definitions\n│ ├── gitea/\n│ │ ├── client.rs # Gitea API client\n│ │ └── models.rs # Gitea data models\n│ ├── oci/\n│ │ ├── client.rs # OCI registry client\n│ │ └── models.rs # OCI data models\n│ ├── cache/\n│ │ └── lru_cache.rs # LRU caching\n│ └── models/\n│ └── extension.rs # Extension models\n├── tests/\n│ 
└── integration_test.rs # Integration tests\n├── Dockerfile # Docker build\n└── README.md # This file\n```\n\n### Running Tests\n\n```{$detected_lang}\n# Run all tests\ncargo test\n\n# Run with output\ncargo test -- --nocapture\n\n# Run specific test\ncargo test test_health_check\n```\n\n### Code Quality\n\n```{$detected_lang}\n# Format code\ncargo fmt\n\n# Run clippy\ncargo clippy\n\n# Check for security vulnerabilities\ncargo audit\n```\n\n## Deployment\n\n### Systemd Service\n\nCreate `/etc/systemd/system/extension-registry.service`:\n\n```{$detected_lang}\n[Unit]\nDescription=Extension Registry Service\nAfter=network.target\n\n[Service]\nType=simple\nUser=registry\nWorkingDirectory=/opt/extension-registry\nExecStart=/usr/local/bin/extension-registry --config /etc/extension-registry/config.toml\nRestart=on-failure\nRestartSec=5s\n\n[Install]\nWantedBy=multi-user.target\n```\n\nEnable and start:\n\n```{$detected_lang}\nsudo systemctl enable extension-registry\nsudo systemctl start extension-registry\nsudo systemctl status extension-registry\n```\n\n### Docker Compose\n\n```{$detected_lang}\nversion: '3.8'\n\nservices:\n extension-registry:\n image: extension-registry:latest\n ports:\n - "8082:8082"\n volumes:\n - ./config.toml:/app/config.toml:ro\n - ./tokens:/app/tokens:ro\n restart: unless-stopped\n healthcheck:\n test: ["CMD", "curl", "-f", "http://localhost:8082/api/v1/health"]\n interval: 30s\n timeout: 3s\n retries: 3\n start_period: 5s\n```\n\n### Kubernetes Deployment\n\n```{$detected_lang}\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: extension-registry\nspec:\n replicas: 3\n selector:\n matchLabels:\n app: extension-registry\n template:\n metadata:\n labels:\n app: extension-registry\n spec:\n containers:\n - name: extension-registry\n image: extension-registry:latest\n ports:\n - containerPort: 8082\n volumeMounts:\n - name: config\n mountPath: /app/config.toml\n subPath: config.toml\n - name: tokens\n mountPath: /app/tokens\n 
livenessProbe:\n httpGet:\n path: /api/v1/health\n port: 8082\n initialDelaySeconds: 5\n periodSeconds: 10\n readinessProbe:\n httpGet:\n path: /api/v1/health\n port: 8082\n initialDelaySeconds: 5\n periodSeconds: 5\n volumes:\n - name: config\n configMap:\n name: extension-registry-config\n - name: tokens\n secret:\n secretName: extension-registry-tokens\n---\napiVersion: v1\nkind: Service\nmetadata:\n name: extension-registry\nspec:\n selector:\n app: extension-registry\n ports:\n - port: 8082\n targetPort: 8082\n type: ClusterIP\n```\n\n## Security\n\n### Authentication\n\n- Gitea: Token-based authentication via `token_path`\n- OCI: Optional token authentication via `auth_token_path`\n\n### Best Practices\n\n1. **Store tokens securely**: Use file permissions (600) for token files\n2. **Enable SSL verification**: Set `verify_ssl = true` in production\n3. **Use HTTPS**: Always use HTTPS for Gitea and OCI registries\n4. **Limit CORS**: Configure CORS appropriately for production\n5. **Rate limiting**: Consider adding rate limiting for public APIs\n6. **Network isolation**: Run service in isolated network segments\n\n## Performance\n\n### Benchmarks\n\nTypical performance characteristics:\n\n- **Cached requests**: <5ms response time\n- **Uncached requests**: 50-200ms (depends on backend latency)\n- **Cache hit ratio**: ~85-95% in typical workloads\n- **Throughput**: 1000+ req/s on modern hardware\n\n### Optimization Tips\n\n1. **Increase cache capacity**: For large extension catalogs\n2. **Tune TTL**: Balance freshness vs. performance\n3. **Use multiple workers**: Scale with CPU cores\n4. **Enable compression**: Reduce bandwidth usage\n5. 
**Connection pooling**: Reuse HTTP connections to backends\n\n## Troubleshooting\n\n### Common Issues\n\n#### Service won't start\n\n- Check configuration file syntax\n- Verify token files exist and are readable\n- Check network connectivity to backends\n\n#### Extensions not found\n\n- Verify backend configuration (URL, organization, namespace)\n- Check backend connectivity with health endpoint\n- Review logs for authentication errors\n\n#### Slow responses\n\n- Check backend latency\n- Increase cache capacity or TTL\n- Review Prometheus metrics for bottlenecks\n\n### Logging\n\nEnable debug logging:\n\n```{$detected_lang}\nextension-registry --log-level debug\n```\n\nEnable JSON logging for structured logs:\n\n```{$detected_lang}\nextension-registry --json-log\n```\n\n## License\n\nPart of the Provisioning Project.\n\n## Contributing\n\nSee main project documentation for contribution guidelines.\n\n## Support\n\nFor issues and questions, please refer to the main provisioning project documentation. +# Extension Registry Service + +A high-performance Rust microservice that provides a unified REST API for extension discovery, versioning, +and download from multiple sources (Gitea releases and OCI registries). 
+
+## Features
+
+- **Multi-Backend Support**: Fetch extensions from Gitea releases and OCI registries
+- **Unified REST API**: Single API for all extension operations
+- **Smart Caching**: LRU cache with TTL to reduce backend API calls
+- **Prometheus Metrics**: Built-in metrics for monitoring
+- **Health Monitoring**: Health checks for all backends
+- **Type-Safe**: Strong typing for extension metadata
+- **Async/Await**: High-performance async operations with Tokio
+- **Docker Support**: Production-ready containerization
+
+## Architecture
+
+```text
+┌──────────────────────────────────────────────────────────────┐
+│                   Extension Registry API                     │
+│                          (axum)                              │
+├──────────────────────────────────────────────────────────────┤
+│                                                              │
+│  ┌────────────────┐  ┌────────────────┐  ┌──────────────┐    │
+│  │  Gitea Client  │  │   OCI Client   │  │  LRU Cache   │    │
+│  │   (reqwest)    │  │   (reqwest)    │  │  (parking)   │    │
+│  └────────────────┘  └────────────────┘  └──────────────┘    │
+│          │                   │                  │            │
+└──────────┼───────────────────┼──────────────────┼────────────┘
+           │                   │                  │
+           ▼                   ▼                  ▼
+     ┌──────────┐        ┌──────────┐       ┌──────────┐
+     │  Gitea   │        │   OCI    │       │  Memory  │
+     │ Releases │        │ Registry │       │          │
+     └──────────┘        └──────────┘       └──────────┘
+```
+
+## Installation
+
+### Building from Source
+
+```bash
+cd provisioning/platform/extension-registry
+cargo build --release
+```
+
+### Docker Build
+
+```bash
+docker build -t extension-registry:latest .
+```
+
+### Running with Cargo
+
+```bash
+cargo run -- --config config.toml --port 8082
+```
+
+### Running with Docker
+
+```bash
+docker run -d \
+  -p 8082:8082 \
+  -v $(pwd)/config.toml:/app/config.toml:ro \
+  -v $(pwd)/tokens:/app/tokens:ro \
+  extension-registry:latest
+```
+
+## Configuration
+
+Create a `config.toml` file (see `config.example.toml`):
+
+```toml
+[server]
+host = "0.0.0.0"
+port = 8082
+workers = 4
+enable_cors = true
+enable_compression = true
+
+# Gitea backend (optional)
+[gitea]
+url = "https://gitea.example.com"
+organization = "provisioning-extensions"
+token_path = "/path/to/gitea-token.txt"
+timeout_seconds = 30
+verify_ssl = true
+
+# OCI registry backend (optional)
+[oci]
+registry = "registry.example.com"
+namespace = "provisioning"
+auth_token_path = "/path/to/oci-token.txt"
+timeout_seconds = 30
+verify_ssl = true
+
+# Cache configuration
+[cache]
+capacity = 1000
+ttl_seconds = 300
+enable_metadata_cache = true
+enable_list_cache = true
+```
+
+**Note**: At least one backend (Gitea or OCI) must be configured.
+
+## API Endpoints
+
+### Extension Operations
+
+#### List Extensions
+
+```bash
+GET /api/v1/extensions
+```
+
+Query parameters:
+
+- `type` (optional): Filter by extension type (`provider`, `taskserv`, `cluster`)
+- `source` (optional): Filter by source (`gitea`, `oci`)
+- `limit` (optional): Maximum results (default: 100)
+- `offset` (optional): Pagination offset (default: 0)
+
+Example:
+
+```bash
+curl "http://localhost:8082/api/v1/extensions?type=provider&limit=10"
+```
+
+Response:
+
+```json
+[
+  {
+    "name": "aws",
+    "type": "provider",
+    "version": "1.2.0",
+    "description": "AWS provider for provisioning",
+    "author": "provisioning-team",
+    "repository": "https://gitea.example.com/org/aws_prov",
+    "source": "gitea",
+    "published_at": "2025-10-06T12:00:00Z",
+    "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",
+    "size": 1024000
+  }
+]
+```
+
+#### Get Extension
+
+```bash
+GET /api/v1/extensions/{type}/{name}
+```
+
+Example:
+
+```bash
+curl http://localhost:8082/api/v1/extensions/provider/aws
+```
+
+#### List Versions
+
+```bash
+GET /api/v1/extensions/{type}/{name}/versions
+```
+
+Example:
+
+```bash
+curl http://localhost:8082/api/v1/extensions/provider/aws/versions
+```
+
+Response:
+
+```json
+[
+  {
+    "version": "1.2.0",
+    "published_at": "2025-10-06T12:00:00Z",
+    "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.2.0/aws_prov.tar.gz",
+    "size": 1024000
+  },
+  {
+    "version": "1.1.0",
+    "published_at": "2025-09-15T10:30:00Z",
+    "download_url": "https://gitea.example.com/org/aws_prov/releases/download/1.1.0/aws_prov.tar.gz",
+    "size": 980000
+  }
+]
+```
+
+#### Download Extension
+
+```bash
+GET /api/v1/extensions/{type}/{name}/{version}
+```
+
+Example:
+
+```bash
+curl -O http://localhost:8082/api/v1/extensions/provider/aws/1.2.0
+```
+
+Returns binary data with `Content-Type: application/octet-stream`.
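For clients that build these URLs programmatically, the query parameters above compose with standard URL encoding. The snippet below is an illustrative helper (Python stdlib `urllib.parse`), not part of the registry; the helper names and the `BASE` constant are assumptions for this example.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8082/api/v1"


def list_url(ext_type=None, source=None, limit=None, offset=None):
    """Build a /extensions listing URL; only supplied filters are included."""
    params = {k: v for k, v in
              [("type", ext_type), ("source", source),
               ("limit", limit), ("offset", offset)]
              if v is not None}
    query = urlencode(params)
    return f"{BASE}/extensions" + (f"?{query}" if query else "")


def download_url(ext_type, name, version):
    """Build the download URL for a specific extension version."""
    return f"{BASE}/extensions/{ext_type}/{name}/{version}"


print(list_url(ext_type="provider", limit=10))
# → http://localhost:8082/api/v1/extensions?type=provider&limit=10
print(download_url("provider", "aws", "1.2.0"))
# → http://localhost:8082/api/v1/extensions/provider/aws/1.2.0
```

Using `urlencode` also takes care of escaping search queries that contain spaces or special characters, which matters for the search endpoint below.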
+
+#### Search Extensions
+
+```bash
+GET /api/v1/extensions/search?q={query}
+```
+
+Query parameters:
+
+- `q` (required): Search query
+- `type` (optional): Filter by extension type
+- `limit` (optional): Maximum results (default: 50)
+
+Example:
+
+```bash
+curl "http://localhost:8082/api/v1/extensions/search?q=kubernetes&type=taskserv"
+```
+
+### System Endpoints
+
+#### Health Check
+
+```bash
+GET /api/v1/health
+```
+
+Example:
+
+```bash
+curl http://localhost:8082/api/v1/health
+```
+
+Response:
+
+```json
+{
+  "status": "healthy",
+  "version": "0.1.0",
+  "uptime": 3600,
+  "backends": {
+    "gitea": {
+      "enabled": true,
+      "healthy": true
+    },
+    "oci": {
+      "enabled": true,
+      "healthy": true
+    }
+  }
+}
+```
+
+#### Metrics
+
+```bash
+GET /api/v1/metrics
+```
+
+Returns Prometheus-formatted metrics:
+
+```text
+# HELP http_requests_total Total HTTP requests
+# TYPE http_requests_total counter
+http_requests_total 1234
+
+# HELP cache_hits_total Total cache hits
+# TYPE cache_hits_total counter
+cache_hits_total 567
+
+# HELP cache_misses_total Total cache misses
+# TYPE cache_misses_total counter
+cache_misses_total 123
+```
+
+#### Cache Statistics
+
+```bash
+GET /api/v1/cache/stats
+```
+
+Response:
+
+```json
+{
+  "list_entries": 45,
+  "metadata_entries": 120,
+  "version_entries": 80,
+  "total_entries": 245
+}
+```
+
+## Extension Naming Conventions
+
+### Gitea Repositories
+
+Extensions in Gitea follow specific naming patterns:
+
+- **Providers**: `{name}_prov` (e.g., `aws_prov`, `upcloud_prov`)
+- **Task Services**: `{name}_taskserv` (e.g., `kubernetes_taskserv`, `postgres_taskserv`)
+- **Clusters**: `{name}_cluster` (e.g., `buildkit_cluster`, `ci_cluster`)
+
+### OCI Artifacts
+
+Extensions in OCI registries follow these patterns:
+
+- **Providers**: `{namespace}/{name}-provider` (e.g., `provisioning/aws-provider`)
+- **Task Services**: `{namespace}/{name}-taskserv` (e.g., `provisioning/kubernetes-taskserv`)
+- **Clusters**:
`{namespace}/{name}-cluster` (e.g., `provisioning/buildkit-cluster`) + +## Caching Strategy + +The service implements a multi-level LRU cache with TTL: + +1. **List Cache**: Caches extension lists (filtered by type/source) +2. **Metadata Cache**: Caches individual extension metadata +3. **Version Cache**: Caches version lists per extension + +Cache behavior: + +- **Capacity**: Configurable (default: 1000 entries) +- **TTL**: Configurable (default: 5 minutes) +- **Eviction**: LRU (Least Recently Used) +- **Invalidation**: Automatic on TTL expiration + +Cache keys: + +- List: `list:{type}:{source}` +- Metadata: `{type}/{name}` +- Versions: `{type}/{name}/versions` + +## Error Handling + +The API uses standard HTTP status codes: + +- `200 OK`: Successful operation +- `400 Bad Request`: Invalid input (e.g., invalid extension type) +- `401 Unauthorized`: Authentication failed +- `404 Not Found`: Extension not found +- `429 Too Many Requests`: Rate limit exceeded +- `500 Internal Server Error`: Server error + +Error response format: + +```json +{ + "error": "not_found", + "message": "Extension provider/nonexistent not found" +} +``` + +## Metrics and Monitoring + +### Prometheus Metrics + +Available metrics: + +- `http_requests_total`: Total HTTP requests +- `http_request_duration_seconds`: Request duration histogram +- `cache_hits_total`: Total cache hits +- `cache_misses_total`: Total cache misses +- `extensions_total`: Total extensions served + +### Health Checks + +The health endpoint checks: + +- Service uptime +- Gitea backend connectivity +- OCI backend connectivity +- Overall service status + +## Development + +### Project Structure + +```bash +extension-registry/ +├── Cargo.toml # Rust dependencies +├── src/ +│ ├── main.rs # Entry point +│ ├── lib.rs # Library exports +│ ├── config.rs # Configuration management +│ ├── error.rs # Error types +│ ├── api/ +│ │ ├── handlers.rs # HTTP handlers +│ │ └── routes.rs # Route definitions +│ ├── gitea/ +│ │ ├── client.rs # 
Gitea API client
+│   │   └── models.rs      # Gitea data models
+│   ├── oci/
+│   │   ├── client.rs      # OCI registry client
+│   │   └── models.rs      # OCI data models
+│   ├── cache/
+│   │   └── lru_cache.rs   # LRU caching
+│   └── models/
+│       └── extension.rs   # Extension models
+├── tests/
+│   └── integration_test.rs  # Integration tests
+├── Dockerfile               # Docker build
+└── README.md                # This file
+```
+
+### Running Tests
+
+```bash
+# Run all tests
+cargo test
+
+# Run with output
+cargo test -- --nocapture
+
+# Run specific test
+cargo test test_health_check
+```
+
+### Code Quality
+
+```bash
+# Format code
+cargo fmt
+
+# Run clippy
+cargo clippy
+
+# Check for security vulnerabilities
+cargo audit
+```
+
+## Deployment
+
+### Systemd Service
+
+Create `/etc/systemd/system/extension-registry.service`:
+
+```ini
+[Unit]
+Description=Extension Registry Service
+After=network.target
+
+[Service]
+Type=simple
+User=registry
+WorkingDirectory=/opt/extension-registry
+ExecStart=/usr/local/bin/extension-registry --config /etc/extension-registry/config.toml
+Restart=on-failure
+RestartSec=5s
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+
+```bash
+sudo systemctl enable extension-registry
+sudo systemctl start extension-registry
+sudo systemctl status extension-registry
+```
+
+### Docker Compose
+
+```yaml
+version: '3.8'
+
+services:
+  extension-registry:
+    image: extension-registry:latest
+    ports:
+      - "8082:8082"
+    volumes:
+      - ./config.toml:/app/config.toml:ro
+      - ./tokens:/app/tokens:ro
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8082/api/v1/health"]
+      interval: 30s
+      timeout: 3s
+      retries: 3
+      start_period: 5s
+```
+
+### Kubernetes Deployment
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: extension-registry
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: extension-registry
+  template:
+    metadata:
+      labels:
+        app: extension-registry
+    spec:
+      containers:
+        - name: extension-registry
+          image: extension-registry:latest
+          ports:
+            - containerPort: 8082
+          volumeMounts:
+            - name: config
+              mountPath: /app/config.toml
+              subPath: config.toml
+            - name: tokens
+              mountPath: /app/tokens
+          livenessProbe:
+            httpGet:
+              path: /api/v1/health
+              port: 8082
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          readinessProbe:
+            httpGet:
+              path: /api/v1/health
+              port: 8082
+            initialDelaySeconds: 5
+            periodSeconds: 5
+      volumes:
+        - name: config
+          configMap:
+            name: extension-registry-config
+        - name: tokens
+          secret:
+            secretName: extension-registry-tokens
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: extension-registry
+spec:
+  selector:
+    app: extension-registry
+  ports:
+    - port: 8082
+      targetPort: 8082
+  type: ClusterIP
+```
+
+## Security
+
+### Authentication
+
+- Gitea: Token-based authentication via `token_path`
+- OCI: Optional token authentication via `auth_token_path`
+
+### Best Practices
+
+1. **Store tokens securely**: Use file permissions (600) for token files
+2. **Enable SSL verification**: Set `verify_ssl = true` in production
+3. **Use HTTPS**: Always use HTTPS for Gitea and OCI registries
+4. **Limit CORS**: Configure CORS appropriately for production
+5. **Rate limiting**: Consider adding rate limiting for public APIs
+6. **Network isolation**: Run service in isolated network segments
+
+## Performance
+
+### Benchmarks
+
+Typical performance characteristics:
+
+- **Cached requests**: <5ms response time
+- **Uncached requests**: 50-200ms (depends on backend latency)
+- **Cache hit ratio**: ~85-95% in typical workloads
+- **Throughput**: 1000+ req/s on modern hardware
+
+### Optimization Tips
+
+1. **Increase cache capacity**: For large extension catalogs
+2. **Tune TTL**: Balance freshness vs. performance
+3. **Use multiple workers**: Scale with CPU cores
+4. **Enable compression**: Reduce bandwidth usage
+5.
**Connection pooling**: Reuse HTTP connections to backends + +## Troubleshooting + +### Common Issues + +#### Service won't start + +- Check configuration file syntax +- Verify token files exist and are readable +- Check network connectivity to backends + +#### Extensions not found + +- Verify backend configuration (URL, organization, namespace) +- Check backend connectivity with health endpoint +- Review logs for authentication errors + +#### Slow responses + +- Check backend latency +- Increase cache capacity or TTL +- Review Prometheus metrics for bottlenecks + +### Logging + +Enable debug logging: + +```bash +extension-registry --log-level debug +``` + +Enable JSON logging for structured logs: + +```bash +extension-registry --json-log +``` + +## License + +Part of the Provisioning Project. + +## Contributing + +See main project documentation for contribution guidelines. + +## Support + +For issues and questions, please refer to the main provisioning project documentation. \ No newline at end of file diff --git a/crates/mcp-server/README.md b/crates/mcp-server/README.md index b0e4691..197ee92 100644 --- a/crates/mcp-server/README.md +++ b/crates/mcp-server/README.md @@ -1 +1,135 @@ -# Rust MCP Server for Infrastructure Automation\n\n## Overview\n\nA **Rust-native Model Context Protocol (MCP) server** for infrastructure automation and AI-assisted DevOps operations.\nThis replaces the Python implementation, providing significant performance improvements and maintaining philosophical consistency\nwith the Rust ecosystem approach.\n\n## ✅ Project Status: **PROOF OF CONCEPT COMPLETE**\n\n### 🎯 Achieved Goals\n\n- ✅ **Feasibility Analysis**: Rust MCP server is fully viable\n- ✅ **Functional Prototype**: All core features working\n- ✅ **Performance Benchmarks**: Microsecond-level latency achieved\n- ✅ **Integration**: Successfully integrates with existing provisioning system\n\n### 🚀 Performance Results\n\n```{$detected_lang}\n🚀 Rust MCP Server Performance 
Analysis\n==================================================\n\n📋 Server Parsing Performance:\n • 31 chars: 0μs avg\n • 67 chars: 0μs avg \n • 65 chars: 0μs avg\n • 58 chars: 0μs avg\n\n🤖 AI Status Performance:\n • AI Status: 0μs avg (10000 iterations)\n\n💾 Memory Footprint:\n • ServerConfig size: 80 bytes\n • Config size: 272 bytes\n\n✅ Performance Summary:\n • Server parsing: Sub-millisecond latency\n • Configuration access: Microsecond latency\n • Memory efficient: Small struct footprint\n • Zero-copy string operations where possible\n```\n\n### 🏗️ Architecture\n\n```{$detected_lang}\nsrc/\n├── simple_main.rs # Lightweight MCP server entry point\n├── main.rs # Full MCP server (with SDK integration)\n├── lib.rs # Library interface\n├── config.rs # Configuration management\n├── provisioning.rs # Core provisioning engine\n├── tools.rs # AI-powered parsing tools\n├── errors.rs # Error handling\n└── performance_test.rs # Performance benchmarking\n```\n\n### 🎲 Key Features\n\n1. **AI-Powered Server Parsing**: Natural language to infrastructure config\n2. **Multi-Provider Support**: AWS, UpCloud, Local\n3. **Configuration Management**: TOML-based with environment overrides \n4. **Error Handling**: Comprehensive error types with recovery hints\n5. 
**Performance Monitoring**: Built-in benchmarking capabilities\n\n### 📊 Rust vs Python Comparison\n\n| Metric | Python MCP Server | Rust MCP Server | Improvement |\n| -------- | ------------------ | ----------------- | ------------- |\n| **Startup Time** | ~500ms | ~50ms | **10x faster** |\n| **Memory Usage** | ~50MB | ~5MB | **10x less** |\n| **Parsing Latency** | ~1ms | ~0.001ms | **1000x faster** |\n| **Binary Size** | Python + deps | ~15MB static | **Portable** |\n| **Type Safety** | Runtime errors | Compile-time | **Zero runtime errors** |\n\n### 🛠️ Usage\n\n```{$detected_lang}\n# Build and run\ncargo run --bin provisioning-mcp-server --release\n\n# Run with custom config\nPROVISIONING_PATH=/path/to/provisioning cargo run --bin provisioning-mcp-server -- --debug\n\n# Run tests\ncargo test\n\n# Run benchmarks \ncargo run --bin provisioning-mcp-server --release\n```\n\n### 🔧 Configuration\n\nSet via environment variables:\n\n```{$detected_lang}\nexport PROVISIONING_PATH=/path/to/provisioning\nexport PROVISIONING_AI_PROVIDER=openai\nexport OPENAI_API_KEY=your-key\nexport PROVISIONING_DEBUG=true\n```\n\n### 📈 Integration Benefits\n\n1. **Philosophical Consistency**: Rust throughout the stack\n2. **Performance**: Sub-millisecond response times\n3. **Memory Safety**: No segfaults, no memory leaks\n4. **Concurrency**: Native async/await support\n5. **Distribution**: Single static binary\n6. **Cross-compilation**: ARM64/x86_64 support\n\n### 🎪 Demo Integration\n\nThis Rust MCP server is ready to be showcased at the **Rust Meetup 2025** as proof that:\n\n> **"A Rust-first approach to infrastructure automation delivers both performance and safety without compromising functionality."**\n\n### 🚧 Next Steps\n\n1. Full MCP SDK integration (schema definitions)\n2. WebSocket/TCP transport layer\n3. Plugin system for extensibility\n4. Metrics collection and monitoring\n5. 
Documentation and examples\n\n### 📝 Conclusion\n\n**The Rust MCP Server successfully demonstrates that replacing Python components with Rust provides:**\n\n- ⚡ **1000x performance improvement** in parsing operations\n- 🧠 **10x memory efficiency**\n- 🔒 **Compile-time safety** guarantees\n- 🎯 **Philosophical consistency** with the ecosystem approach\n\nThis validates the **"Rust-first infrastructure automation"** approach for the meetup presentation. +# Rust MCP Server for Infrastructure Automation + +## Overview + +A **Rust-native Model Context Protocol (MCP) server** for infrastructure automation and AI-assisted DevOps operations. +This replaces the Python implementation, providing significant performance improvements and maintaining philosophical consistency +with the Rust ecosystem approach. + +## ✅ Project Status: **PROOF OF CONCEPT COMPLETE** + +### 🎯 Achieved Goals + +- ✅ **Feasibility Analysis**: Rust MCP server is fully viable +- ✅ **Functional Prototype**: All core features working +- ✅ **Performance Benchmarks**: Microsecond-level latency achieved +- ✅ **Integration**: Successfully integrates with existing provisioning system + +### 🚀 Performance Results + +```bash +🚀 Rust MCP Server Performance Analysis +================================================== + +📋 Server Parsing Performance: + • 31 chars: 0μs avg + • 67 chars: 0μs avg + • 65 chars: 0μs avg + • 58 chars: 0μs avg + +🤖 AI Status Performance: + • AI Status: 0μs avg (10000 iterations) + +💾 Memory Footprint: + • ServerConfig size: 80 bytes + • Config size: 272 bytes + +✅ Performance Summary: + • Server parsing: Sub-millisecond latency + • Configuration access: Microsecond latency + • Memory efficient: Small struct footprint + • Zero-copy string operations where possible +``` + +### 🏗️ Architecture + +```bash +src/ +├── simple_main.rs # Lightweight MCP server entry point +├── main.rs # Full MCP server (with SDK integration) +├── lib.rs # Library interface +├── config.rs # Configuration management 
+├── provisioning.rs      # Core provisioning engine
+├── tools.rs             # AI-powered parsing tools
+├── errors.rs            # Error handling
+└── performance_test.rs  # Performance benchmarking
+```
+
+### 🎲 Key Features
+
+1. **AI-Powered Server Parsing**: Natural language to infrastructure config
+2. **Multi-Provider Support**: AWS, UpCloud, Local
+3. **Configuration Management**: TOML-based with environment overrides
+4. **Error Handling**: Comprehensive error types with recovery hints
+5. **Performance Monitoring**: Built-in benchmarking capabilities
+
+### 📊 Rust vs Python Comparison
+
+| Metric | Python MCP Server | Rust MCP Server | Improvement |
+| -------- | ------------------ | ----------------- | ------------- |
+| **Startup Time** | ~500ms | ~50ms | **10x faster** |
+| **Memory Usage** | ~50MB | ~5MB | **10x less** |
+| **Parsing Latency** | ~1ms | ~0.001ms | **1000x faster** |
+| **Binary Size** | Python + deps | ~15MB static | **Portable** |
+| **Type Safety** | Runtime errors | Compile-time | **Zero runtime errors** |
+
+### 🛠️ Usage
+
+```bash
+# Build and run
+cargo run --bin provisioning-mcp-server --release
+
+# Run with custom config
+PROVISIONING_PATH=/path/to/provisioning cargo run --bin provisioning-mcp-server -- --debug
+
+# Run tests
+cargo test
+
+# Run benchmarks
+cargo run --bin provisioning-mcp-server --release
+```
+
+### 🔧 Configuration
+
+Set via environment variables:
+
+```bash
+export PROVISIONING_PATH=/path/to/provisioning
+export PROVISIONING_AI_PROVIDER=openai
+export OPENAI_API_KEY=your-key
+export PROVISIONING_DEBUG=true
+```
+
+### 📈 Integration Benefits
+
+1. **Philosophical Consistency**: Rust throughout the stack
+2. **Performance**: Sub-millisecond response times
+3. **Memory Safety**: No segfaults, no memory leaks
+4. **Concurrency**: Native async/await support
+5. **Distribution**: Single static binary
+6.
**Cross-compilation**: ARM64/x86_64 support + +### 🎪 Demo Integration + +This Rust MCP server is ready to be showcased at the **Rust Meetup 2025** as proof that: + +> **"A Rust-first approach to infrastructure automation delivers both performance and safety without compromising functionality."** + +### 🚧 Next Steps + +1. Full MCP SDK integration (schema definitions) +2. WebSocket/TCP transport layer +3. Plugin system for extensibility +4. Metrics collection and monitoring +5. Documentation and examples + +### 📝 Conclusion + +**The Rust MCP Server successfully demonstrates that replacing Python components with Rust provides:** + +- ⚡ **1000x performance improvement** in parsing operations +- 🧠 **10x memory efficiency** +- 🔒 **Compile-time safety** guarantees +- 🎯 **Philosophical consistency** with the ecosystem approach + +This validates the **"Rust-first infrastructure automation"** approach for the meetup presentation. \ No newline at end of file diff --git a/crates/orchestrator/README.md b/crates/orchestrator/README.md index d094292..e5b1f9f 100644 --- a/crates/orchestrator/README.md +++ b/crates/orchestrator/README.md @@ -1 +1,515 @@ -# Provisioning Orchestrator\n\nA Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration \ntools.\n\n## Architecture\n\nThe orchestrator implements a hybrid multi-storage approach:\n\n- **Rust Orchestrator**: Handles coordination, queuing, and parallel execution\n- **Nushell Scripts**: Execute the actual provisioning logic\n- **Pluggable Storage**: Multiple storage backends with seamless migration\n- **REST API**: HTTP interface for workflow submission and monitoring\n\n## Features\n\n- **Multi-Storage Backends**: Filesystem, SurrealDB Embedded, and SurrealDB Server options\n- **Task Queue**: Priority-based task scheduling with retry logic\n- **Seamless Migration**: Move data between storage backends with zero downtime\n- **Feature Flags**: 
Compile-time backend selection for minimal dependencies\n- **Parallel Execution**: Multiple tasks can run concurrently\n- **Status Tracking**: Real-time task status and progress monitoring\n- **Advanced Features**: Authentication, audit logging, and metrics (SurrealDB)\n- **Nushell Integration**: Seamless execution of existing provisioning scripts\n- **RESTful API**: HTTP endpoints for workflow management\n- **Test Environment Service**: Automated containerized testing for taskservs, servers, and clusters\n- **Multi-Node Support**: Test complex topologies including Kubernetes and etcd clusters\n- **Docker Integration**: Automated container lifecycle management via Docker API\n\n## Quick Start\n\n### Build and Run\n\n**Default Build (Filesystem Only)**:\n\n```{$detected_lang}\ncd src/orchestrator\ncargo build --release\ncargo run -- --port 8080 --data-dir ./data\n```\n\n**With SurrealDB Support**:\n\n```{$detected_lang}\ncd src/orchestrator\ncargo build --release --features surrealdb\n\n# Run with SurrealDB embedded\ncargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data\n\n# Run with SurrealDB server\ncargo run --features surrealdb -- --storage-type surrealdb-server \\n --surrealdb-url ws://localhost:8000 \\n --surrealdb-username admin --surrealdb-password secret\n```\n\n### Submit a Server Creation Workflow\n\n```{$detected_lang}\ncurl -X POST http://localhost:8080/workflows/servers/create \\n -H "Content-Type: application/json" \\n -d '{\n "infra": "production",\n "settings": "./settings.yaml",\n "servers": ["web-01", "web-02"],\n "check_mode": false,\n "wait": true\n }'\n```\n\n### Check Task Status\n\n```{$detected_lang}\ncurl http://localhost:8080/tasks/{task_id}\n```\n\n### List All Tasks\n\n```{$detected_lang}\ncurl http://localhost:8080/tasks\n```\n\n## API Endpoints\n\n### Health Check\n\n- `GET /health` - Service health status\n\n### Task Management\n\n- `GET /tasks` - List all tasks\n- `GET /tasks/{id}` - Get specific task 
status\n\n### Workflows\n\n- `POST /workflows/servers/create` - Submit server creation workflow\n- `POST /workflows/taskserv/create` - Submit taskserv creation workflow\n- `POST /workflows/cluster/create` - Submit cluster creation workflow\n\n### Test Environments\n\n- `POST /test/environments/create` - Create test environment\n- `GET /test/environments` - List all test environments\n- `GET /test/environments/{id}` - Get environment details\n- `POST /test/environments/{id}/run` - Run tests in environment\n- `DELETE /test/environments/{id}` - Cleanup test environment\n- `GET /test/environments/{id}/logs` - Get environment logs\n\n## Test Environment Service\n\nThe orchestrator includes a comprehensive test environment service for automated containerized testing\nof taskservs, complete servers, and multi-node clusters.\n\n### Overview\n\nThe Test Environment Service enables:\n\n- **Single Taskserv Testing**: Test individual taskservs in isolated containers\n- **Server Simulation**: Test complete server configurations with multiple taskservs\n- **Cluster Topologies**: Test multi-node clusters (Kubernetes, etcd, etc.)\n- **Automated Container Management**: No manual Docker management required\n- **Network Isolation**: Each test environment gets dedicated networks\n- **Resource Limits**: Configure CPU, memory, and disk limits per container\n\n### Test Environment Types\n\n#### 1. Single Taskserv\n\nTest individual taskserv in isolated container:\n\n```{$detected_lang}\ncurl -X POST http://localhost:8080/test/environments/create \\n -H "Content-Type: application/json" \\n -d '{\n "config": {\n "type": "single_taskserv",\n "taskserv": "kubernetes",\n "base_image": "ubuntu:22.04",\n "resources": {\n "cpu_millicores": 2000,\n "memory_mb": 4096\n }\n },\n "auto_start": true,\n "auto_cleanup": false\n }'\n```\n\n#### 2. 
Server Simulation\n\nSimulate complete server with multiple taskservs:\n\n```{$detected_lang}\ncurl -X POST http://localhost:8080/test/environments/create \\n -H "Content-Type: application/json" \\n -d '{\n "config": {\n "type": "server_simulation",\n "server_name": "web-01",\n "taskservs": ["containerd", "kubernetes", "cilium"],\n "base_image": "ubuntu:22.04"\n },\n "infra": "prod-stack",\n "auto_start": true\n }'\n```\n\n#### 3. Cluster Topology\n\nTest multi-node cluster configurations:\n\n```{$detected_lang}\ncurl -X POST http://localhost:8080/test/environments/create \\n -H "Content-Type: application/json" \\n -d '{\n "config": {\n "type": "cluster_topology",\n "cluster_type": "kubernetes",\n "topology": {\n "nodes": [\n {\n "name": "cp-01",\n "role": "controlplane",\n "taskservs": ["etcd", "kubernetes", "containerd"],\n "resources": {\n "cpu_millicores": 2000,\n "memory_mb": 4096\n }\n },\n {\n "name": "worker-01",\n "role": "worker",\n "taskservs": ["kubernetes", "containerd", "cilium"],\n "resources": {\n "cpu_millicores": 1000,\n "memory_mb": 2048\n }\n }\n ],\n "network": {\n "subnet": "172.30.0.0/16"\n }\n }\n },\n "auto_start": true\n }'\n```\n\n### Nushell CLI Integration\n\nThe test environment service is fully integrated with Nushell CLI:\n\n```{$detected_lang}\n# Quick test (create, run, cleanup)\nprovisioning test quick kubernetes\n\n# Single taskserv test\nprovisioning test env single postgres --auto-start --auto-cleanup\n\n# Server simulation\nprovisioning test env server web-01 [containerd kubernetes cilium] --auto-start\n\n# Cluster from template\nprovisioning test topology load kubernetes_3node | test env cluster kubernetes\n\n# List environments\nprovisioning test env list\n\n# Check status\nprovisioning test env status \n\n# View logs\nprovisioning test env logs \n\n# Cleanup\nprovisioning test env cleanup \n```\n\n### Topology Templates\n\nPredefined multi-node cluster topologies are available in 
`provisioning/config/test-topologies.toml`:\n\n- **kubernetes_3node**: 3-node HA Kubernetes cluster (1 control plane + 2 workers)\n- **kubernetes_single**: All-in-one Kubernetes node\n- **etcd_cluster**: 3-member etcd cluster\n- **containerd_test**: Standalone containerd testing\n- **postgres_redis**: Database stack testing\n\n### Prerequisites\n\n1. **Docker Running**: The orchestrator requires Docker daemon to be running\n\n ```bash\n docker ps # Should work without errors\n ```\n\n1. **Orchestrator Running**: Start the orchestrator before using test environments\n\n ```bash\n ./scripts/start-orchestrator.nu --background\n ```\n\n### Architecture\n\n```{$detected_lang}\nUser Command (CLI/API)\n ↓\nTest Orchestrator (Rust)\n ↓\nContainer Manager (bollard)\n ↓\nDocker API\n ↓\nIsolated Test Containers\n • Dedicated networks\n • Resource limits\n • Volume mounts\n • Multi-node support\n```\n\n### Key Components\n\n#### Rust Modules\n\n- `test_environment.rs` - Core types and configurations\n- `container_manager.rs` - Docker API integration (bollard)\n- `test_orchestrator.rs` - Orchestration logic\n\n#### Features\n\n- **Automated Lifecycle**: Create, start, stop, cleanup containers automatically\n- **Network Isolation**: Each environment gets isolated Docker network\n- **Resource Management**: CPU and memory limits per container\n- **Test Execution**: Run test scripts within containers\n- **Log Collection**: Capture and expose container logs\n- **Auto-Cleanup**: Optional automatic cleanup after tests\n\n### Use Cases\n\n1. **Taskserv Development**: Test new taskservs before deployment\n2. **Integration Testing**: Validate taskserv combinations\n3. **Cluster Validation**: Test multi-node cluster configurations\n4. **CI/CD Integration**: Automated testing in pipelines\n5. 
**Production Simulation**: Test production-like deployments safely\n\n### CI/CD Integration\n\n```{$detected_lang}\n# GitLab CI example\ntest-infrastructure:\n stage: test\n script:\n - provisioning test quick kubernetes\n - provisioning test quick postgres\n - provisioning test quick redis\n```\n\n### Documentation\n\nFor complete usage guide and examples, see:\n\n- **User Guide**: `docs/user/test-environment-guide.md`\n- **Usage Documentation**: `docs/user/test-environment-usage.md`\n- **Implementation Summary**: `provisioning/core/nulib/test_environments_summary.md`\n\n## Configuration\n\n### Core Options\n\n- `--port` - HTTP server port (default: 8080)\n- `--data-dir` - Data directory for storage (default: ./data)\n- `--storage-type` - Storage backend: filesystem, surrealdb-embedded, surrealdb-server\n- `--nu-path` - Path to Nushell executable (default: nu)\n- `--provisioning-path` - Path to provisioning script (default: ./core/nulib/provisioning)\n\n### SurrealDB Options (when `--features surrealdb` enabled)\n\n- `--surrealdb-url` - Server URL for surrealdb-server mode (e.g., ws://localhost:8000)\n- `--surrealdb-namespace` - Database namespace (default: orchestrator)\n- `--surrealdb-database` - Database name (default: tasks)\n- `--surrealdb-username` - Authentication username\n- `--surrealdb-password` - Authentication password\n\n### Storage Backend Comparison\n\n| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server |\n| --------- | ------------ | ------------------- | ------------------ |\n| **Dependencies** | None | Local database | Remote server |\n| **Auth/RBAC** | Basic | Advanced | Advanced |\n| **Real-time** | No | Yes | Yes |\n| **Scalability** | Limited | Medium | High |\n| **Complexity** | Low | Medium | High |\n| **Best For** | Development | Production | Distributed |\n\n## Nushell Integration\n\nThe orchestrator includes workflow wrappers in `core/nulib/workflows/server_create.nu`:\n\n```{$detected_lang}\n# Submit workflow via Nushell\nuse 
workflows/server_create.nu\nserver_create_workflow "production" --settings "./settings.yaml" --wait\n\n# Check workflow status\nworkflow status $task_id\n\n# List all workflows\nworkflow list\n```\n\n## Task States\n\n- **Pending**: Queued for execution\n- **Running**: Currently executing\n- **Completed**: Finished successfully\n- **Failed**: Execution failed (will retry if under limit)\n- **Cancelled**: Manually cancelled\n\n## Storage Architecture\n\n### Multi-Backend Support\n\nThe orchestrator uses a pluggable storage architecture with three backends:\n\n#### Filesystem (Default)\n\n- **Format**: JSON files in directory structure\n- **Location**: `{data_dir}/queue.rkvs/{tasks,queue}/`\n- **Features**: Basic task persistence, priority queuing\n- **Best For**: Development, simple deployments\n\n#### SurrealDB Embedded\n\n- **Format**: Local SurrealDB database with RocksDB engine\n- **Location**: `{data_dir}/orchestrator.db`\n- **Features**: ACID transactions, advanced queries, audit logging\n- **Best For**: Production single-node deployments\n\n#### SurrealDB Server\n\n- **Format**: Remote SurrealDB server connection\n- **Connection**: WebSocket or HTTP protocol\n- **Features**: Full multi-user, real-time subscriptions, horizontal scaling\n- **Best For**: Distributed production deployments\n\n### Data Migration\n\nSeamless migration between storage backends:\n\n```{$detected_lang}\n# Interactive migration wizard\n./scripts/migrate-storage.nu --interactive\n\n# Direct migration\n./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \\n --source-dir ./data --target-dir ./surrealdb-data\n\n# Validate migration setup\n./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server\n```\n\n## Error Handling\n\n- Failed tasks are automatically retried up to 3 times\n- Permanent failures are marked and logged\n- Service restart recovery loads tasks from persistent storage\n- API errors return structured JSON responses\n\n## Monitoring\n\n- 
Structured logging with tracing\n- Task execution metrics\n- Queue depth monitoring\n- Health check endpoint\n\n## Development\n\n### Dependencies\n\n**Core Dependencies** (always included):\n\n- **axum**: HTTP server framework\n- **tokio**: Async runtime\n- **serde**: Serialization\n- **tracing**: Structured logging\n- **async-trait**: Async trait support\n- **anyhow**: Error handling\n- **bollard**: Docker API client for container management\n\n**Optional Dependencies** (feature-gated):\n\n- **surrealdb**: Multi-model database (requires `--features surrealdb`)\n - Embedded mode: RocksDB storage engine\n - Server mode: WebSocket/HTTP client\n\n### Adding New Workflows\n\n1. Create workflow definition in `src/main.rs`\n2. Add API endpoint handler\n3. Create Nushell wrapper in `core/nulib/workflows/`\n4. Update existing code to use workflow bridge functions\n\n### Testing\n\n**Unit and Integration Tests**:\n\n```{$detected_lang}\n# Test with filesystem only (default)\ncargo test\n\n# Test all storage backends\ncargo test --features surrealdb\n\n# Test specific suites\ncargo test --test storage_integration\ncargo test --test migration_tests\ncargo test --test factory_tests\n```\n\n**Performance Benchmarks**:\n\n```{$detected_lang}\n# Benchmark storage performance\ncargo bench --bench storage_benchmarks\n\n# Benchmark migration performance\ncargo bench --bench migration_benchmarks\n\n# Generate HTML reports\ncargo bench --features surrealdb\nopen target/criterion/reports/index.html\n```\n\n**Test Configuration**:\n\n```{$detected_lang}\n# Run with specific backend\nTEST_STORAGE=filesystem cargo test\nTEST_STORAGE=surrealdb-embedded cargo test --features surrealdb\n\n# Verbose testing\ncargo test -- --nocapture\n```\n\n## Migration from Deep Call Stack Issues\n\nThis orchestrator solves the Nushell deep call stack limitations by:\n\n1. Moving coordination logic to Rust\n2. Executing individual Nushell commands at top level\n3. Managing parallel execution externally\n4. 
Preserving all existing business logic in Nushell\n\nThe existing `on_create_servers` function can be replaced with `on_create_servers_workflow` for orchestrated execution while maintaining full \ncompatibility. +# Provisioning Orchestrator + +A Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration +tools. + +## Architecture + +The orchestrator implements a hybrid multi-storage approach: + +- **Rust Orchestrator**: Handles coordination, queuing, and parallel execution +- **Nushell Scripts**: Execute the actual provisioning logic +- **Pluggable Storage**: Multiple storage backends with seamless migration +- **REST API**: HTTP interface for workflow submission and monitoring + +## Features + +- **Multi-Storage Backends**: Filesystem, SurrealDB Embedded, and SurrealDB Server options +- **Task Queue**: Priority-based task scheduling with retry logic +- **Seamless Migration**: Move data between storage backends with zero downtime +- **Feature Flags**: Compile-time backend selection for minimal dependencies +- **Parallel Execution**: Multiple tasks can run concurrently +- **Status Tracking**: Real-time task status and progress monitoring +- **Advanced Features**: Authentication, audit logging, and metrics (SurrealDB) +- **Nushell Integration**: Seamless execution of existing provisioning scripts +- **RESTful API**: HTTP endpoints for workflow management +- **Test Environment Service**: Automated containerized testing for taskservs, servers, and clusters +- **Multi-Node Support**: Test complex topologies including Kubernetes and etcd clusters +- **Docker Integration**: Automated container lifecycle management via Docker API + +## Quick Start + +### Build and Run + +**Default Build (Filesystem Only)**: + +```bash +cd src/orchestrator +cargo build --release +cargo run -- --port 8080 --data-dir ./data +``` + +**With SurrealDB Support**: + +```bash +cd src/orchestrator +cargo build 
--release --features surrealdb + +# Run with SurrealDB embedded +cargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data + +# Run with SurrealDB server +cargo run --features surrealdb -- --storage-type surrealdb-server \ + --surrealdb-url ws://localhost:8000 \ + --surrealdb-username admin --surrealdb-password secret +``` + +### Submit a Server Creation Workflow + +```bash +curl -X POST http://localhost:8080/workflows/servers/create \ + -H "Content-Type: application/json" \ + -d '{ + "infra": "production", + "settings": "./settings.yaml", + "servers": ["web-01", "web-02"], + "check_mode": false, + "wait": true + }' +``` + +### Check Task Status + +```bash +curl http://localhost:8080/tasks/{task_id} +``` + +### List All Tasks + +```bash +curl http://localhost:8080/tasks +``` + +## API Endpoints + +### Health Check + +- `GET /health` - Service health status + +### Task Management + +- `GET /tasks` - List all tasks +- `GET /tasks/{id}` - Get specific task status + +### Workflows + +- `POST /workflows/servers/create` - Submit server creation workflow +- `POST /workflows/taskserv/create` - Submit taskserv creation workflow +- `POST /workflows/cluster/create` - Submit cluster creation workflow + +### Test Environments + +- `POST /test/environments/create` - Create test environment +- `GET /test/environments` - List all test environments +- `GET /test/environments/{id}` - Get environment details +- `POST /test/environments/{id}/run` - Run tests in environment +- `DELETE /test/environments/{id}` - Cleanup test environment +- `GET /test/environments/{id}/logs` - Get environment logs + +## Test Environment Service + +The orchestrator includes a comprehensive test environment service for automated containerized testing +of taskservs, complete servers, and multi-node clusters.
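The REST endpoints above can be driven from any HTTP client. Below is a minimal Python sketch (stdlib only); the helper names and parameter defaults are illustrative assumptions — only the endpoint path and payload shape come from the curl examples in this README:

```python
import json
from urllib import request

ORCHESTRATOR = "http://localhost:8080"  # default orchestrator port from this README


def single_taskserv_request(taskserv, base_image="ubuntu:22.04",
                            cpu_millicores=2000, memory_mb=4096,
                            auto_start=True, auto_cleanup=False):
    """Build the JSON body for POST /test/environments/create
    (single_taskserv variant, mirroring the curl example)."""
    return {
        "config": {
            "type": "single_taskserv",
            "taskserv": taskserv,
            "base_image": base_image,
            "resources": {"cpu_millicores": cpu_millicores,
                          "memory_mb": memory_mb},
        },
        "auto_start": auto_start,
        "auto_cleanup": auto_cleanup,
    }


def create_test_environment(body):
    """Submit the request; requires a running orchestrator."""
    req = request.Request(
        f"{ORCHESTRATOR}/test/environments/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

The payload builder is separated from the HTTP call so request bodies can be validated or logged without a live orchestrator.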
+ +### Overview + +The Test Environment Service enables: + +- **Single Taskserv Testing**: Test individual taskservs in isolated containers +- **Server Simulation**: Test complete server configurations with multiple taskservs +- **Cluster Topologies**: Test multi-node clusters (Kubernetes, etcd, etc.) +- **Automated Container Management**: No manual Docker management required +- **Network Isolation**: Each test environment gets dedicated networks +- **Resource Limits**: Configure CPU, memory, and disk limits per container + +### Test Environment Types + +#### 1. Single Taskserv + +Test an individual taskserv in an isolated container: + +```bash +curl -X POST http://localhost:8080/test/environments/create \ + -H "Content-Type: application/json" \ + -d '{ + "config": { + "type": "single_taskserv", + "taskserv": "kubernetes", + "base_image": "ubuntu:22.04", + "resources": { + "cpu_millicores": 2000, + "memory_mb": 4096 + } + }, + "auto_start": true, + "auto_cleanup": false + }' +``` + +#### 2. Server Simulation + +Simulate a complete server with multiple taskservs: + +```bash +curl -X POST http://localhost:8080/test/environments/create \ + -H "Content-Type: application/json" \ + -d '{ + "config": { + "type": "server_simulation", + "server_name": "web-01", + "taskservs": ["containerd", "kubernetes", "cilium"], + "base_image": "ubuntu:22.04" + }, + "infra": "prod-stack", + "auto_start": true + }' +``` + +#### 3. 
Cluster Topology + +Test multi-node cluster configurations: + +```bash +curl -X POST http://localhost:8080/test/environments/create \ + -H "Content-Type: application/json" \ + -d '{ + "config": { + "type": "cluster_topology", + "cluster_type": "kubernetes", + "topology": { + "nodes": [ + { + "name": "cp-01", + "role": "controlplane", + "taskservs": ["etcd", "kubernetes", "containerd"], + "resources": { + "cpu_millicores": 2000, + "memory_mb": 4096 + } + }, + { + "name": "worker-01", + "role": "worker", + "taskservs": ["kubernetes", "containerd", "cilium"], + "resources": { + "cpu_millicores": 1000, + "memory_mb": 2048 + } + } + ], + "network": { + "subnet": "172.30.0.0/16" + } + } + }, + "auto_start": true + }' +``` + +### Nushell CLI Integration + +The test environment service is fully integrated with the Nushell CLI: + +```nushell +# Quick test (create, run, cleanup) +provisioning test quick kubernetes + +# Single taskserv test +provisioning test env single postgres --auto-start --auto-cleanup + +# Server simulation +provisioning test env server web-01 [containerd kubernetes cilium] --auto-start + +# Cluster from template +provisioning test topology load kubernetes_3node | test env cluster kubernetes + +# List environments +provisioning test env list + +# Check status +provisioning test env status + +# View logs +provisioning test env logs + +# Cleanup +provisioning test env cleanup +``` + +### Topology Templates + +Predefined multi-node cluster topologies are available in `provisioning/config/test-topologies.toml`: + +- **kubernetes_3node**: 3-node HA Kubernetes cluster (1 control plane + 2 workers) +- **kubernetes_single**: All-in-one Kubernetes node +- **etcd_cluster**: 3-member etcd cluster +- **containerd_test**: Standalone containerd testing +- **postgres_redis**: Database stack testing + +### Prerequisites + +1. **Docker Running**: The orchestrator requires the Docker daemon to be running + + ```bash + docker ps # Should work without errors + ``` + +2. 
**Orchestrator Running**: Start the orchestrator before using test environments + + ```bash + ./scripts/start-orchestrator.nu --background + ``` + +### Architecture + +```text +User Command (CLI/API) + ↓ +Test Orchestrator (Rust) + ↓ +Container Manager (bollard) + ↓ +Docker API + ↓ +Isolated Test Containers + • Dedicated networks + • Resource limits + • Volume mounts + • Multi-node support +``` + +### Key Components + +#### Rust Modules + +- `test_environment.rs` - Core types and configurations +- `container_manager.rs` - Docker API integration (bollard) +- `test_orchestrator.rs` - Orchestration logic + +#### Features + +- **Automated Lifecycle**: Create, start, stop, cleanup containers automatically +- **Network Isolation**: Each environment gets an isolated Docker network +- **Resource Management**: CPU and memory limits per container +- **Test Execution**: Run test scripts within containers +- **Log Collection**: Capture and expose container logs +- **Auto-Cleanup**: Optional automatic cleanup after tests + +### Use Cases + +1. **Taskserv Development**: Test new taskservs before deployment +2. **Integration Testing**: Validate taskserv combinations +3. **Cluster Validation**: Test multi-node cluster configurations +4. **CI/CD Integration**: Automated testing in pipelines +5. 
**Production Simulation**: Test production-like deployments safely + +### CI/CD Integration + +```yaml +# GitLab CI example +test-infrastructure: + stage: test + script: + - provisioning test quick kubernetes + - provisioning test quick postgres + - provisioning test quick redis +``` + +### Documentation + +For complete usage guide and examples, see: + +- **User Guide**: `docs/user/test-environment-guide.md` +- **Usage Documentation**: `docs/user/test-environment-usage.md` +- **Implementation Summary**: `provisioning/core/nulib/test_environments_summary.md` + +## Configuration + +### Core Options + +- `--port` - HTTP server port (default: 8080) +- `--data-dir` - Data directory for storage (default: ./data) +- `--storage-type` - Storage backend: filesystem, surrealdb-embedded, surrealdb-server +- `--nu-path` - Path to Nushell executable (default: nu) +- `--provisioning-path` - Path to provisioning script (default: ./core/nulib/provisioning) + +### SurrealDB Options (when `--features surrealdb` enabled) + +- `--surrealdb-url` - Server URL for surrealdb-server mode (e.g., ws://localhost:8000) +- `--surrealdb-namespace` - Database namespace (default: orchestrator) +- `--surrealdb-database` - Database name (default: tasks) +- `--surrealdb-username` - Authentication username +- `--surrealdb-password` - Authentication password + +### Storage Backend Comparison + +| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server | +| --------- | ------------ | ------------------- | ------------------ | +| **Dependencies** | None | Local database | Remote server | +| **Auth/RBAC** | Basic | Advanced | Advanced | +| **Real-time** | No | Yes | Yes | +| **Scalability** | Limited | Medium | High | +| **Complexity** | Low | Medium | High | +| **Best For** | Development | Production | Distributed | + +## Nushell Integration + +The orchestrator includes workflow wrappers in `core/nulib/workflows/server_create.nu`: + +```nushell +# Submit workflow via Nushell +use 
workflows/server_create.nu +server_create_workflow "production" --settings "./settings.yaml" --wait + +# Check workflow status +workflow status $task_id + +# List all workflows +workflow list +``` + +## Task States + +- **Pending**: Queued for execution +- **Running**: Currently executing +- **Completed**: Finished successfully +- **Failed**: Execution failed (will retry if under limit) +- **Cancelled**: Manually cancelled + +## Storage Architecture + +### Multi-Backend Support + +The orchestrator uses a pluggable storage architecture with three backends: + +#### Filesystem (Default) + +- **Format**: JSON files in directory structure +- **Location**: `{data_dir}/queue.rkvs/{tasks,queue}/` +- **Features**: Basic task persistence, priority queuing +- **Best For**: Development, simple deployments + +#### SurrealDB Embedded + +- **Format**: Local SurrealDB database with RocksDB engine +- **Location**: `{data_dir}/orchestrator.db` +- **Features**: ACID transactions, advanced queries, audit logging +- **Best For**: Production single-node deployments + +#### SurrealDB Server + +- **Format**: Remote SurrealDB server connection +- **Connection**: WebSocket or HTTP protocol +- **Features**: Full multi-user, real-time subscriptions, horizontal scaling +- **Best For**: Distributed production deployments + +### Data Migration + +Seamless migration between storage backends: + +```bash +# Interactive migration wizard +./scripts/migrate-storage.nu --interactive + +# Direct migration +./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \ + --source-dir ./data --target-dir ./surrealdb-data + +# Validate migration setup +./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server +``` + +## Error Handling + +- Failed tasks are automatically retried up to 3 times +- Permanent failures are marked and logged +- Service restart recovery loads tasks from persistent storage +- API errors return structured JSON responses + +## Monitoring + +- Structured 
logging with tracing +- Task execution metrics +- Queue depth monitoring +- Health check endpoint + +## Development + +### Dependencies + +**Core Dependencies** (always included): + +- **axum**: HTTP server framework +- **tokio**: Async runtime +- **serde**: Serialization +- **tracing**: Structured logging +- **async-trait**: Async trait support +- **anyhow**: Error handling +- **bollard**: Docker API client for container management + +**Optional Dependencies** (feature-gated): + +- **surrealdb**: Multi-model database (requires `--features surrealdb`) + - Embedded mode: RocksDB storage engine + - Server mode: WebSocket/HTTP client + +### Adding New Workflows + +1. Create workflow definition in `src/main.rs` +2. Add API endpoint handler +3. Create Nushell wrapper in `core/nulib/workflows/` +4. Update existing code to use workflow bridge functions + +### Testing + +**Unit and Integration Tests**: + +```bash +# Test with filesystem only (default) +cargo test + +# Test all storage backends +cargo test --features surrealdb + +# Test specific suites +cargo test --test storage_integration +cargo test --test migration_tests +cargo test --test factory_tests +``` + +**Performance Benchmarks**: + +```bash +# Benchmark storage performance +cargo bench --bench storage_benchmarks + +# Benchmark migration performance +cargo bench --bench migration_benchmarks + +# Generate HTML reports +cargo bench --features surrealdb +open target/criterion/reports/index.html +``` + +**Test Configuration**: + +```bash +# Run with specific backend +TEST_STORAGE=filesystem cargo test +TEST_STORAGE=surrealdb-embedded cargo test --features surrealdb + +# Verbose testing +cargo test -- --nocapture +``` + +## Migration from Deep Call Stack Issues + +This orchestrator solves the Nushell deep call stack limitations by: + +1. Moving coordination logic to Rust +2. Executing individual Nushell commands at top level +3. Managing parallel execution externally +4. 
Preserving all existing business logic in Nushell + +The existing `on_create_servers` function can be replaced with `on_create_servers_workflow` for orchestrated execution while maintaining full +compatibility. \ No newline at end of file diff --git a/crates/orchestrator/docs/dns-integration.md b/crates/orchestrator/docs/dns-integration.md index 5b0a554..038f87f 100644 --- a/crates/orchestrator/docs/dns-integration.md +++ b/crates/orchestrator/docs/dns-integration.md @@ -1 +1,221 @@ -# DNS Integration Guide\n\n## Overview\n\nThe DNS integration module provides automatic DNS registration and management for provisioned servers through CoreDNS integration.\n\n## Architecture\n\n```{$detected_lang}\n┌─────────────────┐\n│ Orchestrator │\n│ (Rust) │\n└────────┬────────┘\n │\n ▼\n┌─────────────────┐\n│ DNS Manager │\n│ │\n│ - Auto-register│\n│ - TTL config │\n│ - Verification │\n└────────┬────────┘\n │\n ▼\n┌─────────────────┐\n│ CoreDNS Client │\n│ (HTTP API) │\n└────────┬────────┘\n │\n ▼\n┌─────────────────┐\n│ CoreDNS │\n│ Service │\n└─────────────────┘\n```\n\n## Features\n\n### 1. Automatic DNS Registration\n\nWhen a server is created, the orchestrator automatically registers its DNS record:\n\n```{$detected_lang}\n// In server creation workflow\nlet ip = server.get_ip_address();\nstate.dns_manager.register_server_dns(&hostname, ip).await?;\n```\n\n### 2. DNS Record Types\n\nSupports multiple DNS record types:\n\n- **A** - IPv4 address\n- **AAAA** - IPv6 address\n- **CNAME** - Canonical name\n- **TXT** - Text record\n\n### 3. DNS Verification\n\nVerify DNS resolution after registration:\n\n```{$detected_lang}\nlet verified = state.dns_manager.verify_dns_resolution("server.example.com").await?;\nif verified {\n info!("DNS resolution verified");\n}\n```\n\n### 4. 
Automatic Cleanup\n\nWhen a server is deleted, DNS records are automatically removed:\n\n```{$detected_lang}\nstate.dns_manager.unregister_server_dns(&hostname).await?;\n```\n\n## Configuration\n\nDNS settings in `config.defaults.toml`:\n\n```{$detected_lang}\n[orchestrator.dns]\ncoredns_url = "http://localhost:53"\nauto_register = true\nttl = 300\n```\n\n### Configuration Options\n\n- **coredns_url**: CoreDNS HTTP API endpoint\n- **auto_register**: Enable automatic DNS registration (default: true)\n- **ttl**: Default TTL for DNS records in seconds (default: 300)\n\n## API Endpoints\n\n### List DNS Records\n\n```{$detected_lang}\nGET /api/v1/dns/records\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": [\n {\n "name": "web-01.example.com",\n "record_type": "A",\n "value": "192.168.1.10",\n "ttl": 300\n }\n ]\n}\n```\n\n## Usage Examples\n\n### Register Server DNS\n\n```{$detected_lang}\nuse std::net::IpAddr;\n\nlet ip: IpAddr = "192.168.1.10".parse()?;\ndns_manager.register_server_dns("web-01.example.com", ip).await?;\n```\n\n### Unregister Server DNS\n\n```{$detected_lang}\ndns_manager.unregister_server_dns("web-01.example.com").await?;\n```\n\n### Update DNS Record\n\n```{$detected_lang}\nlet new_ip: IpAddr = "192.168.1.20".parse()?;\ndns_manager.update_dns_record("web-01.example.com", new_ip).await?;\n```\n\n### List All Records\n\n```{$detected_lang}\nlet records = dns_manager.list_records().await?;\nfor record in records {\n println!("{} -> {} ({})", record.name, record.value, record.record_type);\n}\n```\n\n## Integration with Workflows\n\n### Server Creation Workflow\n\n1. Create server via provider API\n2. Wait for server to be ready\n3. **Register DNS record** (automatic)\n4. Verify DNS resolution\n5. Continue with next steps\n\n### Server Deletion Workflow\n\n1. Stop services on server\n2. **Unregister DNS record** (automatic)\n3. Delete server via provider API\n4. 
Update inventory\n\n## Error Handling\n\nThe DNS integration handles errors gracefully:\n\n- **Network errors**: Retries with exponential backoff\n- **DNS conflicts**: Reports error but continues workflow\n- **Invalid records**: Validates before sending to CoreDNS\n\n## Testing\n\nRun DNS integration tests:\n\n```{$detected_lang}\ncd provisioning/platform/orchestrator\ncargo test test_dns_integration\n```\n\n## Troubleshooting\n\n### DNS registration fails\n\n1. Check CoreDNS is running and accessible\n2. Verify `coredns_url` configuration\n3. Check network connectivity\n4. Review orchestrator logs\n\n### DNS records not resolving\n\n1. Verify record was registered (check logs)\n2. Query CoreDNS directly\n3. Check TTL settings\n4. Verify DNS resolver configuration\n\n## Best Practices\n\n1. **Use FQDN**: Always use fully-qualified domain names\n2. **Set appropriate TTL**: Lower TTL for dev, higher for prod\n3. **Enable auto-register**: Reduces manual operations\n4. **Monitor DNS health**: Check DNS resolution periodically\n\n## Security Considerations\n\n1. **Access Control**: Restrict access to CoreDNS API\n2. **Validation**: Validate hostnames and IP addresses\n3. **Audit**: Log all DNS operations\n4. **Rate Limiting**: Prevent DNS flooding\n\n## Future Enhancements\n\n- [ ] Support for SRV records\n- [ ] DNS zone management\n- [ ] DNSSEC integration\n- [ ] Multi-zone support\n- [ ] DNS caching layer +# DNS Integration Guide + +## Overview + +The DNS integration module provides automatic DNS registration and management for provisioned servers through CoreDNS integration. 
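The security considerations later in this guide call for validating hostnames and IP addresses before they reach CoreDNS. A small Python sketch of such client-side validation (illustrative only — the orchestrator's actual validation lives in the Rust DNS manager and is not shown here; the record types and default TTL come from this guide):

```python
import ipaddress
import re

# Record types supported per this guide
DNS_RECORD_TYPES = {"A", "AAAA", "CNAME", "TXT"}

# RFC-1123-style hostname: dot-separated labels, no leading/trailing hyphens
HOSTNAME_RE = re.compile(
    r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*\.?$"
)


def validate_record(name, record_type, value, ttl=300):
    """Validate a DNS record before submitting it; returns the record dict
    or raises ValueError. CNAME/TXT values are passed through unchecked."""
    if record_type not in DNS_RECORD_TYPES:
        raise ValueError(f"unsupported record type: {record_type}")
    if not HOSTNAME_RE.match(name):
        raise ValueError(f"invalid hostname: {name}")
    if record_type == "A":
        if ipaddress.ip_address(value).version != 4:
            raise ValueError("A record requires an IPv4 address")
    elif record_type == "AAAA":
        if ipaddress.ip_address(value).version != 6:
            raise ValueError("AAAA record requires an IPv6 address")
    return {"name": name, "record_type": record_type, "value": value, "ttl": ttl}
```

Rejecting malformed input on the client side keeps invalid records out of CoreDNS and produces clearer errors than a failed registration.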
+ +## Architecture + +```text +┌─────────────────┐ +│ Orchestrator │ +│ (Rust) │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ DNS Manager │ +│ │ +│ - Auto-register│ +│ - TTL config │ +│ - Verification │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ CoreDNS Client │ +│ (HTTP API) │ +└────────┬────────┘ + │ + ▼ +┌─────────────────┐ +│ CoreDNS │ +│ Service │ +└─────────────────┘ +``` + +## Features + +### 1. Automatic DNS Registration + +When a server is created, the orchestrator automatically registers its DNS record: + +```rust +// In server creation workflow +let ip = server.get_ip_address(); +state.dns_manager.register_server_dns(&hostname, ip).await?; +``` + +### 2. DNS Record Types + +Supports multiple DNS record types: + +- **A** - IPv4 address +- **AAAA** - IPv6 address +- **CNAME** - Canonical name +- **TXT** - Text record + +### 3. DNS Verification + +Verify DNS resolution after registration: + +```rust +let verified = state.dns_manager.verify_dns_resolution("server.example.com").await?; +if verified { + info!("DNS resolution verified"); +} +``` + +### 4. 
Automatic Cleanup + +When a server is deleted, DNS records are automatically removed: + +```rust +state.dns_manager.unregister_server_dns(&hostname).await?; +``` + +## Configuration + +DNS settings in `config.defaults.toml`: + +```toml +[orchestrator.dns] +coredns_url = "http://localhost:53" +auto_register = true +ttl = 300 +``` + +### Configuration Options + +- **coredns_url**: CoreDNS HTTP API endpoint +- **auto_register**: Enable automatic DNS registration (default: true) +- **ttl**: Default TTL for DNS records in seconds (default: 300) + +## API Endpoints + +### List DNS Records + +```text +GET /api/v1/dns/records +``` + +**Response:** + +```json +{ + "success": true, + "data": [ + { + "name": "web-01.example.com", + "record_type": "A", + "value": "192.168.1.10", + "ttl": 300 + } + ] +} +``` + +## Usage Examples + +### Register Server DNS + +```rust +use std::net::IpAddr; + +let ip: IpAddr = "192.168.1.10".parse()?; +dns_manager.register_server_dns("web-01.example.com", ip).await?; +``` + +### Unregister Server DNS + +```rust +dns_manager.unregister_server_dns("web-01.example.com").await?; +``` + +### Update DNS Record + +```rust +let new_ip: IpAddr = "192.168.1.20".parse()?; +dns_manager.update_dns_record("web-01.example.com", new_ip).await?; +``` + +### List All Records + +```rust +let records = dns_manager.list_records().await?; +for record in records { + println!("{} -> {} ({})", record.name, record.value, record.record_type); +} +``` + +## Integration with Workflows + +### Server Creation Workflow + +1. Create server via provider API +2. Wait for server to be ready +3. **Register DNS record** (automatic) +4. Verify DNS resolution +5. Continue with next steps + +### Server Deletion Workflow + +1. Stop services on server +2. **Unregister DNS record** (automatic) +3. Delete server via provider API +4. 
Update inventory + +## Error Handling + +The DNS integration handles errors gracefully: + +- **Network errors**: Retries with exponential backoff +- **DNS conflicts**: Reports error but continues workflow +- **Invalid records**: Validates before sending to CoreDNS + +## Testing + +Run DNS integration tests: + +```bash +cd provisioning/platform/orchestrator +cargo test test_dns_integration +``` + +## Troubleshooting + +### DNS registration fails + +1. Check CoreDNS is running and accessible +2. Verify `coredns_url` configuration +3. Check network connectivity +4. Review orchestrator logs + +### DNS records not resolving + +1. Verify record was registered (check logs) +2. Query CoreDNS directly +3. Check TTL settings +4. Verify DNS resolver configuration + +## Best Practices + +1. **Use FQDN**: Always use fully-qualified domain names +2. **Set appropriate TTL**: Lower TTL for dev, higher for prod +3. **Enable auto-register**: Reduces manual operations +4. **Monitor DNS health**: Check DNS resolution periodically + +## Security Considerations + +1. **Access Control**: Restrict access to CoreDNS API +2. **Validation**: Validate hostnames and IP addresses +3. **Audit**: Log all DNS operations +4. 
**Rate Limiting**: Prevent DNS flooding + +## Future Enhancements + +- [ ] Support for SRV records +- [ ] DNS zone management +- [ ] DNSSEC integration +- [ ] Multi-zone support +- [ ] DNS caching layer \ No newline at end of file diff --git a/crates/orchestrator/docs/extension-loading.md b/crates/orchestrator/docs/extension-loading.md index 1a1b937..e5f3cd5 100644 --- a/crates/orchestrator/docs/extension-loading.md +++ b/crates/orchestrator/docs/extension-loading.md @@ -1 +1,376 @@ -# Extension Loading Guide\n\n## Overview\n\nThe extension loading module provides dynamic loading of providers, taskservs, and clusters through Nushell script integration.\n\n## Architecture\n\n```{$detected_lang}\n┌──────────────────┐\n│ Orchestrator │\n│ (Rust) │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│ Extension Manager│\n│ │\n│ - Caching │\n│ - Type safety │\n│ - Validation │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│Extension Loader │\n│ (Nushell Call) │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│ Nushell Scripts │\n│ (module load) │\n└──────────────────┘\n```\n\n## Extension Types\n\n### 1. Providers\n\nCloud provider implementations (AWS, UpCloud, Local):\n\n```{$detected_lang}\nlet provider = extension_manager.load_extension(\n ExtensionType::Provider,\n "aws".to_string(),\n Some("2.0.0".to_string())\n).await?;\n```\n\n### 2. Taskservs\n\nInfrastructure service definitions (Kubernetes, PostgreSQL, etc.):\n\n```{$detected_lang}\nlet taskserv = extension_manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n None // Load latest version\n).await?;\n```\n\n### 3. 
Clusters\n\nComplete cluster configurations (Buildkit, CI/CD, etc.):\n\n```{$detected_lang}\nlet cluster = extension_manager.load_extension(\n ExtensionType::Cluster,\n "buildkit".to_string(),\n Some("1.0.0".to_string())\n).await?;\n```\n\n## Features\n\n### LRU Caching\n\nExtensions are cached using LRU (Least Recently Used) strategy:\n\n- **Cache size**: 100 extensions\n- **Cache key**: `{type}:{name}:{version}`\n- **Automatic eviction**: Oldest entries removed when full\n\n### Type Safety\n\nAll extensions are strongly typed:\n\n```{$detected_lang}\npub struct Extension {\n pub metadata: ExtensionMetadata,\n pub path: String,\n pub loaded_at: chrono::DateTime,\n}\n\npub struct ExtensionMetadata {\n pub name: String,\n pub version: String,\n pub description: String,\n pub extension_type: ExtensionType,\n pub dependencies: Vec,\n pub author: Option,\n pub repository: Option,\n}\n```\n\n### Version Management\n\nLoad specific versions or use latest:\n\n```{$detected_lang}\n// Load specific version\nlet ext = extension_manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n Some("1.28.0".to_string())\n).await?;\n\n// Load latest version\nlet ext = extension_manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n None\n).await?;\n```\n\n## Configuration\n\nExtension settings in `config.defaults.toml`:\n\n```{$detected_lang}\n[orchestrator.extensions]\nauto_load = true\ncache_dir = "{{orchestrator.paths.data_dir}}/extensions"\n```\n\n### Configuration Options\n\n- **auto_load**: Enable automatic extension loading (default: true)\n- **cache_dir**: Directory for caching extension artifacts\n\n## API Endpoints\n\n### List Loaded Extensions\n\n```{$detected_lang}\nGET /api/v1/extensions/loaded\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": [\n {\n "metadata": {\n "name": "kubernetes",\n "version": "1.28.0",\n "description": "Kubernetes container orchestrator",\n "extension_type": "Taskserv",\n 
"dependencies": ["containerd", "etcd"],\n "author": "provisioning-team",\n "repository": null\n },\n "path": "extensions/taskservs/kubernetes",\n "loaded_at": "2025-10-06T12:30:00Z"\n }\n ]\n}\n```\n\n### Reload Extension\n\n```{$detected_lang}\nPOST /api/v1/extensions/reload\nContent-Type: application/json\n\n{\n "extension_type": "taskserv",\n "name": "kubernetes"\n}\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": "Extension kubernetes reloaded"\n}\n```\n\n## Usage Examples\n\n### Load Extension\n\n```{$detected_lang}\nuse provisioning_orchestrator::extensions::{ExtensionManager, ExtensionType};\n\nlet manager = ExtensionManager::new(\n "/usr/local/bin/nu".to_string(),\n "/usr/local/bin/provisioning".to_string(),\n);\n\nlet extension = manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n Some("1.28.0".to_string())\n).await?;\n\nprintln!("Loaded: {} v{}", extension.metadata.name, extension.metadata.version);\n```\n\n### List Loaded Extensions\n\n```{$detected_lang}\nlet extensions = manager.list_loaded_extensions().await;\nfor ext in extensions {\n println!("{} ({}) - loaded at {}",\n ext.metadata.name,\n ext.metadata.extension_type,\n ext.loaded_at\n );\n}\n```\n\n### Reload Extension\n\n```{$detected_lang}\nlet extension = manager.reload_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string()\n).await?;\n```\n\n### Check if Loaded\n\n```{$detected_lang}\nlet is_loaded = manager.is_extension_loaded(\n ExtensionType::Taskserv,\n "kubernetes"\n).await;\n\nif !is_loaded {\n // Load the extension\n manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n None\n ).await?;\n}\n```\n\n### Clear Cache\n\n```{$detected_lang}\nmanager.clear_cache().await;\n```\n\n## Integration with Workflows\n\n### Taskserv Installation Workflow\n\n1. **Load extension** (before installation)\n2. Validate dependencies\n3. Generate configuration\n4. Execute installation\n5. 
Verify installation\n\n```{$detected_lang}\n// Step 1: Load extension\nlet extension = extension_manager.load_extension(\n ExtensionType::Taskserv,\n "kubernetes".to_string(),\n Some("1.28.0".to_string())\n).await?;\n\n// Step 2: Validate dependencies\nfor dep in &extension.metadata.dependencies {\n ensure_dependency_installed(dep).await?;\n}\n\n// Continue with installation...\n```\n\n## Nushell Integration\n\nThe extension loader calls Nushell commands:\n\n```{$detected_lang}\n# Load taskserv extension\nprovisioning module load taskserv kubernetes --version 1.28.0\n\n# Discover available extensions\nprovisioning module discover taskserv --output json\n\n# Get extension metadata\nprovisioning module discover taskserv --name kubernetes --output json\n```\n\n## Error Handling\n\nThe extension loader handles errors gracefully:\n\n- **Extension not found**: Returns clear error message\n- **Version mismatch**: Reports available versions\n- **Dependency errors**: Lists missing dependencies\n- **Load failures**: Logs detailed error information\n\n## Testing\n\nRun extension loading tests:\n\n```{$detected_lang}\ncd provisioning/platform/orchestrator\ncargo test test_extension_loading\n```\n\n## Troubleshooting\n\n### Extension load fails\n\n1. Check Nushell is installed and accessible\n2. Verify extension exists in expected location\n3. Check provisioning path configuration\n4. Review orchestrator logs\n\n### Cache issues\n\n1. Clear cache manually: `manager.clear_cache().await`\n2. Check cache directory permissions\n3. Verify disk space availability\n\n## Best Practices\n\n1. **Use versioning**: Always specify version for production\n2. **Cache management**: Clear cache periodically in dev environments\n3. **Dependency validation**: Check dependencies before loading\n4. **Error handling**: Always handle load failures gracefully\n\n## Security Considerations\n\n1. **Code execution**: Extensions execute Nushell code\n2. **Validation**: Verify extension metadata\n3. 
**Sandboxing**: Consider sandboxed execution\n4. **Audit**: Log all extension loading operations\n\n## Performance\n\n### Cache Hit Ratio\n\nMonitor cache effectiveness:\n\n```{$detected_lang}\nlet total_loads = metrics.total_extension_loads;\nlet cache_hits = metrics.cache_hits;\nlet hit_ratio = cache_hits as f64 / total_loads as f64;\nprintln!("Cache hit ratio: {:.2}%", hit_ratio * 100.0);\n```\n\n### Loading Time\n\nExtension loading is optimized:\n\n- **Cached**: < 1ms\n- **Cold load**: 100-500ms (depends on extension size)\n- **With dependencies**: Variable (depends on dependency count)\n\n## Future Enhancements\n\n- [ ] Extension hot-reload without cache clear\n- [ ] Dependency graph visualization\n- [ ] Extension marketplace integration\n- [ ] Automatic version updates\n- [ ] Extension sandboxing
+# Extension Loading Guide
+
+## Overview
+
+The extension loading module provides dynamic loading of providers, taskservs, and clusters through Nushell script integration.
+
+## Architecture
+
+```text
+┌──────────────────┐
+│ Orchestrator │
+│ (Rust) │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│ Extension Manager│
+│ │
+│ - Caching │
+│ - Type safety │
+│ - Validation │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│Extension Loader │
+│ (Nushell Call) │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│ Nushell Scripts │
+│ (module load) │
+└──────────────────┘
+```
+
+## Extension Types
+
+### 1. Providers
+
+Cloud provider implementations (AWS, UpCloud, Local):
+
+```rust
+let provider = extension_manager.load_extension(
+ ExtensionType::Provider,
+ "aws".to_string(),
+ Some("2.0.0".to_string())
+).await?;
+```
+
+### 2. Taskservs
+
+Infrastructure service definitions (Kubernetes, PostgreSQL, etc.):
+
+```rust
+let taskserv = extension_manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ None // Load latest version
+).await?;
+```
+
+### 3. 
Clusters
+
+Complete cluster configurations (Buildkit, CI/CD, etc.):
+
+```rust
+let cluster = extension_manager.load_extension(
+ ExtensionType::Cluster,
+ "buildkit".to_string(),
+ Some("1.0.0".to_string())
+).await?;
+```
+
+## Features
+
+### LRU Caching
+
+Extensions are cached using an LRU (Least Recently Used) strategy:
+
+- **Cache size**: 100 extensions
+- **Cache key**: `{type}:{name}:{version}`
+- **Automatic eviction**: Oldest entries removed when full
+
+### Type Safety
+
+All extensions are strongly typed:
+
+```rust
+pub struct Extension {
+ pub metadata: ExtensionMetadata,
+ pub path: String,
+ pub loaded_at: chrono::DateTime<chrono::Utc>,
+}
+
+pub struct ExtensionMetadata {
+ pub name: String,
+ pub version: String,
+ pub description: String,
+ pub extension_type: ExtensionType,
+ pub dependencies: Vec<String>,
+ pub author: Option<String>,
+ pub repository: Option<String>,
+}
+```
+
+### Version Management
+
+Load specific versions or use latest:
+
+```rust
+// Load specific version
+let ext = extension_manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ Some("1.28.0".to_string())
+).await?;
+
+// Load latest version
+let ext = extension_manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ None
+).await?;
+```
+
+## Configuration
+
+Extension settings in `config.defaults.toml`:
+
+```toml
+[orchestrator.extensions]
+auto_load = true
+cache_dir = "{{orchestrator.paths.data_dir}}/extensions"
+```
+
+### Configuration Options
+
+- **auto_load**: Enable automatic extension loading (default: true)
+- **cache_dir**: Directory for caching extension artifacts
+
+## API Endpoints
+
+### List Loaded Extensions
+
+```http
+GET /api/v1/extensions/loaded
+```
+
+**Response:**
+
+```json
+{
+ "success": true,
+ "data": [
+ {
+ "metadata": {
+ "name": "kubernetes",
+ "version": "1.28.0",
+ "description": "Kubernetes container orchestrator",
+ "extension_type": "Taskserv",
+ "dependencies": ["containerd", "etcd"],
+ "author": 
"provisioning-team",
+ "repository": null
+ },
+ "path": "extensions/taskservs/kubernetes",
+ "loaded_at": "2025-10-06T12:30:00Z"
+ }
+ ]
+}
+```
+
+### Reload Extension
+
+```http
+POST /api/v1/extensions/reload
+Content-Type: application/json
+
+{
+ "extension_type": "taskserv",
+ "name": "kubernetes"
+}
+```
+
+**Response:**
+
+```json
+{
+ "success": true,
+ "data": "Extension kubernetes reloaded"
+}
+```
+
+## Usage Examples
+
+### Load Extension
+
+```rust
+use provisioning_orchestrator::extensions::{ExtensionManager, ExtensionType};
+
+let manager = ExtensionManager::new(
+ "/usr/local/bin/nu".to_string(),
+ "/usr/local/bin/provisioning".to_string(),
+);
+
+let extension = manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ Some("1.28.0".to_string())
+).await?;
+
+println!("Loaded: {} v{}", extension.metadata.name, extension.metadata.version);
+```
+
+### List Loaded Extensions
+
+```rust
+let extensions = manager.list_loaded_extensions().await;
+for ext in extensions {
+ println!("{} ({}) - loaded at {}",
+ ext.metadata.name,
+ ext.metadata.extension_type,
+ ext.loaded_at
+ );
+}
+```
+
+### Reload Extension
+
+```rust
+let extension = manager.reload_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string()
+).await?;
+```
+
+### Check if Loaded
+
+```rust
+let is_loaded = manager.is_extension_loaded(
+ ExtensionType::Taskserv,
+ "kubernetes"
+).await;
+
+if !is_loaded {
+ // Load the extension
+ manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ None
+ ).await?;
+}
+```
+
+### Clear Cache
+
+```rust
+manager.clear_cache().await;
+```
+
+## Integration with Workflows
+
+### Taskserv Installation Workflow
+
+1. **Load extension** (before installation)
+2. Validate dependencies
+3. Generate configuration
+4. Execute installation
+5. 
Verify installation
+
+```rust
+// Step 1: Load extension
+let extension = extension_manager.load_extension(
+ ExtensionType::Taskserv,
+ "kubernetes".to_string(),
+ Some("1.28.0".to_string())
+).await?;
+
+// Step 2: Validate dependencies
+for dep in &extension.metadata.dependencies {
+ ensure_dependency_installed(dep).await?;
+}
+
+// Continue with installation...
+```
+
+## Nushell Integration
+
+The extension loader calls Nushell commands:
+
+```nushell
+# Load taskserv extension
+provisioning module load taskserv kubernetes --version 1.28.0
+
+# Discover available extensions
+provisioning module discover taskserv --output json
+
+# Get extension metadata
+provisioning module discover taskserv --name kubernetes --output json
+```
+
+## Error Handling
+
+The extension loader handles errors gracefully:
+
+- **Extension not found**: Returns clear error message
+- **Version mismatch**: Reports available versions
+- **Dependency errors**: Lists missing dependencies
+- **Load failures**: Logs detailed error information
+
+## Testing
+
+Run extension loading tests:
+
+```bash
+cd provisioning/platform/orchestrator
+cargo test test_extension_loading
+```
+
+## Troubleshooting
+
+### Extension load fails
+
+1. Check Nushell is installed and accessible
+2. Verify extension exists in expected location
+3. Check provisioning path configuration
+4. Review orchestrator logs
+
+### Cache issues
+
+1. Clear cache manually: `manager.clear_cache().await`
+2. Check cache directory permissions
+3. Verify disk space availability
+
+## Best Practices
+
+1. **Use versioning**: Always specify version for production
+2. **Cache management**: Clear cache periodically in dev environments
+3. **Dependency validation**: Check dependencies before loading
+4. **Error handling**: Always handle load failures gracefully
+
+## Security Considerations
+
+1. **Code execution**: Extensions execute Nushell code
+2. **Validation**: Verify extension metadata
+3. 
**Sandboxing**: Consider sandboxed execution
+4. **Audit**: Log all extension loading operations
+
+## Performance
+
+### Cache Hit Ratio
+
+Monitor cache effectiveness:
+
+```rust
+let total_loads = metrics.total_extension_loads;
+let cache_hits = metrics.cache_hits;
+let hit_ratio = cache_hits as f64 / total_loads as f64;
+println!("Cache hit ratio: {:.2}%", hit_ratio * 100.0);
+```
+
+### Loading Time
+
+Extension loading is optimized:
+
+- **Cached**: < 1ms
+- **Cold load**: 100-500ms (depends on extension size)
+- **With dependencies**: Variable (depends on dependency count)
+
+## Future Enhancements
+
+- [ ] Extension hot-reload without cache clear
+- [ ] Dependency graph visualization
+- [ ] Extension marketplace integration
+- [ ] Automatic version updates
+- [ ] Extension sandboxing
\ No newline at end of file
diff --git a/crates/orchestrator/docs/oci-integration.md b/crates/orchestrator/docs/oci-integration.md
index 0f70457..2dd3480 100644
--- a/crates/orchestrator/docs/oci-integration.md
+++ b/crates/orchestrator/docs/oci-integration.md
@@ -1 +1,417 @@
-# OCI Registry Integration Guide\n\n## Overview\n\nThe OCI integration module provides OCI Distribution Spec v2 compliant registry integration for pulling KCL packages and extension artifacts.\n\n## Architecture\n\n```{$detected_lang}\n┌──────────────────┐\n│ Orchestrator │\n│ (Rust) │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│ OCI Manager │\n│ │\n│ - LRU caching │\n│ - Pull artifacts│\n│ - List packages │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│ OCI Client │\n│ (Distribution) │\n└────────┬─────────┘\n │\n ▼\n┌──────────────────┐\n│ OCI Registry │\n│ (HTTP API v2) │\n└──────────────────┘\n```\n\n## Features\n\n### 1. KCL Package Management\n\nPull KCL configuration packages from OCI registry:\n\n```{$detected_lang}\nlet package_path = oci_manager.pull_kcl_package(\n "provisioning-core",\n "1.0.0"\n).await?;\n```\n\n### 2. 
Extension Artifacts\n\nPull extension artifacts (providers, taskservs, clusters):\n\n```{$detected_lang}\nlet artifact_path = oci_manager.pull_extension_artifact(\n "taskserv", // Extension type\n "kubernetes", // Extension name\n "1.28.0" // Version\n).await?;\n```\n\n### 3. Manifest Caching\n\nManifests are cached using LRU strategy:\n\n- **Cache size**: 100 manifests\n- **Cache key**: `{name}:{version}`\n- **Automatic eviction**: Oldest entries removed when full\n\n### 4. Artifact Listing\n\nList all artifacts in a namespace:\n\n```{$detected_lang}\nlet artifacts = oci_manager.list_oci_artifacts("kcl").await?;\nfor artifact in artifacts {\n println!("{} v{} ({})", artifact.name, artifact.version, artifact.size);\n}\n```\n\n## OCI Distribution Spec v2\n\nImplements OCI Distribution Specification v2:\n\n- **Manifest retrieval**: `GET /v2/{namespace}/{repository}/manifests/{reference}`\n- **Blob download**: `GET /v2/{namespace}/{repository}/blobs/{digest}`\n- **Tag listing**: `GET /v2/{namespace}/{repository}/tags/list`\n- **Artifact existence**: `HEAD /v2/{namespace}/{repository}/manifests/{reference}`\n\n## Configuration\n\nOCI settings in `config.defaults.toml`:\n\n```{$detected_lang}\n[orchestrator.oci]\nregistry_url = "http://localhost:5000"\nnamespace = "provisioning-extensions"\ncache_dir = "{{orchestrator.paths.data_dir}}/oci-cache"\n```\n\n### Configuration Options\n\n- **registry_url**: OCI registry HTTP endpoint\n- **namespace**: Default namespace for artifacts\n- **cache_dir**: Local cache directory for downloaded artifacts\n\n## API Endpoints\n\n### List OCI Artifacts\n\n```{$detected_lang}\nPOST /api/v1/oci/artifacts\nContent-Type: application/json\n\n{\n "namespace": "kcl"\n}\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": [\n {\n "name": "provisioning-core",\n "version": "1.0.0",\n "digest": "sha256:abc123...",\n "size": 102400,\n "media_type": "application/vnd.oci.image.manifest.v1+json",\n "created_at": 
"2025-10-06T12:00:00Z"\n }\n ]\n}\n```\n\n## Usage Examples\n\n### Pull KCL Package\n\n```{$detected_lang}\nuse provisioning_orchestrator::oci::OciManager;\nuse std::path::PathBuf;\n\nlet oci_manager = OciManager::new(\n "http://localhost:5000".to_string(),\n "provisioning-extensions".to_string(),\n PathBuf::from("/tmp/oci-cache"),\n);\n\n// Pull KCL package\nlet package_path = oci_manager.pull_kcl_package(\n "provisioning-core",\n "1.0.0"\n).await?;\n\nprintln!("Package downloaded to: {}", package_path.display());\n\n// Extract package\n// tar -xzf package_path\n```\n\n### Pull Extension Artifact\n\n```{$detected_lang}\n// Pull taskserv extension\nlet artifact_path = oci_manager.pull_extension_artifact(\n "taskserv",\n "kubernetes",\n "1.28.0"\n).await?;\n\n// Extract and install\n// tar -xzf artifact_path -C /target/path\n```\n\n### List Artifacts\n\n```{$detected_lang}\nlet artifacts = oci_manager.list_oci_artifacts("kcl").await?;\n\nfor artifact in artifacts {\n println!("📦 {} v{}", artifact.name, artifact.version);\n println!(" Size: {} bytes", artifact.size);\n println!(" Digest: {}", artifact.digest);\n println!();\n}\n```\n\n### Check Artifact Exists\n\n```{$detected_lang}\nlet exists = oci_manager.artifact_exists(\n "kcl/provisioning-core",\n "1.0.0"\n).await?;\n\nif exists {\n println!("Artifact exists in registry");\n} else {\n println!("Artifact not found");\n}\n```\n\n### Get Manifest (with caching)\n\n```{$detected_lang}\nlet manifest = oci_manager.get_manifest(\n "kcl/provisioning-core",\n "1.0.0"\n).await?;\n\nprintln!("Schema version: {}", manifest.schema_version);\nprintln!("Media type: {}", manifest.media_type);\nprintln!("Layers: {}", manifest.layers.len());\n```\n\n### Clear Manifest Cache\n\n```{$detected_lang}\noci_manager.clear_cache().await;\n```\n\n## OCI Artifact Structure\n\n### Manifest Format\n\n```{$detected_lang}\n{\n "schemaVersion": 2,\n "mediaType": "application/vnd.oci.image.manifest.v1+json",\n "config": {\n "mediaType": 
"application/vnd.oci.image.config.v1+json",\n "digest": "sha256:abc123...",\n "size": 1234\n },\n "layers": [\n {\n "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",\n "digest": "sha256:def456...",\n "size": 102400\n }\n ],\n "annotations": {\n "org.opencontainers.image.created": "2025-10-06T12:00:00Z",\n "org.opencontainers.image.version": "1.0.0"\n }\n}\n```\n\n## Integration with Workflows\n\n### Extension Installation with OCI\n\n1. **Check local cache**\n2. **Pull from OCI registry** (if not cached)\n3. Extract artifact\n4. Validate contents\n5. Install extension\n\n```{$detected_lang}\n// Workflow: Install taskserv from OCI\nasync fn install_taskserv_from_oci(\n oci_manager: &OciManager,\n name: &str,\n version: &str\n) -> Result<()> {\n // Pull artifact\n let artifact_path = oci_manager.pull_extension_artifact(\n "taskserv",\n name,\n version\n ).await?;\n\n // Extract\n extract_tarball(&artifact_path, &target_dir)?;\n\n // Validate\n validate_extension_structure(&target_dir)?;\n\n // Install\n install_extension(&target_dir)?;\n\n Ok(())\n}\n```\n\n## Cache Management\n\n### Cache Directory Structure\n\n```{$detected_lang}\n/tmp/oci-cache/\n├── kcl/\n│ └── provisioning-core/\n│ └── 1.0.0/\n│ └── package.tar.gz\n├── extensions/\n│ ├── taskserv/\n│ │ └── kubernetes/\n│ │ └── 1.28.0/\n│ │ └── artifact.tar.gz\n│ └── provider/\n│ └── aws/\n│ └── 2.0.0/\n│ └── artifact.tar.gz\n```\n\n### Cache Cleanup\n\nImplement cache cleanup strategy:\n\n```{$detected_lang}\n// Clean old artifacts\nasync fn cleanup_old_artifacts(cache_dir: &Path, max_age_days: u64) -> Result<()> {\n let cutoff = Utc::now() - Duration::days(max_age_days as i64);\n\n for entry in std::fs::read_dir(cache_dir)? 
{\n let entry = entry?;\n let metadata = entry.metadata()?;\n\n if let Ok(modified) = metadata.modified() {\n let modified: DateTime = modified.into();\n if modified < cutoff {\n std::fs::remove_dir_all(entry.path())?;\n }\n }\n }\n\n Ok(())\n}\n```\n\n## Error Handling\n\nThe OCI integration handles errors gracefully:\n\n- **Network errors**: Retries with exponential backoff\n- **Manifest not found**: Returns clear error message\n- **Corrupted downloads**: Validates digest before returning\n- **Disk full**: Reports storage error\n\n## Testing\n\nRun OCI integration tests:\n\n```{$detected_lang}\ncd provisioning/platform/orchestrator\ncargo test test_oci_integration\n```\n\n## Troubleshooting\n\n### Artifact pull fails\n\n1. Check OCI registry is accessible\n2. Verify `registry_url` configuration\n3. Check network connectivity\n4. Verify artifact exists in registry\n5. Review orchestrator logs\n\n### Digest mismatch\n\n1. Clear local cache\n2. Re-pull artifact\n3. Verify registry integrity\n4. Check for network corruption\n\n### Cache issues\n\n1. Check cache directory permissions\n2. Verify disk space\n3. Clear cache manually if corrupted\n\n## Best Practices\n\n1. **Use specific versions**: Always specify version for production\n2. **Verify digests**: Validate artifact integrity\n3. **Cache management**: Implement cleanup strategy\n4. **Error handling**: Handle network failures gracefully\n5. **Monitor downloads**: Track download times and failures\n\n## Security Considerations\n\n1. **TLS/HTTPS**: Use secure registry connections in production\n2. **Authentication**: Implement registry authentication\n3. **Digest verification**: Always verify artifact digests\n4. **Access control**: Restrict registry access\n5. 
**Audit logging**: Log all pull operations\n\n## Performance\n\n### Download Optimization\n\n- **Parallel layers**: Download layers in parallel\n- **Resume support**: Resume interrupted downloads\n- **Compression**: Use gzip for smaller transfers\n- **Local cache**: Cache frequently used artifacts\n\n### Metrics\n\nTrack OCI operations:\n\n- **Pull count**: Number of artifact pulls\n- **Cache hits**: Percentage of cache hits\n- **Download time**: Average download duration\n- **Bandwidth usage**: Total bytes downloaded\n\n## Future Enhancements\n\n- [ ] Push artifacts to registry\n- [ ] Registry authentication (OAuth2, Basic Auth)\n- [ ] Multi-registry support\n- [ ] Mirror/proxy registry\n- [ ] Artifact signing and verification\n- [ ] Garbage collection for cache
+# OCI Registry Integration Guide
+
+## Overview
+
+The OCI integration module provides OCI Distribution Spec v2 compliant registry integration for pulling KCL packages and extension artifacts.
+
+## Architecture
+
+```text
+┌──────────────────┐
+│ Orchestrator │
+│ (Rust) │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│ OCI Manager │
+│ │
+│ - LRU caching │
+│ - Pull artifacts│
+│ - List packages │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│ OCI Client │
+│ (Distribution) │
+└────────┬─────────┘
+ │
+ ▼
+┌──────────────────┐
+│ OCI Registry │
+│ (HTTP API v2) │
+└──────────────────┘
+```
+
+## Features
+
+### 1. KCL Package Management
+
+Pull KCL configuration packages from OCI registry:
+
+```rust
+let package_path = oci_manager.pull_kcl_package(
+ "provisioning-core",
+ "1.0.0"
+).await?;
+```
+
+### 2. Extension Artifacts
+
+Pull extension artifacts (providers, taskservs, clusters):
+
+```rust
+let artifact_path = oci_manager.pull_extension_artifact(
+ "taskserv", // Extension type
+ "kubernetes", // Extension name
+ "1.28.0" // Version
+).await?;
+```
+
+### 3. 
Manifest Caching
+
+Manifests are cached using LRU strategy:
+
+- **Cache size**: 100 manifests
+- **Cache key**: `{name}:{version}`
+- **Automatic eviction**: Oldest entries removed when full
+
+### 4. Artifact Listing
+
+List all artifacts in a namespace:
+
+```rust
+let artifacts = oci_manager.list_oci_artifacts("kcl").await?;
+for artifact in artifacts {
+ println!("{} v{} ({})", artifact.name, artifact.version, artifact.size);
+}
+```
+
+## OCI Distribution Spec v2
+
+Implements OCI Distribution Specification v2:
+
+- **Manifest retrieval**: `GET /v2/{namespace}/{repository}/manifests/{reference}`
+- **Blob download**: `GET /v2/{namespace}/{repository}/blobs/{digest}`
+- **Tag listing**: `GET /v2/{namespace}/{repository}/tags/list`
+- **Artifact existence**: `HEAD /v2/{namespace}/{repository}/manifests/{reference}`
+
+## Configuration
+
+OCI settings in `config.defaults.toml`:
+
+```toml
+[orchestrator.oci]
+registry_url = "http://localhost:5000"
+namespace = "provisioning-extensions"
+cache_dir = "{{orchestrator.paths.data_dir}}/oci-cache"
+```
+
+### Configuration Options
+
+- **registry_url**: OCI registry HTTP endpoint
+- **namespace**: Default namespace for artifacts
+- **cache_dir**: Local cache directory for downloaded artifacts
+
+## API Endpoints
+
+### List OCI Artifacts
+
+```http
+POST /api/v1/oci/artifacts
+Content-Type: application/json
+
+{
+ "namespace": "kcl"
+}
+```
+
+**Response:**
+
+```json
+{
+ "success": true,
+ "data": [
+ {
+ "name": "provisioning-core",
+ "version": "1.0.0",
+ "digest": "sha256:abc123...",
+ "size": 102400,
+ "media_type": "application/vnd.oci.image.manifest.v1+json",
+ "created_at": "2025-10-06T12:00:00Z"
+ }
+ ]
+}
+```
+
+## Usage Examples
+
+### Pull KCL Package
+
+```rust
+use provisioning_orchestrator::oci::OciManager;
+use std::path::PathBuf;
+
+let oci_manager = OciManager::new(
+ "http://localhost:5000".to_string(),
+ "provisioning-extensions".to_string(),
+ PathBuf::from("/tmp/oci-cache"),
+);
+
+// 
Pull KCL package
+let package_path = oci_manager.pull_kcl_package(
+ "provisioning-core",
+ "1.0.0"
+).await?;
+
+println!("Package downloaded to: {}", package_path.display());
+
+// Extract package
+// tar -xzf package_path
+```
+
+### Pull Extension Artifact
+
+```rust
+// Pull taskserv extension
+let artifact_path = oci_manager.pull_extension_artifact(
+ "taskserv",
+ "kubernetes",
+ "1.28.0"
+).await?;
+
+// Extract and install
+// tar -xzf artifact_path -C /target/path
+```
+
+### List Artifacts
+
+```rust
+let artifacts = oci_manager.list_oci_artifacts("kcl").await?;
+
+for artifact in artifacts {
+ println!("📦 {} v{}", artifact.name, artifact.version);
+ println!(" Size: {} bytes", artifact.size);
+ println!(" Digest: {}", artifact.digest);
+ println!();
+}
+```
+
+### Check Artifact Exists
+
+```rust
+let exists = oci_manager.artifact_exists(
+ "kcl/provisioning-core",
+ "1.0.0"
+).await?;
+
+if exists {
+ println!("Artifact exists in registry");
+} else {
+ println!("Artifact not found");
+}
+```
+
+### Get Manifest (with caching)
+
+```rust
+let manifest = oci_manager.get_manifest(
+ "kcl/provisioning-core",
+ "1.0.0"
+).await?;
+
+println!("Schema version: {}", manifest.schema_version);
+println!("Media type: {}", manifest.media_type);
+println!("Layers: {}", manifest.layers.len());
+```
+
+### Clear Manifest Cache
+
+```rust
+oci_manager.clear_cache().await;
+```
+
+## OCI Artifact Structure
+
+### Manifest Format
+
+```json
+{
+ "schemaVersion": 2,
+ "mediaType": "application/vnd.oci.image.manifest.v1+json",
+ "config": {
+ "mediaType": "application/vnd.oci.image.config.v1+json",
+ "digest": "sha256:abc123...",
+ "size": 1234
+ },
+ "layers": [
+ {
+ "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
+ "digest": "sha256:def456...",
+ "size": 102400
+ }
+ ],
+ "annotations": {
+ "org.opencontainers.image.created": "2025-10-06T12:00:00Z",
+ "org.opencontainers.image.version": "1.0.0"
+ }
+}
+```
+
+## Integration with Workflows 
+
+### Extension Installation with OCI
+
+1. **Check local cache**
+2. **Pull from OCI registry** (if not cached)
+3. Extract artifact
+4. Validate contents
+5. Install extension
+
+```rust
+// Workflow: Install taskserv from OCI
+async fn install_taskserv_from_oci(
+ oci_manager: &OciManager,
+ name: &str,
+ version: &str
+) -> Result<()> {
+ // Pull artifact
+ let artifact_path = oci_manager.pull_extension_artifact(
+ "taskserv",
+ name,
+ version
+ ).await?;
+
+ // Extract
+ extract_tarball(&artifact_path, &target_dir)?;
+
+ // Validate
+ validate_extension_structure(&target_dir)?;
+
+ // Install
+ install_extension(&target_dir)?;
+
+ Ok(())
+}
+```
+
+## Cache Management
+
+### Cache Directory Structure
+
+```text
+/tmp/oci-cache/
+├── kcl/
+│ └── provisioning-core/
+│ └── 1.0.0/
+│ └── package.tar.gz
+├── extensions/
+│ ├── taskserv/
+│ │ └── kubernetes/
+│ │ └── 1.28.0/
+│ │ └── artifact.tar.gz
+│ └── provider/
+│ └── aws/
+│ └── 2.0.0/
+│ └── artifact.tar.gz
+```
+
+### Cache Cleanup
+
+Implement cache cleanup strategy:
+
+```rust
+// Clean old artifacts
+async fn cleanup_old_artifacts(cache_dir: &Path, max_age_days: u64) -> Result<()> {
+ let cutoff = Utc::now() - Duration::days(max_age_days as i64);
+
+ for entry in std::fs::read_dir(cache_dir)? {
+ let entry = entry?;
+ let metadata = entry.metadata()?;
+
+ if let Ok(modified) = metadata.modified() {
+ let modified: DateTime<Utc> = modified.into();
+ if modified < cutoff {
+ std::fs::remove_dir_all(entry.path())?;
+ }
+ }
+ }
+
+ Ok(())
+}
+```
+
+## Error Handling
+
+The OCI integration handles errors gracefully:
+
+- **Network errors**: Retries with exponential backoff
+- **Manifest not found**: Returns clear error message
+- **Corrupted downloads**: Validates digest before returning
+- **Disk full**: Reports storage error
+
+## Testing
+
+Run OCI integration tests:
+
+```bash
+cd provisioning/platform/orchestrator
+cargo test test_oci_integration
+```
+
+## Troubleshooting
+
+### Artifact pull fails
+
+1. 
Check OCI registry is accessible +2. Verify `registry_url` configuration +3. Check network connectivity +4. Verify artifact exists in registry +5. Review orchestrator logs + +### Digest mismatch + +1. Clear local cache +2. Re-pull artifact +3. Verify registry integrity +4. Check for network corruption + +### Cache issues + +1. Check cache directory permissions +2. Verify disk space +3. Clear cache manually if corrupted + +## Best Practices + +1. **Use specific versions**: Always specify version for production +2. **Verify digests**: Validate artifact integrity +3. **Cache management**: Implement cleanup strategy +4. **Error handling**: Handle network failures gracefully +5. **Monitor downloads**: Track download times and failures + +## Security Considerations + +1. **TLS/HTTPS**: Use secure registry connections in production +2. **Authentication**: Implement registry authentication +3. **Digest verification**: Always verify artifact digests +4. **Access control**: Restrict registry access +5. 
**Audit logging**: Log all pull operations + +## Performance + +### Download Optimization + +- **Parallel layers**: Download layers in parallel +- **Resume support**: Resume interrupted downloads +- **Compression**: Use gzip for smaller transfers +- **Local cache**: Cache frequently used artifacts + +### Metrics + +Track OCI operations: + +- **Pull count**: Number of artifact pulls +- **Cache hits**: Percentage of cache hits +- **Download time**: Average download duration +- **Bandwidth usage**: Total bytes downloaded + +## Future Enhancements + +- [ ] Push artifacts to registry +- [ ] Registry authentication (OAuth2, Basic Auth) +- [ ] Multi-registry support +- [ ] Mirror/proxy registry +- [ ] Artifact signing and verification +- [ ] Garbage collection for cache \ No newline at end of file diff --git a/crates/orchestrator/docs/service-orchestration.md b/crates/orchestrator/docs/service-orchestration.md index e9cac27..25b0cc5 100644 --- a/crates/orchestrator/docs/service-orchestration.md +++ b/crates/orchestrator/docs/service-orchestration.md @@ -1 +1,467 @@ -# Service Orchestration Guide\n\n## Overview\n\nThe service orchestration module manages platform services with dependency-based startup, health checking, and automatic service coordination.\n\n## Architecture\n\n```{$detected_lang}\n┌──────────────────────┐\n│ Orchestrator │\n│ (Rust) │\n└──────────┬───────────┘\n │\n ▼\n┌──────────────────────┐\n│ Service Orchestrator │\n│ │\n│ - Dependency graph │\n│ - Startup order │\n│ - Health checking │\n└──────────┬───────────┘\n │\n ▼\n┌──────────────────────┐\n│ Service Manager │\n│ (Nushell calls) │\n└──────────┬───────────┘\n │\n ▼\n┌──────────────────────┐\n│ Platform Services │\n│ (CoreDNS, OCI, etc) │\n└──────────────────────┘\n```\n\n## Features\n\n### 1. 
Dependency Resolution\n\nAutomatically resolve service startup order based on dependencies:\n\n```{$detected_lang}\nlet order = service_orchestrator.resolve_startup_order(&[\n "service-c".to_string()\n]).await?;\n\n// Returns: ["service-a", "service-b", "service-c"]\n```\n\n### 2. Automatic Dependency Startup\n\nWhen enabled, dependencies are started automatically:\n\n```{$detected_lang}\n// Start service with dependencies\nservice_orchestrator.start_service("web-app").await?;\n\n// Automatically starts: database -> cache -> web-app\n```\n\n### 3. Health Checking\n\nMonitor service health with HTTP or process checks:\n\n```{$detected_lang}\nlet health = service_orchestrator.check_service_health("web-app").await?;\n\nif health.healthy {\n println!("Service is healthy: {}", health.message);\n}\n```\n\n### 4. Service Status\n\nGet current status of any registered service:\n\n```{$detected_lang}\nlet status = service_orchestrator.get_service_status("web-app").await?;\n\nmatch status {\n ServiceStatus::Running => println!("Service is running"),\n ServiceStatus::Stopped => println!("Service is stopped"),\n ServiceStatus::Failed => println!("Service has failed"),\n ServiceStatus::Unknown => println!("Service status unknown"),\n}\n```\n\n## Service Definition\n\n### Service Structure\n\n```{$detected_lang}\npub struct Service {\n pub name: String,\n pub description: String,\n pub dependencies: Vec,\n pub start_command: String,\n pub stop_command: String,\n pub health_check_endpoint: Option,\n}\n```\n\n### Example Service Definition\n\n```{$detected_lang}\nlet coredns_service = Service {\n name: "coredns".to_string(),\n description: "CoreDNS DNS server".to_string(),\n dependencies: vec![], // No dependencies\n start_command: "systemctl start coredns".to_string(),\n stop_command: "systemctl stop coredns".to_string(),\n health_check_endpoint: Some("http://localhost:53/health".to_string()),\n};\n```\n\n### Service with Dependencies\n\n```{$detected_lang}\nlet oci_registry = 
Service {\n name: "oci-registry".to_string(),\n description: "OCI distribution registry".to_string(),\n dependencies: vec!["coredns".to_string()], // Depends on DNS\n start_command: "systemctl start oci-registry".to_string(),\n stop_command: "systemctl stop oci-registry".to_string(),\n health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),\n};\n```\n\n## Configuration\n\nService orchestration settings in `config.defaults.toml`:\n\n```{$detected_lang}\n[orchestrator.services]\nmanager_enabled = true\nauto_start_dependencies = true\n```\n\n### Configuration Options\n\n- **manager_enabled**: Enable service orchestration (default: true)\n- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true)\n\n## API Endpoints\n\n### List Services\n\n```{$detected_lang}\nGET /api/v1/services/list\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": [\n {\n "name": "coredns",\n "description": "CoreDNS DNS server",\n "dependencies": [],\n "start_command": "systemctl start coredns",\n "stop_command": "systemctl stop coredns",\n "health_check_endpoint": "http://localhost:53/health"\n }\n ]\n}\n```\n\n### Get Services Status\n\n```{$detected_lang}\nGET /api/v1/services/status\n```\n\n**Response:**\n\n```{$detected_lang}\n{\n "success": true,\n "data": [\n {\n "name": "coredns",\n "status": "Running"\n },\n {\n "name": "oci-registry",\n "status": "Running"\n }\n ]\n}\n```\n\n## Usage Examples\n\n### Register Services\n\n```{$detected_lang}\nuse provisioning_orchestrator::services::{ServiceOrchestrator, Service};\n\nlet orchestrator = ServiceOrchestrator::new(\n "/usr/local/bin/nu".to_string(),\n "/usr/local/bin/provisioning".to_string(),\n true, // auto_start_dependencies\n);\n\n// Register CoreDNS\nlet coredns = Service {\n name: "coredns".to_string(),\n description: "CoreDNS DNS server".to_string(),\n dependencies: vec![],\n start_command: "systemctl start coredns".to_string(),\n stop_command: "systemctl stop 
coredns".to_string(),\n health_check_endpoint: Some("http://localhost:53/health".to_string()),\n};\n\norchestrator.register_service(coredns).await;\n\n// Register OCI Registry (depends on CoreDNS)\nlet oci = Service {\n name: "oci-registry".to_string(),\n description: "OCI distribution registry".to_string(),\n dependencies: vec!["coredns".to_string()],\n start_command: "systemctl start oci-registry".to_string(),\n stop_command: "systemctl stop oci-registry".to_string(),\n health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),\n};\n\norchestrator.register_service(oci).await;\n```\n\n### Start Service with Dependencies\n\n```{$detected_lang}\n// This will automatically start coredns first, then oci-registry\norchestrator.start_service("oci-registry").await?;\n```\n\n### Resolve Startup Order\n\n```{$detected_lang}\nlet services = vec![\n "web-app".to_string(),\n "api-server".to_string(),\n];\n\nlet order = orchestrator.resolve_startup_order(&services).await?;\n\nprintln!("Startup order:");\nfor (i, service) in order.iter().enumerate() {\n println!("{}. {}", i + 1, service);\n}\n```\n\n### Start All Services\n\n```{$detected_lang}\nlet started = orchestrator.start_all_services().await?;\n\nprintln!("Started {} services:", started.len());\nfor service in started {\n println!(" ✓ {}", service);\n}\n```\n\n### Check Service Health\n\n```{$detected_lang}\nlet health = orchestrator.check_service_health("coredns").await?;\n\nif health.healthy {\n println!("✓ {} is healthy", "coredns");\n println!(" Message: {}", health.message);\n println!(" Last check: {}", health.last_check);\n} else {\n println!("✗ {} is unhealthy", "coredns");\n println!(" Message: {}", health.message);\n}\n```\n\n## Dependency Graph Examples\n\n### Simple Chain\n\n```{$detected_lang}\nA -> B -> C\n```\n\nStartup order: A, B, C\n\n```{$detected_lang}\nlet a = Service { name: "a".to_string(), dependencies: vec![], /* ... 
*/ };\nlet b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };\nlet c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };\n```\n\n### Diamond Dependency\n\n```{$detected_lang}\n A\n / \\n B C\n \ /\n D\n```\n\nStartup order: A, B, C, D (B and C can start in parallel)\n\n```{$detected_lang}\nlet a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };\nlet b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };\nlet c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };\nlet d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };\n```\n\n### Complex Dependency\n\n```{$detected_lang}\n A\n |\n B\n / \\n C D\n | |\n E F\n \ /\n G\n```\n\nStartup order: A, B, C, D, E, F, G\n\n## Integration with Platform Services\n\n### CoreDNS Service\n\n```{$detected_lang}\nlet coredns = Service {\n name: "coredns".to_string(),\n description: "CoreDNS DNS server for automatic DNS registration".to_string(),\n dependencies: vec![],\n start_command: "systemctl start coredns".to_string(),\n stop_command: "systemctl stop coredns".to_string(),\n health_check_endpoint: Some("http://localhost:53/health".to_string()),\n};\n```\n\n### OCI Registry Service\n\n```{$detected_lang}\nlet oci_registry = Service {\n name: "oci-registry".to_string(),\n description: "OCI distribution registry for artifacts".to_string(),\n dependencies: vec!["coredns".to_string()],\n start_command: "systemctl start oci-registry".to_string(),\n stop_command: "systemctl stop oci-registry".to_string(),\n health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),\n};\n```\n\n### Orchestrator Service\n\n```{$detected_lang}\nlet orchestrator = Service {\n name: "orchestrator".to_string(),\n description: "Main orchestrator service".to_string(),\n dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],\n 
start_command: "./scripts/start-orchestrator.nu --background".to_string(),\n stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),\n health_check_endpoint: Some("http://localhost:9090/health".to_string()),\n};\n```\n\n## Error Handling\n\nThe service orchestrator handles errors gracefully:\n\n- **Missing dependencies**: Reports missing services\n- **Circular dependencies**: Detects and reports cycles\n- **Start failures**: Continues with other services\n- **Health check failures**: Marks service as unhealthy\n\n### Circular Dependency Detection\n\n```{$detected_lang}\n// This would create a cycle: A -> B -> C -> A\nlet a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };\nlet b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };\nlet c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };\n\n// Error: Circular dependency detected\nlet result = orchestrator.resolve_startup_order(&["a".to_string()]).await;\nassert!(result.is_err());\n```\n\n## Testing\n\nRun service orchestration tests:\n\n```{$detected_lang}\ncd provisioning/platform/orchestrator\ncargo test test_service_orchestration\n```\n\n## Troubleshooting\n\n### Service fails to start\n\n1. Check service is registered\n2. Verify dependencies are running\n3. Review service start command\n4. Check service logs\n5. Verify permissions\n\n### Dependency resolution fails\n\n1. Check for circular dependencies\n2. Verify all services are registered\n3. Review dependency declarations\n\n### Health check fails\n\n1. Verify health endpoint is correct\n2. Check service is actually running\n3. Review network connectivity\n4. Check health check timeout\n\n## Best Practices\n\n1. **Minimize dependencies**: Only declare necessary dependencies\n2. **Health endpoints**: Implement health checks for all services\n3. **Graceful shutdown**: Implement proper stop commands\n4. 
**Idempotent starts**: Ensure services can be restarted safely\n5. **Error logging**: Log all service operations\n\n## Security Considerations\n\n1. **Command injection**: Validate service commands\n2. **Access control**: Restrict service management\n3. **Audit logging**: Log all service operations\n4. **Least privilege**: Run services with minimal permissions\n\n## Performance\n\n### Startup Optimization\n\n- **Parallel starts**: Services without dependencies start in parallel\n- **Dependency caching**: Cache dependency resolution\n- **Health check batching**: Batch health checks for efficiency\n\n### Monitoring\n\nTrack service metrics:\n\n- **Start time**: Time to start each service\n- **Health check latency**: Health check response time\n- **Failure rate**: Percentage of failed starts\n- **Uptime**: Service availability percentage\n\n## Future Enhancements\n\n- [ ] Service restart policies\n- [ ] Graceful shutdown ordering\n- [ ] Service watchdog\n- [ ] Auto-restart on failure\n- [ ] Service templates\n- [ ] Container-based services +# Service Orchestration Guide + +## Overview + +The service orchestration module manages platform services with dependency-based startup, health checking, and automatic service coordination. + +## Architecture + +```bash +┌──────────────────────┐ +│ Orchestrator │ +│ (Rust) │ +└──────────┬───────────┘ + │ + ▼ +┌──────────────────────┐ +│ Service Orchestrator │ +│ │ +│ - Dependency graph │ +│ - Startup order │ +│ - Health checking │ +└──────────┬───────────┘ + │ + ▼ +┌──────────────────────┐ +│ Service Manager │ +│ (Nushell calls) │ +└──────────┬───────────┘ + │ + ▼ +┌──────────────────────┐ +│ Platform Services │ +│ (CoreDNS, OCI, etc) │ +└──────────────────────┘ +``` + +## Features + +### 1. 
Dependency Resolution + +Automatically resolve service startup order based on dependencies: + +```rust +let order = service_orchestrator.resolve_startup_order(&[ + "service-c".to_string() +]).await?; + +// Returns: ["service-a", "service-b", "service-c"] +``` + +### 2. Automatic Dependency Startup + +When enabled, dependencies are started automatically: + +```rust +// Start service with dependencies +service_orchestrator.start_service("web-app").await?; + +// Automatically starts: database -> cache -> web-app +``` + +### 3. Health Checking + +Monitor service health with HTTP or process checks: + +```rust +let health = service_orchestrator.check_service_health("web-app").await?; + +if health.healthy { + println!("Service is healthy: {}", health.message); +} +``` + +### 4. Service Status + +Get current status of any registered service: + +```rust +let status = service_orchestrator.get_service_status("web-app").await?; + +match status { + ServiceStatus::Running => println!("Service is running"), + ServiceStatus::Stopped => println!("Service is stopped"), + ServiceStatus::Failed => println!("Service has failed"), + ServiceStatus::Unknown => println!("Service status unknown"), +} +``` + +## Service Definition + +### Service Structure + +```rust +pub struct Service { + pub name: String, + pub description: String, + pub dependencies: Vec<String>, + pub start_command: String, + pub stop_command: String, + pub health_check_endpoint: Option<String>, +} +``` + +### Example Service Definition + +```rust +let coredns_service = Service { + name: "coredns".to_string(), + description: "CoreDNS DNS server".to_string(), + dependencies: vec![], // No dependencies + start_command: "systemctl start coredns".to_string(), + stop_command: "systemctl stop coredns".to_string(), + health_check_endpoint: Some("http://localhost:53/health".to_string()), +}; +``` + +### Service with Dependencies + +```rust +let oci_registry = Service { + name: "oci-registry".to_string(), + 
description: "OCI distribution registry".to_string(), + dependencies: vec!["coredns".to_string()], // Depends on DNS + start_command: "systemctl start oci-registry".to_string(), + stop_command: "systemctl stop oci-registry".to_string(), + health_check_endpoint: Some("http://localhost:5000/v2/".to_string()), +}; +``` + +## Configuration + +Service orchestration settings in `config.defaults.toml`: + +```toml +[orchestrator.services] +manager_enabled = true +auto_start_dependencies = true +``` + +### Configuration Options + +- **manager_enabled**: Enable service orchestration (default: true) +- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true) + +## API Endpoints + +### List Services + +```bash +GET /api/v1/services/list +``` + +**Response:** + +```json +{ + "success": true, + "data": [ + { + "name": "coredns", + "description": "CoreDNS DNS server", + "dependencies": [], + "start_command": "systemctl start coredns", + "stop_command": "systemctl stop coredns", + "health_check_endpoint": "http://localhost:53/health" + } + ] +} +``` + +### Get Services Status + +```bash +GET /api/v1/services/status +``` + +**Response:** + +```json +{ + "success": true, + "data": [ + { + "name": "coredns", + "status": "Running" + }, + { + "name": "oci-registry", + "status": "Running" + } + ] +} +``` + +## Usage Examples + +### Register Services + +```rust +use provisioning_orchestrator::services::{ServiceOrchestrator, Service}; + +let orchestrator = ServiceOrchestrator::new( + "/usr/local/bin/nu".to_string(), + "/usr/local/bin/provisioning".to_string(), + true, // auto_start_dependencies +); + +// Register CoreDNS +let coredns = Service { + name: "coredns".to_string(), + description: "CoreDNS DNS server".to_string(), + dependencies: vec![], + start_command: "systemctl start coredns".to_string(), + stop_command: "systemctl stop coredns".to_string(), + health_check_endpoint: Some("http://localhost:53/health".to_string()), +}; + 
+orchestrator.register_service(coredns).await; + +// Register OCI Registry (depends on CoreDNS) +let oci = Service { + name: "oci-registry".to_string(), + description: "OCI distribution registry".to_string(), + dependencies: vec!["coredns".to_string()], + start_command: "systemctl start oci-registry".to_string(), + stop_command: "systemctl stop oci-registry".to_string(), + health_check_endpoint: Some("http://localhost:5000/v2/".to_string()), +}; + +orchestrator.register_service(oci).await; +``` + +### Start Service with Dependencies + +```rust +// This will automatically start coredns first, then oci-registry +orchestrator.start_service("oci-registry").await?; +``` + +### Resolve Startup Order + +```rust +let services = vec![ + "web-app".to_string(), + "api-server".to_string(), +]; + +let order = orchestrator.resolve_startup_order(&services).await?; + +println!("Startup order:"); +for (i, service) in order.iter().enumerate() { + println!("{}. {}", i + 1, service); +} +``` + +### Start All Services + +```rust +let started = orchestrator.start_all_services().await?; + +println!("Started {} services:", started.len()); +for service in started { + println!(" ✓ {}", service); +} +``` + +### Check Service Health + +```rust +let health = orchestrator.check_service_health("coredns").await?; + +if health.healthy { + println!("✓ {} is healthy", "coredns"); + println!(" Message: {}", health.message); + println!(" Last check: {}", health.last_check); +} else { + println!("✗ {} is unhealthy", "coredns"); + println!(" Message: {}", health.message); +} +``` + +## Dependency Graph Examples + +### Simple Chain + +```bash +A -> B -> C +``` + +Startup order: A, B, C + +```rust +let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ }; +let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... 
*/ }; +``` + +### Diamond Dependency + +```bash +  A + / \ +B   C + \ / +  D +``` + +Startup order: A, B, C, D (B and C can start in parallel) + +```rust +let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ }; +let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ }; +let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ }; +let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ }; +``` + +### Complex Dependency + +```bash +  A +  | +  B + / \ +C   D +|   | +E   F + \ / +  G +``` + +Startup order: A, B, C, D, E, F, G + +## Integration with Platform Services + +### CoreDNS Service + +```rust +let coredns = Service { + name: "coredns".to_string(), + description: "CoreDNS DNS server for automatic DNS registration".to_string(), + dependencies: vec![], + start_command: "systemctl start coredns".to_string(), + stop_command: "systemctl stop coredns".to_string(), + health_check_endpoint: Some("http://localhost:53/health".to_string()), +}; +``` + +### OCI Registry Service + +```rust +let oci_registry = Service { + name: "oci-registry".to_string(), + description: "OCI distribution registry for artifacts".to_string(), + dependencies: vec!["coredns".to_string()], + start_command: "systemctl start oci-registry".to_string(), + stop_command: "systemctl stop oci-registry".to_string(), + health_check_endpoint: Some("http://localhost:5000/v2/".to_string()), +}; +``` + +### Orchestrator Service + +```rust +let orchestrator = Service { + name: "orchestrator".to_string(), + description: "Main orchestrator service".to_string(), + dependencies: vec!["coredns".to_string(), "oci-registry".to_string()], + start_command: "./scripts/start-orchestrator.nu --background".to_string(), + stop_command: "./scripts/start-orchestrator.nu --stop".to_string(), + health_check_endpoint: Some("http://localhost:9090/health".to_string()), +}; +``` + +## Error 
Handling + +The service orchestrator handles errors gracefully: + +- **Missing dependencies**: Reports missing services +- **Circular dependencies**: Detects and reports cycles +- **Start failures**: Continues with other services +- **Health check failures**: Marks service as unhealthy + +### Circular Dependency Detection + +```rust +// This would create a cycle: A -> B -> C -> A +let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ }; +let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ }; +let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ }; + +// Error: Circular dependency detected +let result = orchestrator.resolve_startup_order(&["a".to_string()]).await; +assert!(result.is_err()); +``` + +## Testing + +Run service orchestration tests: + +```bash +cd provisioning/platform/orchestrator +cargo test test_service_orchestration +``` + +## Troubleshooting + +### Service fails to start + +1. Check service is registered +2. Verify dependencies are running +3. Review service start command +4. Check service logs +5. Verify permissions + +### Dependency resolution fails + +1. Check for circular dependencies +2. Verify all services are registered +3. Review dependency declarations + +### Health check fails + +1. Verify health endpoint is correct +2. Check service is actually running +3. Review network connectivity +4. Check health check timeout + +## Best Practices + +1. **Minimize dependencies**: Only declare necessary dependencies +2. **Health endpoints**: Implement health checks for all services +3. **Graceful shutdown**: Implement proper stop commands +4. **Idempotent starts**: Ensure services can be restarted safely +5. **Error logging**: Log all service operations + +## Security Considerations + +1. **Command injection**: Validate service commands +2. **Access control**: Restrict service management +3. **Audit logging**: Log all service operations +4. 
**Least privilege**: Run services with minimal permissions + +## Performance + +### Startup Optimization + +- **Parallel starts**: Services without dependencies start in parallel +- **Dependency caching**: Cache dependency resolution +- **Health check batching**: Batch health checks for efficiency + +### Monitoring + +Track service metrics: + +- **Start time**: Time to start each service +- **Health check latency**: Health check response time +- **Failure rate**: Percentage of failed starts +- **Uptime**: Service availability percentage + +## Future Enhancements + +- [ ] Service restart policies +- [ ] Graceful shutdown ordering +- [ ] Service watchdog +- [ ] Auto-restart on failure +- [ ] Service templates +- [ ] Container-based services \ No newline at end of file diff --git a/crates/orchestrator/docs/ssh-key-management.md b/crates/orchestrator/docs/ssh-key-management.md index 4ebc1bb..549dcbc 100644 --- a/crates/orchestrator/docs/ssh-key-management.md +++ b/crates/orchestrator/docs/ssh-key-management.md @@ -1 +1,525 @@ -# SSH Temporal Key Management System\n\n## Overview\n\nThe SSH Temporal Key Management System provides automated generation, deployment, and cleanup of short-lived SSH keys\nfor secure server access. 
It eliminates the need for static SSH keys by generating keys on-demand with automatic expiration.\n\n## Features\n\n### Core Features\n\n- **Short-Lived Keys**: Keys expire automatically after a configurable TTL (default: 1 hour)\n- **Multiple Key Types**:\n - Dynamic Key Pairs (Ed25519)\n - Vault OTP (One-Time Password)\n - Vault CA-Signed Certificates\n- **Automatic Cleanup**: Background task removes expired keys from servers\n- **Audit Trail**: All key operations are logged\n- **REST API**: HTTP endpoints for integration\n- **Nushell CLI**: User-friendly command-line interface\n\n### Security Features\n\n- ✅ Ed25519 keys (modern, secure algorithm)\n- ✅ Automatic expiration and cleanup\n- ✅ Private keys never stored on disk (only in memory)\n- ✅ Vault integration for enterprise scenarios\n- ✅ SSH fingerprint tracking\n- ✅ Per-key audit logging\n\n## Architecture\n\n```{$detected_lang}\n┌─────────────────────────────────────────────────\n────────────┐\n│ SSH Key Manager │\n├─────────────────────────────────────────────────\n────────────┤\n│ │\n│ ┌──────────────┐ ┌──────────────┐ \n┌──────────────┐ │\n│ │ Key Generator│ │ Key Deployer │ │ Temporal │ │\n│ │ (Ed25519) │ │ (SSH Deploy) │ │ Manager │ │\n│ └──────────────┘ └──────────────┘ \n└──────────────┘ │\n│ │\n│ ┌──────────────┐ ┌──────────────┐ │\n│ │ Vault │ │ Authorized │ │\n│ │ SSH Engine │ │ Keys Manager │ │\n│ └──────────────┘ └──────────────┘ │\n│ │\n└─────────────────────────────────────────────────\n────────────┘\n │ │ │\n ▼ ▼ ▼\n REST API Nushell CLI Background Tasks\n```\n\n## Key Types\n\n### 1. Dynamic Key Pairs (Default)\n\nGenerated on-demand Ed25519 keys that are automatically deployed and cleaned up.\n\n**Use Case**: Quick SSH access without Vault infrastructure\n\n**Example**:\n\n```{$detected_lang}\nssh generate-key server.example.com --user root --ttl 30min\n```\n\n### 2. 
Vault OTP (One-Time Password)\n\nVault generates a one-time password for SSH authentication.\n\n**Use Case**: Single-use SSH access with centralized authentication\n\n**Requirements**: Vault with SSH secrets engine in OTP mode\n\n**Example**:\n\n```{$detected_lang}\nssh generate-key server.example.com --type otp --ip 192.168.1.100\n```\n\n### 3. Vault CA-Signed Certificates\n\nVault acts as SSH CA, signing user public keys with short TTL.\n\n**Use Case**: Enterprise scenarios with SSH CA infrastructure\n\n**Requirements**: Vault with SSH secrets engine in CA mode\n\n**Example**:\n\n```{$detected_lang}\nssh generate-key server.example.com --type ca --principal admin --ttl 1hr\n```\n\n## REST API Endpoints\n\nBase URL: `http://localhost:9090`\n\n### Generate SSH Key\n\n```{$detected_lang}\nPOST /api/v1/ssh/generate\n\n{\n "key_type": "dynamickeypair", // or "otp", "certificate"\n "user": "root",\n "target_server": "server.example.com",\n "ttl_seconds": 3600,\n "allowed_ip": "192.168.1.100", // optional, for OTP\n "principal": "admin" // optional, for CA\n}\n\nResponse:\n{\n "success": true,\n "data": {\n "id": "uuid",\n "key_type": "dynamickeypair",\n "public_key": "ssh-ed25519 AAAA...",\n "private_key": "-----BEGIN OPENSSH PRIVATE KEY-----...",\n "fingerprint": "SHA256:...",\n "user": "root",\n "target_server": "server.example.com",\n "created_at": "2024-01-01T00:00:00Z",\n "expires_at": "2024-01-01T01:00:00Z",\n "deployed": false\n }\n}\n```\n\n### Deploy SSH Key\n\n```{$detected_lang}\nPOST /api/v1/ssh/{key_id}/deploy\n\nResponse:\n{\n "success": true,\n "data": {\n "key_id": "uuid",\n "server": "server.example.com",\n "success": true,\n "deployed_at": "2024-01-01T00:00:00Z"\n }\n}\n```\n\n### List SSH Keys\n\n```{$detected_lang}\nGET /api/v1/ssh/keys\n\nResponse:\n{\n "success": true,\n "data": [\n {\n "id": "uuid",\n "key_type": "dynamickeypair",\n "user": "root",\n "target_server": "server.example.com",\n "expires_at": "2024-01-01T01:00:00Z",\n "deployed": 
true\n }\n ]\n}\n```\n\n### Revoke SSH Key\n\n```{$detected_lang}\nPOST /api/v1/ssh/{key_id}/revoke\n\nResponse:\n{\n "success": true,\n "data": "Key uuid revoked successfully"\n}\n```\n\n### Get SSH Key\n\n```{$detected_lang}\nGET /api/v1/ssh/{key_id}\n\nResponse:\n{\n "success": true,\n "data": {\n "id": "uuid",\n "key_type": "dynamickeypair",\n ...\n }\n}\n```\n\n### Cleanup Expired Keys\n\n```{$detected_lang}\nPOST /api/v1/ssh/cleanup\n\nResponse:\n{\n "success": true,\n "data": {\n "cleaned_count": 5,\n "cleaned_key_ids": ["uuid1", "uuid2", ...]\n }\n}\n```\n\n### Get Statistics\n\n```{$detected_lang}\nGET /api/v1/ssh/stats\n\nResponse:\n{\n "success": true,\n "data": {\n "total_generated": 42,\n "active_keys": 10,\n "expired_keys": 32,\n "keys_by_type": {\n "dynamic": 35,\n "otp": 5,\n "certificate": 2\n },\n "last_cleanup_count": 5,\n "last_cleanup_at": "2024-01-01T00:00:00Z"\n }\n}\n```\n\n## Nushell CLI Commands\n\n### Generate Key\n\n```{$detected_lang}\nssh generate-key [options]\n\nOptions:\n --user SSH user (default: root)\n --ttl Key lifetime (default: 1hr)\n --type Key type (default: dynamic)\n --ip
Allowed IP (OTP mode)\n --principal Principal (CA mode)\n\nExamples:\n ssh generate-key server.example.com\n ssh generate-key server.example.com --user deploy --ttl 30min\n ssh generate-key server.example.com --type ca --principal admin\n```\n\n### Deploy Key\n\n```{$detected_lang}\nssh deploy-key \n\nExample:\n ssh deploy-key abc-123-def-456\n```\n\n### List Keys\n\n```{$detected_lang}\nssh list-keys [--expired]\n\nExample:\n ssh list-keys\n ssh list-keys | where deployed == true\n```\n\n### Revoke Key\n\n```{$detected_lang}\nssh revoke-key \n\nExample:\n ssh revoke-key abc-123-def-456\n```\n\n### Connect with Auto-Generated Key\n\n```{$detected_lang}\nssh connect [options]\n\nOptions:\n --user SSH user (default: root)\n --ttl Key lifetime (default: 1hr)\n --type Key type (default: dynamic)\n --keep Keep key after disconnect\n\nExample:\n ssh connect server.example.com --user deploy\n```\n\nThis command:\n\n1. Generates a temporal SSH key\n2. Deploys it to the server\n3. Opens SSH connection\n4. 
Revokes the key after disconnect (unless --keep is used)\n\n### Show Statistics\n\n```{$detected_lang}\nssh stats\n\nExample output:\n SSH Key Statistics:\n Total generated: 42\n Active keys: 10\n Expired keys: 32\n\n Keys by type:\n dynamic: 35\n otp: 5\n certificate: 2\n\n Last cleanup: 2024-01-01T00:00:00Z\n Cleaned keys: 5\n```\n\n### Manual Cleanup\n\n```{$detected_lang}\nssh cleanup\n\nExample output:\n ✓ Cleaned up 5 expired keys\n Cleaned key IDs:\n - abc-123\n - def-456\n ...\n```\n\n## Configuration\n\n### Orchestrator Configuration\n\nAdd to orchestrator startup:\n\n```{$detected_lang}\nuse provisioning_orchestrator::{SshKeyManager, SshConfig};\n\n// Create SSH configuration\nlet ssh_config = SshConfig {\n vault_enabled: false, // Enable Vault integration\n vault_addr: None, // Vault address\n vault_token: None, // Vault token\n vault_mount_point: "ssh".to_string(),\n vault_mode: "ca".to_string(), // "ca" or "otp"\n default_ttl: Duration::hours(1),\n cleanup_interval: Duration::minutes(5),\n provisioning_key_path: Some("/path/to/provisioning/key".to_string()),\n};\n\n// Create SSH key manager\nlet ssh_manager = Arc::new(SshKeyManager::new(ssh_config).await?);\n\n// Start background cleanup task\nArc::clone(&ssh_manager).start_cleanup_task().await;\n```\n\n### Vault SSH Configuration\n\n#### OTP Mode\n\n```{$detected_lang}\n# Enable SSH secrets engine\nvault secrets enable ssh\n\n# Configure OTP role\nvault write ssh/roles/otp_key_role \\n key_type=otp \\n default_user=root \\n cidr_list=0.0.0.0/0\n```\n\n#### CA Mode\n\n```{$detected_lang}\n# Enable SSH secrets engine\nvault secrets enable ssh\n\n# Generate SSH CA\nvault write ssh/config/ca generate_signing_key=true\n\n# Configure CA role\nvault write ssh/roles/default \\n key_type=ca \\n ttl=1h \\n max_ttl=24h \\n allow_user_certificates=true \\n allowed_users="*" \\n default_extensions="permit-pty,permit-port-forwarding"\n\n# Get CA public key (add to servers' /etc/ssh/trusted-user-ca-keys.pem)\nvault 
read -field=public_key ssh/config/ca\n```\n\nServer configuration (`/etc/ssh/sshd_config`):\n\n```{$detected_lang}\nTrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem\n```\n\n## Deployment\n\n### Prerequisites\n\n1. **Orchestrator**: Running on port 8080\n2. **SSH Access**: Provisioning key for deploying to servers\n3. **Vault** (optional): For OTP or CA modes\n\n### Environment Variables\n\n```{$detected_lang}\n# Vault integration (optional)\nexport VAULT_ADDR=https://vault.example.com:8200\nexport VAULT_TOKEN=your-vault-token\n\n# Provisioning SSH key path\nexport PROVISIONING_SSH_KEY=/path/to/provisioning/key\n```\n\n### Integration with Workflows\n\nThe SSH key manager integrates with existing workflows:\n\n```{$detected_lang}\n# In server creation workflow\nlet ssh_key = (ssh generate-key $server --ttl 30min)\nssh deploy-key $ssh_key.id\n\n# Execute remote commands\nssh root@$server "install-kubernetes.sh"\n\n# Auto-revoke after workflow\nssh revoke-key $ssh_key.id\n```\n\n## Security Considerations\n\n1. **Private Key Exposure**: Private keys are only shown once during generation\n2. **Key Storage**: Keys stored in memory only, not on disk\n3. **Cleanup**: Automatic cleanup removes expired keys from servers\n4. **Audit Logging**: All operations logged for security audit\n5. **Vault Integration**: Optional Vault integration for enterprise security\n6. 
**TTL Limits**: Enforce maximum TTL to prevent long-lived keys\n\n## Troubleshooting\n\n### Key Deployment Fails\n\nCheck SSH connectivity:\n\n```{$detected_lang}\nssh -i /path/to/provisioning/key root@server.example.com\n```\n\nVerify SSH daemon is running:\n\n```{$detected_lang}\nsystemctl status sshd\n```\n\n### Cleanup Not Working\n\nCheck orchestrator logs:\n\n```{$detected_lang}\ntail -f ./data/orchestrator.log | grep SSH\n```\n\nManual cleanup:\n\n```{$detected_lang}\nssh cleanup\n```\n\n### Vault Integration Issues\n\nTest Vault connectivity:\n\n```{$detected_lang}\nvault status\nvault token lookup\n```\n\nCheck SSH secrets engine:\n\n```{$detected_lang}\nvault secrets list\nvault read ssh/config/ca\n```\n\n## Performance\n\n- **Key Generation**: <100ms (Ed25519)\n- **Key Deployment**: ~1s (depends on SSH latency)\n- **Cleanup Task**: Every 5 minutes (configurable)\n- **Concurrent Keys**: Unlimited (memory bound)\n\n## Future Enhancements\n\n- [ ] SSH certificate rotation\n- [ ] Integration with KMS for key encryption\n- [ ] WebSocket notifications for key expiration\n- [ ] Prometheus metrics export\n- [ ] SSH session recording\n- [ ] Role-based key generation policies\n\n## References\n\n- RFC 8709: Ed25519 and Ed448 Public Key Algorithms for SSH\n- Vault SSH Secrets Engine: \n- OpenSSH Certificate Authentication: +# SSH Temporal Key Management System + +## Overview + +The SSH Temporal Key Management System provides automated generation, deployment, and cleanup of short-lived SSH keys +for secure server access. It eliminates the need for static SSH keys by generating keys on-demand with automatic expiration. 
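The TTL bookkeeping behind this lifecycle can be sketched in a few lines. This is an illustrative Python sketch, not the orchestrator's Rust implementation; the `TemporalKeyManager` name and the in-memory record shape are assumptions, and real key material/deployment is omitted.

```python
import uuid
from datetime import datetime, timedelta, timezone

class TemporalKeyManager:
    """Illustrative sketch: generate keys with an expiry, clean up expired ones."""

    def __init__(self):
        self.keys = {}  # id -> record, kept in memory only (never on disk)

    def generate(self, target_server, user="root", ttl_seconds=3600):
        now = datetime.now(timezone.utc)
        record = {
            "id": str(uuid.uuid4()),
            "user": user,
            "target_server": target_server,
            "created_at": now,
            "expires_at": now + timedelta(seconds=ttl_seconds),
            "deployed": False,
        }
        self.keys[record["id"]] = record
        return record

    def active_keys(self):
        now = datetime.now(timezone.utc)
        return [k for k in self.keys.values() if k["expires_at"] > now]

    def cleanup(self):
        """Drop expired keys; returns the cleaned ids (a real manager would
        also remove them from each server's authorized_keys)."""
        now = datetime.now(timezone.utc)
        expired = [kid for kid, k in self.keys.items() if k["expires_at"] <= now]
        for kid in expired:
            del self.keys[kid]
        return expired

mgr = TemporalKeyManager()
short = mgr.generate("server.example.com", ttl_seconds=0)   # expires immediately
long_ = mgr.generate("server.example.com", ttl_seconds=3600)
cleaned = mgr.cleanup()
```

A background task calling `cleanup()` on an interval is the essence of the automatic-cleanup feature described above.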
+ +## Features + +### Core Features + +- **Short-Lived Keys**: Keys expire automatically after a configurable TTL (default: 1 hour) +- **Multiple Key Types**: + - Dynamic Key Pairs (Ed25519) + - Vault OTP (One-Time Password) + - Vault CA-Signed Certificates +- **Automatic Cleanup**: Background task removes expired keys from servers +- **Audit Trail**: All key operations are logged +- **REST API**: HTTP endpoints for integration +- **Nushell CLI**: User-friendly command-line interface + +### Security Features + +- ✅ Ed25519 keys (modern, secure algorithm) +- ✅ Automatic expiration and cleanup +- ✅ Private keys never stored on disk (only in memory) +- ✅ Vault integration for enterprise scenarios +- ✅ SSH fingerprint tracking +- ✅ Per-key audit logging + +## Architecture + +```text +┌─────────────────────────────────────────────────────────────┐ +│                       SSH Key Manager                       │ +├─────────────────────────────────────────────────────────────┤ +│                                                             │ +│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │ +│  │ Key Generator│  │ Key Deployer │  │   Temporal   │       │ +│  │  (Ed25519)   │  │ (SSH Deploy) │  │   Manager    │       │ +│  └──────────────┘  └──────────────┘  └──────────────┘       │ +│                                                             │ +│  ┌──────────────┐  ┌──────────────┐                         │ +│  │    Vault     │  │  Authorized  │                         │ +│  │  SSH Engine  │  │ Keys Manager │                         │ +│  └──────────────┘  └──────────────┘                         │ +│                                                             │ +└─────────────────────────────────────────────────────────────┘ +        │                │                │ +        ▼                ▼                ▼ +    REST API        Nushell CLI    Background Tasks +``` + +## Key Types + +### 1. Dynamic Key Pairs (Default) + +Generated on-demand Ed25519 keys that are automatically deployed and cleaned up. + +**Use Case**: Quick SSH access without Vault infrastructure + +**Example**: + +```bash +ssh generate-key server.example.com --user root --ttl 30min +``` + +### 2. Vault OTP (One-Time Password) + +Vault generates a one-time password for SSH authentication.
+ +**Use Case**: Single-use SSH access with centralized authentication + +**Requirements**: Vault with SSH secrets engine in OTP mode + +**Example**: + +```bash +ssh generate-key server.example.com --type otp --ip 192.168.1.100 +``` + +### 3. Vault CA-Signed Certificates + +Vault acts as SSH CA, signing user public keys with short TTL. + +**Use Case**: Enterprise scenarios with SSH CA infrastructure + +**Requirements**: Vault with SSH secrets engine in CA mode + +**Example**: + +```bash +ssh generate-key server.example.com --type ca --principal admin --ttl 1hr +``` + +## REST API Endpoints + +Base URL: `http://localhost:9090` + +### Generate SSH Key + +```bash +POST /api/v1/ssh/generate + +{ + "key_type": "dynamickeypair", // or "otp", "certificate" + "user": "root", + "target_server": "server.example.com", + "ttl_seconds": 3600, + "allowed_ip": "192.168.1.100", // optional, for OTP + "principal": "admin" // optional, for CA +} + +Response: +{ + "success": true, + "data": { + "id": "uuid", + "key_type": "dynamickeypair", + "public_key": "ssh-ed25519 AAAA...", + "private_key": "-----BEGIN OPENSSH PRIVATE KEY-----...", + "fingerprint": "SHA256:...", + "user": "root", + "target_server": "server.example.com", + "created_at": "2024-01-01T00:00:00Z", + "expires_at": "2024-01-01T01:00:00Z", + "deployed": false + } +} +``` + +### Deploy SSH Key + +```bash +POST /api/v1/ssh/{key_id}/deploy + +Response: +{ + "success": true, + "data": { + "key_id": "uuid", + "server": "server.example.com", + "success": true, + "deployed_at": "2024-01-01T00:00:00Z" + } +} +``` + +### List SSH Keys + +```bash +GET /api/v1/ssh/keys + +Response: +{ + "success": true, + "data": [ + { + "id": "uuid", + "key_type": "dynamickeypair", + "user": "root", + "target_server": "server.example.com", + "expires_at": "2024-01-01T01:00:00Z", + "deployed": true + } + ] +} +``` + +### Revoke SSH Key + +```bash +POST /api/v1/ssh/{key_id}/revoke + +Response: +{ + "success": true, + "data": "Key uuid revoked 
successfully" +} +``` + +### Get SSH Key + +```bash +GET /api/v1/ssh/{key_id} + +Response: +{ + "success": true, + "data": { + "id": "uuid", + "key_type": "dynamickeypair", + ... + } +} +``` + +### Cleanup Expired Keys + +```bash +POST /api/v1/ssh/cleanup + +Response: +{ + "success": true, + "data": { + "cleaned_count": 5, + "cleaned_key_ids": ["uuid1", "uuid2", ...] + } +} +``` + +### Get Statistics + +```bash +GET /api/v1/ssh/stats + +Response: +{ + "success": true, + "data": { + "total_generated": 42, + "active_keys": 10, + "expired_keys": 32, + "keys_by_type": { + "dynamic": 35, + "otp": 5, + "certificate": 2 + }, + "last_cleanup_count": 5, + "last_cleanup_at": "2024-01-01T00:00:00Z" + } +} +``` + +## Nushell CLI Commands + +### Generate Key + +```bash +ssh generate-key <server> [options] + +Options: + --user <user>             SSH user (default: root) + --ttl <duration>          Key lifetime (default: 1hr) + --type <type>             Key type (default: dynamic) + --ip <ip>                 Allowed IP (OTP mode) + --principal <principal>   Principal (CA mode) + +Examples: + ssh generate-key server.example.com + ssh generate-key server.example.com --user deploy --ttl 30min + ssh generate-key server.example.com --type ca --principal admin +``` + +### Deploy Key + +```bash +ssh deploy-key <key-id> + +Example: + ssh deploy-key abc-123-def-456 +``` + +### List Keys + +```bash +ssh list-keys [--expired] + +Example: + ssh list-keys + ssh list-keys | where deployed == true +``` + +### Revoke Key + +```bash +ssh revoke-key <key-id> + +Example: + ssh revoke-key abc-123-def-456 +``` + +### Connect with Auto-Generated Key + +```bash +ssh connect <server> [options] + +Options: + --user <user>        SSH user (default: root) + --ttl <duration>     Key lifetime (default: 1hr) + --type <type>        Key type (default: dynamic) + --keep               Keep key after disconnect + +Example: + ssh connect server.example.com --user deploy +``` + +This command: + +1. Generates a temporal SSH key +2. Deploys it to the server +3. Opens SSH connection +4. Revokes the key after disconnect (unless --keep is used) + +### Show Statistics + +```bash +ssh stats + +Example output: + SSH Key Statistics: + Total generated: 42 + Active keys: 10 + Expired keys: 32 + + Keys by type: + dynamic: 35 + otp: 5 + certificate: 2 + + Last cleanup: 2024-01-01T00:00:00Z + Cleaned keys: 5 +``` + +### Manual Cleanup + +```bash +ssh cleanup + +Example output: + ✓ Cleaned up 5 expired keys + Cleaned key IDs: + - abc-123 + - def-456 + ... 
+``` + +## Configuration + +### Orchestrator Configuration + +Add to orchestrator startup: + +```rust +use provisioning_orchestrator::{SshKeyManager, SshConfig}; + +// Create SSH configuration +let ssh_config = SshConfig { + vault_enabled: false, // Enable Vault integration + vault_addr: None, // Vault address + vault_token: None, // Vault token + vault_mount_point: "ssh".to_string(), + vault_mode: "ca".to_string(), // "ca" or "otp" + default_ttl: Duration::hours(1), + cleanup_interval: Duration::minutes(5), + provisioning_key_path: Some("/path/to/provisioning/key".to_string()), +}; + +// Create SSH key manager +let ssh_manager = Arc::new(SshKeyManager::new(ssh_config).await?); + +// Start background cleanup task +Arc::clone(&ssh_manager).start_cleanup_task().await; +``` + +### Vault SSH Configuration + +#### OTP Mode + +```bash +# Enable SSH secrets engine +vault secrets enable ssh + +# Configure OTP role +vault write ssh/roles/otp_key_role \ + key_type=otp \ + default_user=root \ + cidr_list=0.0.0.0/0 +``` + +#### CA Mode + +```bash +# Enable SSH secrets engine +vault secrets enable ssh + +# Generate SSH CA +vault write ssh/config/ca generate_signing_key=true + +# Configure CA role +vault write ssh/roles/default \ + key_type=ca \ + ttl=1h \ + max_ttl=24h \ + allow_user_certificates=true \ + allowed_users="*" \ + default_extensions="permit-pty,permit-port-forwarding" + +# Get CA public key (add to servers' /etc/ssh/trusted-user-ca-keys.pem) +vault read -field=public_key ssh/config/ca +``` + +Server configuration (`/etc/ssh/sshd_config`): + +```text +TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem +``` + +## Deployment + +### Prerequisites + +1. **Orchestrator**: Running on port 8080 +2. **SSH Access**: Provisioning key for deploying to servers +3. 
**Vault** (optional): For OTP or CA modes + +### Environment Variables + +```bash +# Vault integration (optional) +export VAULT_ADDR=https://vault.example.com:8200 +export VAULT_TOKEN=your-vault-token + +# Provisioning SSH key path +export PROVISIONING_SSH_KEY=/path/to/provisioning/key +``` + +### Integration with Workflows + +The SSH key manager integrates with existing workflows: + +```nu +# In server creation workflow +let ssh_key = (ssh generate-key $server --ttl 30min) +ssh deploy-key $ssh_key.id + +# Execute remote commands +ssh root@$server "install-kubernetes.sh" + +# Auto-revoke after workflow +ssh revoke-key $ssh_key.id +``` + +## Security Considerations + +1. **Private Key Exposure**: Private keys are only shown once during generation +2. **Key Storage**: Keys stored in memory only, not on disk +3. **Cleanup**: Automatic cleanup removes expired keys from servers +4. **Audit Logging**: All operations logged for security audit +5. **Vault Integration**: Optional Vault integration for enterprise security +6. 
**TTL Limits**: Enforce maximum TTL to prevent long-lived keys + +## Troubleshooting + +### Key Deployment Fails + +Check SSH connectivity: + +```bash +ssh -i /path/to/provisioning/key root@server.example.com +``` + +Verify SSH daemon is running: + +```bash +systemctl status sshd +``` + +### Cleanup Not Working + +Check orchestrator logs: + +```bash +tail -f ./data/orchestrator.log | grep SSH +``` + +Manual cleanup: + +```bash +ssh cleanup +``` + +### Vault Integration Issues + +Test Vault connectivity: + +```bash +vault status +vault token lookup +``` + +Check SSH secrets engine: + +```bash +vault secrets list +vault read ssh/config/ca +``` + +## Performance + +- **Key Generation**: <100ms (Ed25519) +- **Key Deployment**: ~1s (depends on SSH latency) +- **Cleanup Task**: Every 5 minutes (configurable) +- **Concurrent Keys**: Unlimited (memory bound) + +## Future Enhancements + +- [ ] SSH certificate rotation +- [ ] Integration with KMS for key encryption +- [ ] WebSocket notifications for key expiration +- [ ] Prometheus metrics export +- [ ] SSH session recording +- [ ] Role-based key generation policies + +## References + +- RFC 8709: Ed25519 and Ed448 Public Key Algorithms for SSH +- Vault SSH Secrets Engine: +- OpenSSH Certificate Authentication: \ No newline at end of file diff --git a/crates/orchestrator/docs/storage-backends.md b/crates/orchestrator/docs/storage-backends.md index 02827f4..48c07a0 100644 --- a/crates/orchestrator/docs/storage-backends.md +++ b/crates/orchestrator/docs/storage-backends.md @@ -1 +1,385 @@ -# Storage Backends Guide\n\nThis document provides comprehensive guidance on the orchestrator's storage backend options, configuration, and migration between them.\n\n## Overview\n\nThe orchestrator supports three storage backends through a pluggable architecture:\n\n1. **Filesystem** - JSON file-based storage (default)\n2. **SurrealDB Embedded** - Local database with RocksDB engine\n3. 
**SurrealDB Server** - Remote SurrealDB server connection\n\nAll backends implement the same `TaskStorage` trait, ensuring consistent behavior and seamless migration.\n\n## Backend Comparison\n\n| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server |\n| --------- | ------------ | ------------------- | ------------------ |\n| **Setup Complexity** | Minimal | Low | Medium |\n| **External Dependencies** | None | None | SurrealDB Server |\n| **Storage Format** | JSON Files | RocksDB | Remote DB |\n| **ACID Transactions** | No | Yes | Yes |\n| **Authentication/RBAC** | Basic | Advanced | Advanced |\n| **Real-time Subscriptions** | No | Yes | Yes |\n| **Audit Logging** | Manual | Automatic | Automatic |\n| **Metrics Collection** | Basic | Advanced | Advanced |\n| **Task Dependencies** | Simple | Graph-based | Graph-based |\n| **Horizontal Scaling** | No | No | Yes |\n| **Backup/Recovery** | File Copy | Database Backup | Server Backup |\n| **Performance** | Good | Excellent | Variable |\n| **Memory Usage** | Low | Medium | Low |\n| **Disk Usage** | Medium | Optimized | Minimal |\n\n## 1. Filesystem Backend\n\n### Overview\n\nThe default storage backend using JSON files for task persistence. 
Ideal for development and simple deployments.\n\n### Configuration\n\n```{$detected_lang}\n# Default configuration\n./orchestrator --storage-type filesystem --data-dir ./data\n\n# Custom data directory\n./orchestrator --storage-type filesystem --data-dir /var/lib/orchestrator\n```\n\n### File Structure\n\n```{$detected_lang}\ndata/\n└── queue.rkvs/\n ├── tasks/\n │ ├── uuid1.json # Individual task records\n │ ├── uuid2.json\n │ └── ...\n └── queue/\n ├── uuid1.json # Queue entries with priority\n ├── uuid2.json\n └── ...\n```\n\n### Features\n\n- ✅ **Simple Setup**: No external dependencies\n- ✅ **Transparency**: Human-readable JSON files\n- ✅ **Backup**: Standard file system tools\n- ✅ **Debugging**: Direct file inspection\n- ❌ **ACID**: No transaction guarantees\n- ❌ **Concurrency**: Basic file locking\n- ❌ **Advanced Features**: Limited auth/audit\n\n### Best Use Cases\n\n- Development environments\n- Single-instance deployments\n- Simple task orchestration\n- Environments with strict dependency requirements\n\n## 2. SurrealDB Embedded\n\n### Overview\n\nLocal SurrealDB database using RocksDB storage engine. 
Provides advanced database features without external dependencies.\n\n### Configuration\n\n```{$detected_lang}\n# Build with SurrealDB support\ncargo build --features surrealdb\n\n# Run with embedded SurrealDB\n./orchestrator --storage-type surrealdb-embedded --data-dir ./data\n```\n\n### Database Schema\n\n- **tasks**: Main task records with full metadata\n- **task_queue**: Priority queue with scheduling info\n- **users**: Authentication and RBAC\n- **audit_log**: Complete operation history\n- **metrics**: Performance and usage statistics\n- **task_events**: Real-time event stream\n\n### Features\n\n- ✅ **ACID Transactions**: Reliable data consistency\n- ✅ **Advanced Queries**: SQL-like syntax with graph support\n- ✅ **Real-time Events**: Live query subscriptions\n- ✅ **Built-in Auth**: User management and RBAC\n- ✅ **Audit Logging**: Automatic operation tracking\n- ✅ **No External Deps**: Self-contained database\n- ❌ **Horizontal Scaling**: Single-node only\n\n### Configuration Options\n\n```{$detected_lang}\n# Custom database location\n./orchestrator --storage-type surrealdb-embedded \\n --data-dir /var/lib/orchestrator/db\n\n# With specific namespace/database\n./orchestrator --storage-type surrealdb-embedded \\n --data-dir ./data \\n --surrealdb-namespace production \\n --surrealdb-database orchestrator\n```\n\n### Best Use Cases\n\n- Production single-node deployments\n- Applications requiring ACID guarantees\n- Advanced querying and analytics\n- Real-time monitoring requirements\n- Audit logging compliance\n\n## 3. SurrealDB Server\n\n### Overview\n\nRemote SurrealDB server connection providing full distributed database capabilities with horizontal scaling.\n\n### Prerequisites\n\n1. **SurrealDB Server**: Running instance accessible via network\n2. **Authentication**: Valid credentials for database access\n3. 
**Network**: Reliable connectivity to SurrealDB server\n\n### SurrealDB Server Setup\n\n```{$detected_lang}\n# Install SurrealDB\ncurl -sSf https://install.surrealdb.com | sh\n\n# Start server\nsurreal start --log trace --user root --pass root memory\n\n# Or with file storage\nsurreal start --log trace --user root --pass root file:orchestrator.db\n\n# Or with TiKV (distributed)\nsurreal start --log trace --user root --pass root tikv://localhost:2379\n```\n\n### Configuration\n\n```{$detected_lang}\n# Basic server connection\n./orchestrator --storage-type surrealdb-server \\n --surrealdb-url ws://localhost:8000 \\n --surrealdb-username admin \\n --surrealdb-password secret\n\n# Production configuration\n./orchestrator --storage-type surrealdb-server \\n --surrealdb-url wss://surreal.production.com:8000 \\n --surrealdb-namespace prod \\n --surrealdb-database orchestrator \\n --surrealdb-username orchestrator-service \\n --surrealdb-password "$SURREALDB_PASSWORD"\n```\n\n### Features\n\n- ✅ **Distributed**: Multi-node clustering support\n- ✅ **Horizontal Scaling**: Handle massive workloads\n- ✅ **Multi-tenancy**: Namespace and database isolation\n- ✅ **Real-time Collaboration**: Multiple orchestrator instances\n- ✅ **Advanced Security**: Enterprise authentication\n- ✅ **High Availability**: Fault-tolerant deployments\n- ❌ **Complexity**: Requires server management\n- ❌ **Network Dependency**: Requires reliable connectivity\n\n### Best Use Cases\n\n- Distributed production deployments\n- Multiple orchestrator instances\n- High availability requirements\n- Large-scale task orchestration\n- Multi-tenant environments\n\n## Migration Between Backends\n\n### Migration Tool\n\nUse the migration script to move data between any backend combination:\n\n```{$detected_lang}\n# Interactive migration wizard\n./scripts/migrate-storage.nu --interactive\n\n# Direct migration examples\n./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \\n --source-dir ./data 
--target-dir ./surrealdb-data\n\n./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \\n --source-dir ./data \\n --surrealdb-url ws://localhost:8000 \\n --username admin --password secret\n\n# Validation and dry-run\n./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded\n./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --dry-run\n```\n\n### Migration Features\n\n- **Data Integrity**: Complete validation before and after migration\n- **Progress Tracking**: Real-time progress with throughput metrics\n- **Rollback Support**: Automatic rollback on failures\n- **Selective Migration**: Filter by task status, date range, etc.\n- **Batch Processing**: Configurable batch sizes for performance\n\n### Migration Scenarios\n\n#### Development to Production\n\n```{$detected_lang}\n# Migrate from filesystem (dev) to SurrealDB embedded (production)\n./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \\n --source-dir ./dev-data --target-dir ./prod-data \\n --batch-size 100 --verify\n```\n\n#### Scaling Up\n\n```{$detected_lang}\n# Migrate from embedded to server for distributed setup\n./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \\n --source-dir ./data \\n --surrealdb-url ws://production-surreal:8000 \\n --username orchestrator --password "$PROD_PASSWORD" \\n --namespace production --database main\n```\n\n#### Disaster Recovery\n\n```{$detected_lang}\n# Migrate from server back to filesystem for emergency backup\n./scripts/migrate-storage.nu --from surrealdb-server --to filesystem \\n --surrealdb-url ws://failing-server:8000 \\n --username admin --password "$PASSWORD" \\n --target-dir ./emergency-backup\n```\n\n## Performance Considerations\n\n### Filesystem\n\n- **Strengths**: Low memory usage, simple debugging\n- **Limitations**: File I/O bottlenecks, no concurrent writes\n- **Optimization**: Fast SSD, regular cleanup of old tasks\n\n### SurrealDB 
Embedded\n\n- **Strengths**: Excellent single-node performance, ACID guarantees\n- **Limitations**: Memory usage scales with data size\n- **Optimization**: Adequate RAM, SSD storage, regular compaction\n\n### SurrealDB Server\n\n- **Strengths**: Horizontal scaling, shared state\n- **Limitations**: Network latency, server dependency\n- **Optimization**: Low-latency network, connection pooling, server tuning\n\n## Security Considerations\n\n### Filesystem\n\n- **File Permissions**: Restrict access to data directory\n- **Backup Security**: Encrypt backup files\n- **Network**: No network exposure\n\n### SurrealDB Embedded\n\n- **File Permissions**: Secure database files\n- **Encryption**: Database-level encryption available\n- **Access Control**: Built-in user management\n\n### SurrealDB Server\n\n- **Network Security**: Use TLS/WSS connections\n- **Authentication**: Strong passwords, regular rotation\n- **Authorization**: Role-based access control\n- **Audit**: Complete operation logging\n\n## Troubleshooting\n\n### Common Issues\n\n#### Filesystem Backend\n\n```{$detected_lang}\n# Permission issues\nsudo chown -R $USER:$USER ./data\nchmod -R 755 ./data\n\n# Corrupted JSON files\nrm ./data/queue.rkvs/tasks/corrupted-file.json\n```\n\n#### SurrealDB Embedded\n\n```{$detected_lang}\n# Database corruption\nrm -rf ./data/orchestrator.db\n# Restore from backup or re-initialize\n\n# Permission issues\nsudo chown -R $USER:$USER ./data\n```\n\n#### SurrealDB Server\n\n```{$detected_lang}\n# Connection issues\ntelnet surreal-server 8000\n# Check server status and network connectivity\n\n# Authentication failures\n# Verify credentials and user permissions\n```\n\n### Debugging Commands\n\n```{$detected_lang}\n# List available storage types\n./orchestrator --help | grep storage-type\n\n# Validate configuration\n./orchestrator --storage-type filesystem --data-dir ./data --dry-run\n\n# Test migration\n./scripts/migrate-storage.nu validate --from filesystem --to 
surrealdb-embedded\n\n# Monitor migration progress\n./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --verbose\n```\n\n## Recommendations\n\n### Development\n\n- **Use**: Filesystem backend\n- **Rationale**: Simple setup, easy debugging, no external dependencies\n\n### Single-Node Production\n\n- **Use**: SurrealDB Embedded\n- **Rationale**: ACID guarantees, advanced features, no external dependencies\n\n### Distributed Production\n\n- **Use**: SurrealDB Server\n- **Rationale**: Horizontal scaling, high availability, multi-instance support\n\n### Migration Path\n\n1. **Start**: Filesystem (development)\n2. **Scale**: SurrealDB Embedded (single-node production)\n3. **Distribute**: SurrealDB Server (multi-node production)\n\nThis progressive approach allows teams to start simple and scale as requirements grow, with seamless migration between each stage. +# Storage Backends Guide + +This document provides comprehensive guidance on the orchestrator's storage backend options, configuration, and migration between them. + +## Overview + +The orchestrator supports three storage backends through a pluggable architecture: + +1. **Filesystem** - JSON file-based storage (default) +2. **SurrealDB Embedded** - Local database with RocksDB engine +3. **SurrealDB Server** - Remote SurrealDB server connection + +All backends implement the same `TaskStorage` trait, ensuring consistent behavior and seamless migration. 
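Because every backend sits behind the same `TaskStorage` trait, migration reduces to reading tasks from one implementation and writing them to another. A minimal Python sketch of that pattern follows; the real trait is Rust and async, so the class and method names here are illustrative only (the on-disk `queue.rkvs/tasks/<uuid>.json` layout matches the filesystem backend described below).

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

class TaskStorage(ABC):
    """Minimal stand-in for the orchestrator's TaskStorage trait."""

    @abstractmethod
    def save_task(self, task_id: str, task: dict) -> None: ...

    @abstractmethod
    def load_task(self, task_id: str) -> dict: ...

    @abstractmethod
    def list_ids(self) -> list: ...

class FilesystemStorage(TaskStorage):
    """JSON-file backend: one <uuid>.json per task under queue.rkvs/tasks/."""

    def __init__(self, data_dir):
        self.tasks_dir = Path(data_dir) / "queue.rkvs" / "tasks"
        self.tasks_dir.mkdir(parents=True, exist_ok=True)

    def save_task(self, task_id, task):
        (self.tasks_dir / f"{task_id}.json").write_text(json.dumps(task))

    def load_task(self, task_id):
        return json.loads((self.tasks_dir / f"{task_id}.json").read_text())

    def list_ids(self):
        return [p.stem for p in self.tasks_dir.glob("*.json")]

class MemoryStorage(TaskStorage):
    """Stand-in for a database-backed backend (SurrealDB in the real system)."""

    def __init__(self):
        self._tasks = {}

    def save_task(self, task_id, task):
        self._tasks[task_id] = task

    def load_task(self, task_id):
        return self._tasks[task_id]

    def list_ids(self):
        return list(self._tasks)

def migrate(source: TaskStorage, target: TaskStorage) -> int:
    """Copy every task from source to target; returns the count moved."""
    count = 0
    for task_id in source.list_ids():
        target.save_task(task_id, source.load_task(task_id))
        count += 1
    return count

fs = FilesystemStorage(tempfile.mkdtemp())
fs.save_task("uuid1", {"status": "pending", "priority": 5})
db = MemoryStorage()
moved = migrate(fs, db)
```

Since both sides speak the same interface, the same `migrate` works for any source/target pair, which is why the migration tool below supports every backend combination.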
+ +## Backend Comparison + +| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server | +| --------- | ------------ | ------------------- | ------------------ | +| **Setup Complexity** | Minimal | Low | Medium | +| **External Dependencies** | None | None | SurrealDB Server | +| **Storage Format** | JSON Files | RocksDB | Remote DB | +| **ACID Transactions** | No | Yes | Yes | +| **Authentication/RBAC** | Basic | Advanced | Advanced | +| **Real-time Subscriptions** | No | Yes | Yes | +| **Audit Logging** | Manual | Automatic | Automatic | +| **Metrics Collection** | Basic | Advanced | Advanced | +| **Task Dependencies** | Simple | Graph-based | Graph-based | +| **Horizontal Scaling** | No | No | Yes | +| **Backup/Recovery** | File Copy | Database Backup | Server Backup | +| **Performance** | Good | Excellent | Variable | +| **Memory Usage** | Low | Medium | Low | +| **Disk Usage** | Medium | Optimized | Minimal | + +## 1. Filesystem Backend + +### Overview + +The default storage backend using JSON files for task persistence. Ideal for development and simple deployments. + +### Configuration + +```bash +# Default configuration +./orchestrator --storage-type filesystem --data-dir ./data + +# Custom data directory +./orchestrator --storage-type filesystem --data-dir /var/lib/orchestrator +``` + +### File Structure + +```text +data/ +└── queue.rkvs/ + ├── tasks/ + │ ├── uuid1.json # Individual task records + │ ├── uuid2.json + │ └── ... + └── queue/ + ├── uuid1.json # Queue entries with priority + ├── uuid2.json + └── ... 
+``` + +### Features + +- ✅ **Simple Setup**: No external dependencies +- ✅ **Transparency**: Human-readable JSON files +- ✅ **Backup**: Standard file system tools +- ✅ **Debugging**: Direct file inspection +- ❌ **ACID**: No transaction guarantees +- ❌ **Concurrency**: Basic file locking +- ❌ **Advanced Features**: Limited auth/audit + +### Best Use Cases + +- Development environments +- Single-instance deployments +- Simple task orchestration +- Environments with strict dependency requirements + +## 2. SurrealDB Embedded + +### Overview + +Local SurrealDB database using RocksDB storage engine. Provides advanced database features without external dependencies. + +### Configuration + +```bash +# Build with SurrealDB support +cargo build --features surrealdb + +# Run with embedded SurrealDB +./orchestrator --storage-type surrealdb-embedded --data-dir ./data +``` + +### Database Schema + +- **tasks**: Main task records with full metadata +- **task_queue**: Priority queue with scheduling info +- **users**: Authentication and RBAC +- **audit_log**: Complete operation history +- **metrics**: Performance and usage statistics +- **task_events**: Real-time event stream + +### Features + +- ✅ **ACID Transactions**: Reliable data consistency +- ✅ **Advanced Queries**: SQL-like syntax with graph support +- ✅ **Real-time Events**: Live query subscriptions +- ✅ **Built-in Auth**: User management and RBAC +- ✅ **Audit Logging**: Automatic operation tracking +- ✅ **No External Deps**: Self-contained database +- ❌ **Horizontal Scaling**: Single-node only + +### Configuration Options + +```bash +# Custom database location +./orchestrator --storage-type surrealdb-embedded \ + --data-dir /var/lib/orchestrator/db + +# With specific namespace/database +./orchestrator --storage-type surrealdb-embedded \ + --data-dir ./data \ + --surrealdb-namespace production \ + --surrealdb-database orchestrator +``` + +### Best Use Cases + +- Production single-node deployments +- Applications requiring ACID 
guarantees +- Advanced querying and analytics +- Real-time monitoring requirements +- Audit logging compliance + +## 3. SurrealDB Server + +### Overview + +Remote SurrealDB server connection providing full distributed database capabilities with horizontal scaling. + +### Prerequisites + +1. **SurrealDB Server**: Running instance accessible via network +2. **Authentication**: Valid credentials for database access +3. **Network**: Reliable connectivity to SurrealDB server + +### SurrealDB Server Setup + +```bash +# Install SurrealDB +curl -sSf https://install.surrealdb.com | sh + +# Start server +surreal start --log trace --user root --pass root memory + +# Or with file storage +surreal start --log trace --user root --pass root file:orchestrator.db + +# Or with TiKV (distributed) +surreal start --log trace --user root --pass root tikv://localhost:2379 +``` + +### Configuration + +```bash +# Basic server connection +./orchestrator --storage-type surrealdb-server \ + --surrealdb-url ws://localhost:8000 \ + --surrealdb-username admin \ + --surrealdb-password secret + +# Production configuration +./orchestrator --storage-type surrealdb-server \ + --surrealdb-url wss://surreal.production.com:8000 \ + --surrealdb-namespace prod \ + --surrealdb-database orchestrator \ + --surrealdb-username orchestrator-service \ + --surrealdb-password "$SURREALDB_PASSWORD" +``` + +### Features + +- ✅ **Distributed**: Multi-node clustering support +- ✅ **Horizontal Scaling**: Handle massive workloads +- ✅ **Multi-tenancy**: Namespace and database isolation +- ✅ **Real-time Collaboration**: Multiple orchestrator instances +- ✅ **Advanced Security**: Enterprise authentication +- ✅ **High Availability**: Fault-tolerant deployments +- ❌ **Complexity**: Requires server management +- ❌ **Network Dependency**: Requires reliable connectivity + +### Best Use Cases + +- Distributed production deployments +- Multiple orchestrator instances +- High availability requirements +- Large-scale task orchestration +- 
Multi-tenant environments + +## Migration Between Backends + +### Migration Tool + +Use the migration script to move data between any backend combination: + +```bash +# Interactive migration wizard +./scripts/migrate-storage.nu --interactive + +# Direct migration examples +./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \ + --source-dir ./data --target-dir ./surrealdb-data + +./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \ + --source-dir ./data \ + --surrealdb-url ws://localhost:8000 \ + --username admin --password secret + +# Validation and dry-run +./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded +./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --dry-run +``` + +### Migration Features + +- **Data Integrity**: Complete validation before and after migration +- **Progress Tracking**: Real-time progress with throughput metrics +- **Rollback Support**: Automatic rollback on failures +- **Selective Migration**: Filter by task status, date range, etc. 
+- **Batch Processing**: Configurable batch sizes for performance + +### Migration Scenarios + +#### Development to Production + +```bash +# Migrate from filesystem (dev) to SurrealDB embedded (production) +./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \ + --source-dir ./dev-data --target-dir ./prod-data \ + --batch-size 100 --verify +``` + +#### Scaling Up + +```bash +# Migrate from embedded to server for distributed setup +./scripts/migrate-storage.nu --from surrealdb-embedded --to surrealdb-server \ + --source-dir ./data \ + --surrealdb-url ws://production-surreal:8000 \ + --username orchestrator --password "$PROD_PASSWORD" \ + --namespace production --database main +``` + +#### Disaster Recovery + +```bash +# Migrate from server back to filesystem for emergency backup +./scripts/migrate-storage.nu --from surrealdb-server --to filesystem \ + --surrealdb-url ws://failing-server:8000 \ + --username admin --password "$PASSWORD" \ + --target-dir ./emergency-backup +``` + +## Performance Considerations + +### Filesystem + +- **Strengths**: Low memory usage, simple debugging +- **Limitations**: File I/O bottlenecks, no concurrent writes +- **Optimization**: Fast SSD, regular cleanup of old tasks + +### SurrealDB Embedded + +- **Strengths**: Excellent single-node performance, ACID guarantees +- **Limitations**: Memory usage scales with data size +- **Optimization**: Adequate RAM, SSD storage, regular compaction + +### SurrealDB Server + +- **Strengths**: Horizontal scaling, shared state +- **Limitations**: Network latency, server dependency +- **Optimization**: Low-latency network, connection pooling, server tuning + +## Security Considerations + +### Filesystem + +- **File Permissions**: Restrict access to data directory +- **Backup Security**: Encrypt backup files +- **Network**: No network exposure + +### SurrealDB Embedded + +- **File Permissions**: Secure database files +- **Encryption**: Database-level encryption available +- **Access Control**: Built-in 
user management + +### SurrealDB Server + +- **Network Security**: Use TLS/WSS connections +- **Authentication**: Strong passwords, regular rotation +- **Authorization**: Role-based access control +- **Audit**: Complete operation logging + +## Troubleshooting + +### Common Issues + +#### Filesystem Backend + +```bash +# Permission issues +sudo chown -R $USER:$USER ./data +chmod -R 755 ./data + +# Corrupted JSON files +rm ./data/queue.rkvs/tasks/corrupted-file.json +``` + +#### SurrealDB Embedded + +```bash +# Database corruption +rm -rf ./data/orchestrator.db +# Restore from backup or re-initialize + +# Permission issues +sudo chown -R $USER:$USER ./data +``` + +#### SurrealDB Server + +```bash +# Connection issues +telnet surreal-server 8000 +# Check server status and network connectivity + +# Authentication failures +# Verify credentials and user permissions +``` + +### Debugging Commands + +```bash +# List available storage types +./orchestrator --help | grep storage-type + +# Validate configuration +./orchestrator --storage-type filesystem --data-dir ./data --dry-run + +# Test migration +./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-embedded + +# Monitor migration progress +./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded --verbose +``` + +## Recommendations + +### Development + +- **Use**: Filesystem backend +- **Rationale**: Simple setup, easy debugging, no external dependencies + +### Single-Node Production + +- **Use**: SurrealDB Embedded +- **Rationale**: ACID guarantees, advanced features, no external dependencies + +### Distributed Production + +- **Use**: SurrealDB Server +- **Rationale**: Horizontal scaling, high availability, multi-instance support + +### Migration Path + +1. **Start**: Filesystem (development) +2. **Scale**: SurrealDB Embedded (single-node production) +3. 
**Distribute**: SurrealDB Server (multi-node production) + +This progressive approach allows teams to start simple and scale as requirements grow, with seamless migration between each stage. \ No newline at end of file diff --git a/crates/orchestrator/wrks/readme-testing.md b/crates/orchestrator/wrks/readme-testing.md index 5170ff6..6f096b6 100644 --- a/crates/orchestrator/wrks/readme-testing.md +++ b/crates/orchestrator/wrks/readme-testing.md @@ -1 +1,392 @@ -# Testing Guide for Multi-Storage Orchestrator\n\nThis document provides comprehensive guidance for testing the multi-storage orchestrator system,\nincluding unit tests, integration tests, benchmarks, and performance analysis.\n\n## Overview\n\nThe orchestrator uses a multi-tiered testing approach:\n\n1. **Unit Tests**: Test individual components in isolation\n2. **Integration Tests**: Test complete workflows across storage backends\n3. **Migration Tests**: Validate data migration between backends\n4. **Factory Tests**: Test configuration and backend selection\n5. 
**Benchmarks**: Performance testing and regression detection\n\n## Test Structure\n\n```{$detected_lang}\ntests/\n├── helpers/mod.rs # Test utilities and mock implementations\n├── storage_integration.rs # Cross-backend integration tests\n├── migration_tests.rs # Migration validation tests\n└── factory_tests.rs # Factory and configuration tests\n\nbenches/\n├── storage_benchmarks.rs # Storage performance benchmarks\n└── migration_benchmarks.rs # Migration performance benchmarks\n\nsrc/\n├── storage/ # Unit tests embedded in modules\n├── migration/tests.rs # Migration unit tests\n└── main.rs # Application integration tests\n```\n\n## Running Tests\n\n### Basic Test Commands\n\n```{$detected_lang}\n# Run all tests (filesystem backend only)\ncargo test\n\n# Run all tests with SurrealDB backends\ncargo test --features surrealdb\n\n# Run specific test suites\ncargo test --test storage_integration\ncargo test --test migration_tests\ncargo test --test factory_tests\n\n# Run unit tests only\ncargo test --lib\n```\n\n### Using Cargo Aliases\n\nThe project includes convenient aliases (defined in `.cargo/config.toml`):\n\n```{$detected_lang}\n# Test all backends with all features\ncargo test-all\n\n# Test only filesystem backend\ncargo test-fs\n\n# Test with SurrealDB features\ncargo test-surrealdb\n\n# Test specific areas\ncargo test-integration\ncargo test-migration\ncargo test-factory\ncargo test-unit\n```\n\n## Test Features and Backends\n\n### Backend Support\n\n- **Filesystem**: Always available, no additional dependencies\n- **SurrealDB Embedded**: Requires `--features surrealdb`\n- **SurrealDB Server**: Requires `--features surrealdb`\n\n### Feature-Gated Tests\n\nTests automatically adapt to available features:\n\n```{$detected_lang}\n#[cfg(feature = "surrealdb")]\n#[tokio::test]\nasync fn test_surrealdb_specific_feature() {\n // This test only runs when SurrealDB feature is enabled\n}\n```\n\n## Integration Tests\n\n### Storage Integration Tests\n\nLocation: 
`tests/storage_integration.rs`\n\nThese tests verify consistent behavior across all storage backends:\n\n```{$detected_lang}\n// Example: Test runs against all available backends\ntest_all_backends!(test_basic_crud_operations, |storage, gen| async move {\n let task = gen.workflow_task();\n storage.enqueue(task.clone(), 1).await?;\n // ... test implementation\n Ok(())\n});\n```\n\n**Key Test Scenarios:**\n\n- Basic CRUD operations\n- Queue management and priorities\n- Task status updates\n- Batch operations\n- Search and filtering\n- Concurrent operations\n- Error handling\n- Performance characteristics\n\n### Migration Tests\n\nLocation: `tests/migration_tests.rs`\n\nValidates data migration between all backend combinations:\n\n```{$detected_lang}\n# Run migration tests\ncargo test --features surrealdb --test migration_tests\n\n# Test specific migration scenarios\ncargo test --features surrealdb test_filesystem_to_embedded_migration\ncargo test --features surrealdb test_large_dataset_migration_performance\n```\n\n**Migration Test Coverage:**\n\n- Data integrity verification\n- Rollback functionality\n- Progress tracking\n- Error recovery\n- Performance scaling\n- Filtering and batch operations\n\n### Factory Tests\n\nLocation: `tests/factory_tests.rs`\n\nTests configuration validation and backend selection:\n\n```{$detected_lang}\n# Run factory tests\ncargo test --test factory_tests\n\n# Test configuration validation\ncargo test test_storage_config_validation_failures\n```\n\n## Benchmarks\n\n### Storage Benchmarks\n\nLocation: `benches/storage_benchmarks.rs`\n\n```{$detected_lang}\n# Run all storage benchmarks\ncargo bench-storage\n\n# Run specific backend benchmarks\ncargo bench-fs\ncargo bench-surrealdb # Requires --features surrealdb\n\n# Run specific benchmark categories\ncargo bench -- single_enqueue\ncargo bench -- batch_operations\ncargo bench -- concurrent_operations\n```\n\n**Benchmark Categories:**\n\n- Single operations (enqueue/dequeue)\n- Batch 
operations\n- Search and retrieval\n- Concurrent operations\n- Cleanup operations\n\n### Migration Benchmarks\n\nLocation: `benches/migration_benchmarks.rs`\n\n```{$detected_lang}\n# Run migration benchmarks\ncargo bench-migration\n\n# Test migration performance\ncargo bench -- basic_migration\ncargo bench -- migration_batch_sizes\n```\n\n**Migration Benchmarks:**\n\n- Basic migration throughput\n- Batch size optimization\n- Verification overhead\n- Progress tracking overhead\n- Dry run performance\n\n## Test Helpers and Utilities\n\n### TestDataGenerator\n\nProvides consistent test data across all tests:\n\n```{$detected_lang}\nuse crate::helpers::TestDataGenerator;\n\nlet gen = TestDataGenerator::new();\nlet task = gen.workflow_task();\nlet batch = gen.workflow_tasks_batch(10);\n```\n\n### StorageTestRunner\n\nRuns tests against all available storage backends:\n\n```{$detected_lang}\nuse crate::helpers::StorageTestRunner;\n\nlet mut runner = StorageTestRunner::new();\nrunner.run_against_all_backends(test_function).await;\n```\n\n### MockStorage\n\nMock implementation for testing migration scenarios:\n\n```{$detected_lang}\nuse crate::helpers::MockStorage;\n\nlet mock = MockStorage::new();\nmock.set_health(false); // Simulate failure\n```\n\n## Performance Testing\n\n### Benchmark Configuration\n\nBenchmarks are configured with:\n\n- Small sample sizes for expensive operations\n- Throughput measurement for batch operations\n- Memory usage tracking\n- Concurrent operation testing\n\n### Performance Targets\n\n**Storage Operations:**\n\n- Single enqueue: < 1ms average\n- Batch enqueue (100 tasks): < 100ms average\n- Task retrieval: < 0.5ms average\n- Search operations: < 50ms average\n\n**Migration Operations:**\n\n- Small dataset (100 tasks): < 5 seconds\n- Large dataset (1000 tasks): < 30 seconds\n- Throughput: > 10 tasks/second\n\n## Continuous Integration\n\n### CI Test Matrix\n\n```{$detected_lang}\n# Example CI configuration\nstrategy:\n matrix:\n features:\n 
- "" # Filesystem only\n - "surrealdb" # All backends\n rust:\n - stable\n - beta\n```\n\n### Test Commands for CI\n\n```{$detected_lang}\n# Basic functionality tests\ncargo test --no-default-features\ncargo test --all-features\n\n# Documentation tests\ncargo test --doc --all-features\n\n# Benchmark regression tests\ncargo bench --all-features -- --test\n```\n\n## Debugging and Troubleshooting\n\n### Verbose Test Output\n\n```{$detected_lang}\n# Enable detailed logging\nRUST_LOG=debug cargo test --features surrealdb\n\n# Show test output\ncargo test -- --nocapture\n\n# Run single test with full output\ncargo test test_name -- --exact --nocapture\n```\n\n### Common Issues\n\n1. **SurrealDB tests failing**: Ensure `--features surrealdb` is specified\n2. **Temporary directory errors**: Tests clean up automatically, but manual cleanup may be needed\n3. **Port conflicts**: Tests use ephemeral ports, but conflicts can occur\n4. **Timing issues**: Some tests use sleeps for async operations\n\n### Test Data Isolation\n\n- Each test uses unique temporary directories\n- Mock storage is reset between tests\n- Concurrent tests use separate data spaces\n- Cleanup is automatic via `Drop` implementations\n\n## Coverage Analysis\n\n```{$detected_lang}\n# Generate coverage report\ncargo install cargo-tarpaulin\ncargo test-coverage\n\n# View coverage report\nopen target/tarpaulin-report.html\n```\n\n## Performance Profiling\n\n```{$detected_lang}\n# Profile storage operations\ncargo bench --bench storage_benchmarks -- --profile-time=10\n\n# Profile migration operations\ncargo bench --bench migration_benchmarks -- --profile-time=10\n\n# Generate flame graphs\ncargo install flamegraph\ncargo flamegraph --bench storage_benchmarks\n```\n\n## Best Practices\n\n### Writing Tests\n\n1. **Use descriptive test names** that explain what is being tested\n2. **Test error conditions** as well as success paths\n3. **Use feature gates** for backend-specific tests\n4. 
**Clean up resources** using RAII patterns\n5. **Test concurrency** where applicable\n\n### Test Data\n\n1. **Use test generators** for consistent data\n2. **Test with realistic data sizes**\n3. **Include edge cases** (empty data, large data, malformed data)\n4. **Use deterministic data** where possible\n\n### Performance Testing\n\n1. **Set appropriate baselines** for performance regression\n2. **Test with various data sizes** to understand scaling\n3. **Include warmup iterations** for accurate measurements\n4. **Document performance expectations** in code comments\n\n## Contributing\n\nWhen adding new features:\n\n1. Add unit tests for new components\n2. Update integration tests for new storage methods\n3. Add migration tests for new backends\n4. Update benchmarks for performance-critical code\n5. Document any new test utilities\n\nFor more information on the storage architecture and API, see the main project documentation. +# Testing Guide for Multi-Storage Orchestrator + +This document provides comprehensive guidance for testing the multi-storage orchestrator system, +including unit tests, integration tests, benchmarks, and performance analysis. + +## Overview + +The orchestrator uses a multi-tiered testing approach: + +1. **Unit Tests**: Test individual components in isolation +2. **Integration Tests**: Test complete workflows across storage backends +3. **Migration Tests**: Validate data migration between backends +4. **Factory Tests**: Test configuration and backend selection +5. 
**Benchmarks**: Performance testing and regression detection
+
+## Test Structure
+
+```text
+tests/
+├── helpers/mod.rs            # Test utilities and mock implementations
+├── storage_integration.rs    # Cross-backend integration tests
+├── migration_tests.rs        # Migration validation tests
+└── factory_tests.rs          # Factory and configuration tests
+
+benches/
+├── storage_benchmarks.rs     # Storage performance benchmarks
+└── migration_benchmarks.rs   # Migration performance benchmarks
+
+src/
+├── storage/                  # Unit tests embedded in modules
+├── migration/tests.rs        # Migration unit tests
+└── main.rs                   # Application integration tests
+```
+
+## Running Tests
+
+### Basic Test Commands
+
+```bash
+# Run all tests (filesystem backend only)
+cargo test
+
+# Run all tests with SurrealDB backends
+cargo test --features surrealdb
+
+# Run specific test suites
+cargo test --test storage_integration
+cargo test --test migration_tests
+cargo test --test factory_tests
+
+# Run unit tests only
+cargo test --lib
+```
+
+### Using Cargo Aliases
+
+The project includes convenient aliases (defined in `.cargo/config.toml`):
+
+```bash
+# Test all backends with all features
+cargo test-all
+
+# Test only filesystem backend
+cargo test-fs
+
+# Test with SurrealDB features
+cargo test-surrealdb
+
+# Test specific areas
+cargo test-integration
+cargo test-migration
+cargo test-factory
+cargo test-unit
+```
+
+## Test Features and Backends
+
+### Backend Support
+
+- **Filesystem**: Always available, no additional dependencies
+- **SurrealDB Embedded**: Requires `--features surrealdb`
+- **SurrealDB Server**: Requires `--features surrealdb`
+
+### Feature-Gated Tests
+
+Tests automatically adapt to available features:
+
+```rust
+#[cfg(feature = "surrealdb")]
+#[tokio::test]
+async fn test_surrealdb_specific_feature() {
+    // This test only runs when SurrealDB feature is enabled
+}
+```
+
+## Integration Tests
+
+### Storage Integration Tests
+
+Location: `tests/storage_integration.rs`
+
+These tests verify 
consistent behavior across all storage backends:
+
+```rust
+// Example: Test runs against all available backends
+test_all_backends!(test_basic_crud_operations, |storage, gen| async move {
+    let task = gen.workflow_task();
+    storage.enqueue(task.clone(), 1).await?;
+    // ... test implementation
+    Ok(())
+});
+```
+
+**Key Test Scenarios:**
+
+- Basic CRUD operations
+- Queue management and priorities
+- Task status updates
+- Batch operations
+- Search and filtering
+- Concurrent operations
+- Error handling
+- Performance characteristics
+
+### Migration Tests
+
+Location: `tests/migration_tests.rs`
+
+Validates data migration between all backend combinations:
+
+```bash
+# Run migration tests
+cargo test --features surrealdb --test migration_tests
+
+# Test specific migration scenarios
+cargo test --features surrealdb test_filesystem_to_embedded_migration
+cargo test --features surrealdb test_large_dataset_migration_performance
+```
+
+**Migration Test Coverage:**
+
+- Data integrity verification
+- Rollback functionality
+- Progress tracking
+- Error recovery
+- Performance scaling
+- Filtering and batch operations
+
+### Factory Tests
+
+Location: `tests/factory_tests.rs`
+
+Tests configuration validation and backend selection:
+
+```bash
+# Run factory tests
+cargo test --test factory_tests
+
+# Test configuration validation
+cargo test test_storage_config_validation_failures
+```
+
+## Benchmarks
+
+### Storage Benchmarks
+
+Location: `benches/storage_benchmarks.rs`
+
+```bash
+# Run all storage benchmarks
+cargo bench-storage
+
+# Run specific backend benchmarks
+cargo bench-fs
+cargo bench-surrealdb  # Requires --features surrealdb
+
+# Run specific benchmark categories
+cargo bench -- single_enqueue
+cargo bench -- batch_operations
+cargo bench -- concurrent_operations
+```
+
+**Benchmark Categories:**
+
+- Single operations (enqueue/dequeue)
+- Batch operations
+- Search and retrieval
+- Concurrent operations
+- Cleanup operations
+
+### Migration 
Benchmarks
+
+Location: `benches/migration_benchmarks.rs`
+
+```bash
+# Run migration benchmarks
+cargo bench-migration
+
+# Test migration performance
+cargo bench -- basic_migration
+cargo bench -- migration_batch_sizes
+```
+
+**Migration Benchmarks:**
+
+- Basic migration throughput
+- Batch size optimization
+- Verification overhead
+- Progress tracking overhead
+- Dry run performance
+
+## Test Helpers and Utilities
+
+### TestDataGenerator
+
+Provides consistent test data across all tests:
+
+```rust
+use crate::helpers::TestDataGenerator;
+
+let gen = TestDataGenerator::new();
+let task = gen.workflow_task();
+let batch = gen.workflow_tasks_batch(10);
+```
+
+### StorageTestRunner
+
+Runs tests against all available storage backends:
+
+```rust
+use crate::helpers::StorageTestRunner;
+
+let mut runner = StorageTestRunner::new();
+runner.run_against_all_backends(test_function).await;
+```
+
+### MockStorage
+
+Mock implementation for testing migration scenarios:
+
+```rust
+use crate::helpers::MockStorage;
+
+let mock = MockStorage::new();
+mock.set_health(false); // Simulate failure
+```
+
+## Performance Testing
+
+### Benchmark Configuration
+
+Benchmarks are configured with:
+
+- Small sample sizes for expensive operations
+- Throughput measurement for batch operations
+- Memory usage tracking
+- Concurrent operation testing
+
+### Performance Targets
+
+**Storage Operations:**
+
+- Single enqueue: < 1ms average
+- Batch enqueue (100 tasks): < 100ms average
+- Task retrieval: < 0.5ms average
+- Search operations: < 50ms average
+
+**Migration Operations:**
+
+- Small dataset (100 tasks): < 5 seconds
+- Large dataset (1000 tasks): < 30 seconds
+- Throughput: > 10 tasks/second
+
+## Continuous Integration
+
+### CI Test Matrix
+
+```yaml
+# Example CI configuration
+strategy:
+  matrix:
+    features:
+      - ""           # Filesystem only
+      - "surrealdb"  # All backends
+    rust:
+      - stable
+      - beta
+```
+
+### Test Commands for CI
+
+```bash
+# Basic functionality tests 
+cargo test --no-default-features +cargo test --all-features + +# Documentation tests +cargo test --doc --all-features + +# Benchmark regression tests +cargo bench --all-features -- --test +``` + +## Debugging and Troubleshooting + +### Verbose Test Output + +```bash +# Enable detailed logging +RUST_LOG=debug cargo test --features surrealdb + +# Show test output +cargo test -- --nocapture + +# Run single test with full output +cargo test test_name -- --exact --nocapture +``` + +### Common Issues + +1. **SurrealDB tests failing**: Ensure `--features surrealdb` is specified +2. **Temporary directory errors**: Tests clean up automatically, but manual cleanup may be needed +3. **Port conflicts**: Tests use ephemeral ports, but conflicts can occur +4. **Timing issues**: Some tests use sleeps for async operations + +### Test Data Isolation + +- Each test uses unique temporary directories +- Mock storage is reset between tests +- Concurrent tests use separate data spaces +- Cleanup is automatic via `Drop` implementations + +## Coverage Analysis + +```bash +# Generate coverage report +cargo install cargo-tarpaulin +cargo test-coverage + +# View coverage report +open target/tarpaulin-report.html +``` + +## Performance Profiling + +```bash +# Profile storage operations +cargo bench --bench storage_benchmarks -- --profile-time=10 + +# Profile migration operations +cargo bench --bench migration_benchmarks -- --profile-time=10 + +# Generate flame graphs +cargo install flamegraph +cargo flamegraph --bench storage_benchmarks +``` + +## Best Practices + +### Writing Tests + +1. **Use descriptive test names** that explain what is being tested +2. **Test error conditions** as well as success paths +3. **Use feature gates** for backend-specific tests +4. **Clean up resources** using RAII patterns +5. **Test concurrency** where applicable + +### Test Data + +1. **Use test generators** for consistent data +2. **Test with realistic data sizes** +3. 
**Include edge cases** (empty data, large data, malformed data) +4. **Use deterministic data** where possible + +### Performance Testing + +1. **Set appropriate baselines** for performance regression +2. **Test with various data sizes** to understand scaling +3. **Include warmup iterations** for accurate measurements +4. **Document performance expectations** in code comments + +## Contributing + +When adding new features: + +1. Add unit tests for new components +2. Update integration tests for new storage methods +3. Add migration tests for new backends +4. Update benchmarks for performance-critical code +5. Document any new test utilities + +For more information on the storage architecture and API, see the main project documentation. \ No newline at end of file diff --git a/crates/vault-service/README.md b/crates/vault-service/README.md index 46d7d6a..9c4b367 100644 --- a/crates/vault-service/README.md +++ b/crates/vault-service/README.md @@ -1 +1,467 @@ -# KMS Service - Key Management Service\n\nA unified Key Management Service for the Provisioning platform with support for multiple backends: **Age** (development),\n**Cosmian KMS** (privacy-preserving), **RustyVault** (self-hosted), **AWS KMS** (cloud-native), and **HashiCorp Vault** (enterprise).\n\n## Features\n\n### Age Backend (Development)\n\n- ✅ Fast, offline encryption/decryption\n- ✅ No server required - local key files\n- ✅ Simple setup with age-keygen\n- ✅ Perfect for development and testing\n- ✅ Zero network dependency\n\n### RustyVault Backend (Self-hosted) ✨ NEW\n\n- ✅ Vault-compatible API (drop-in replacement)\n- ✅ Pure Rust implementation\n- ✅ Self-hosted secrets management\n- ✅ Apache 2.0 license (OSI-approved)\n- ✅ Transit secrets engine support\n- ✅ Embeddable or standalone\n- ✅ No vendor lock-in\n\n### Cosmian KMS Backend (Production)\n\n- ✅ Enterprise-grade key management\n- ✅ Confidential computing support (SGX/SEV)\n- ✅ Zero-knowledge architecture\n- ✅ Server-side key rotation\n- ✅ Audit 
logging and compliance\n- ✅ Multi-tenant support\n\n### Security Features\n\n- ✅ TLS for all communications (Cosmian)\n- ✅ Context-based encryption (AAD)\n- ✅ Automatic key rotation (Cosmian)\n- ✅ Data key generation (Cosmian)\n- ✅ Health monitoring\n- ✅ Operation metrics\n\n## Architecture\n\n```{$detected_lang}\n┌─────────────────────────────────────────────────\n────────┐\n│ KMS Service │\n├─────────────────────────────────────────────────\n────────┤\n│ REST API (Axum) │\n│ ├─ /api/v1/kms/encrypt POST │\n│ ├─ /api/v1/kms/decrypt POST │\n│ ├─ /api/v1/kms/generate-key POST (Cosmian only) │\n│ ├─ /api/v1/kms/status GET │\n│ └─ /api/v1/kms/health GET │\n├─────────────────────────────────────────────────\n────────┤\n│ Unified KMS Service Interface │\n│ ├─ encrypt(plaintext, context) -> ciphertext │\n│ ├─ decrypt(ciphertext, context) -> plaintext │\n│ ├─ generate_data_key(spec) -> DataKey │\n│ └─ health_check() -> bool │\n├─────────────────────────────────────────────────\n────────┤\n│ Backend Implementations │\n│ ├─ Age Client │\n│ │ ├─ X25519 encryption │\n│ │ ├─ Local key files │\n│ │ └─ Offline operation │\n│ └─ Cosmian KMS Client │\n│ ├─ REST API integration │\n│ ├─ Zero-knowledge encryption │\n│ └─ Confidential computing │\n└─────────────────────────────────────────────────\n────────┘\n```\n\n## Installation\n\n### Prerequisites\n\n- Rust 1.70+ (for building)\n- Age 1.2+ (for development backend)\n- Cosmian KMS server (for production backend)\n- Nushell 0.107+ (for CLI integration)\n\n### Build from Source\n\n```{$detected_lang}\ncd provisioning/platform/kms-service\ncargo build --release\n\n# Binary will be at: target/release/kms-service\n```\n\n## Configuration\n\n### Configuration File\n\nCreate `provisioning/config/kms.toml`:\n\n```{$detected_lang}\n[kms]\ndev_backend = "age"\nprod_backend = "cosmian"\nenvironment = "${PROVISIONING_ENV:-dev}"\n\n[kms.age]\npublic_key_path = "~/.config/provisioning/age/public_key.txt"\nprivate_key_path = 
"~/.config/provisioning/age/private_key.txt"\n\n[kms.cosmian]\nserver_url = "${COSMIAN_KMS_URL:-https://kms.example.com}"\napi_key = "${COSMIAN_API_KEY}"\ndefault_key_id = "provisioning-master-key"\ntls_verify = true\n```\n\n### Environment Variables\n\n```{$detected_lang}\n# Development with Age\nexport PROVISIONING_ENV=dev\n# Age keys will be read from paths in config\n\n# Production with Cosmian\nexport PROVISIONING_ENV=prod\nexport COSMIAN_KMS_URL="https://kms.example.com"\nexport COSMIAN_API_KEY="your-api-key"\n```\n\n## Quick Start\n\n### Development Setup (Age)\n\n```{$detected_lang}\n# 1. Generate Age keys\nmkdir -p ~/.config/provisioning/age\nage-keygen -o ~/.config/provisioning/age/private_key.txt\nage-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt\n\n# 2. Set environment\nexport PROVISIONING_ENV=dev\n\n# 3. Start KMS service\ncargo run --bin kms-service\n```\n\n### Production Setup (Cosmian)\n\n```{$detected_lang}\n# 1. Set up Cosmian KMS server (or use hosted service)\n\n# 2. Create master key in Cosmian KMS\n# (Use Cosmian KMS CLI or web interface)\n\n# 3. Set environment variables\nexport PROVISIONING_ENV=prod\nexport COSMIAN_KMS_URL=https://your-kms.example.com\nexport COSMIAN_API_KEY=your-api-key-here\n\n# 4. 
Start KMS service\ncargo run --bin kms-service\n```\n\n## Usage\n\n### REST API Examples\n\n#### Encrypt Data\n\n```{$detected_lang}\ncurl -X POST http://localhost:8082/api/v1/kms/encrypt \\n -H "Content-Type: application/json" \\n -d '{\n "plaintext": "SGVsbG8sIFdvcmxkIQ==",\n "context": "env=prod,service=api"\n }'\n```\n\n#### Decrypt Data\n\n```{$detected_lang}\ncurl -X POST http://localhost:8082/api/v1/kms/decrypt \\n -H "Content-Type: application/json" \\n -d '{\n "ciphertext": "...",\n "context": "env=prod,service=api"\n }'\n```\n\n#### Generate Data Key (Cosmian only)\n\n```{$detected_lang}\ncurl -X POST http://localhost:8082/api/v1/kms/generate-key \\n -H "Content-Type: application/json" \\n -d '{\n "key_spec": "AES_256"\n }'\n```\n\n#### Health Check\n\n```{$detected_lang}\ncurl http://localhost:8082/api/v1/kms/health\n```\n\n### Nushell CLI Integration\n\n```{$detected_lang}\n# Load the KMS module\nuse provisioning/core/nulib/kms\n\n# Set service URL\nexport KMS_SERVICE_URL="http://localhost:8082"\n\n# Encrypt data\n"secret-data" | kms encrypt\n"api-key" | kms encrypt --context "env=prod,service=api"\n\n# Decrypt data\n$ciphertext | kms decrypt\n$ciphertext | kms decrypt --context "env=prod,service=api"\n\n# Generate data key (Cosmian only)\nkms generate-key\nkms generate-key --key-spec AES_128\n\n# Check service status\nkms status\nkms health\n\n# Encrypt/decrypt files\nkms encrypt-file config.yaml\nkms encrypt-file secrets.json --output secrets.enc --context "env=prod"\n\nkms decrypt-file config.yaml.enc\nkms decrypt-file secrets.enc --output secrets.json --context "env=prod"\n```\n\n## Backend Comparison\n\n| Feature | Age | RustyVault | Cosmian KMS | AWS KMS | Vault |\n| --------- | ----- | ------------ | ------------- | --------- | ------- |\n| **Setup** | Simple | Self-hosted | Server setup | AWS account | Enterprise |\n| **Speed** | Very fast | Fast | Fast | Fast | Fast |\n| **Network** | No | Yes | Yes | Yes | Yes |\n| **Key Rotation** | Manual | 
Automatic | Automatic | Automatic | Automatic |\n| **Data Keys** | No | Yes | Yes | Yes | Yes |\n| **Audit Logging** | No | Yes | Full | Full | Full |\n| **Confidential** | No | No | Yes (SGX/SEV) | No | No |\n| **Multi-tenant** | No | Yes | Yes | Yes | Yes |\n| **License** | MIT | Apache 2.0 | Proprietary | Proprietary | BSL/Enterprise |\n| **Cost** | Free | Free | Paid | Paid | Paid |\n| **Use Case** | Dev/Test | Self-hosted | Privacy | AWS Cloud | Enterprise |\n\n## Integration Points\n\n### 1. Config Encryption (SOPS Integration)\n\n```{$detected_lang}\n# Encrypt configuration files\nkms encrypt-file workspace/config/secrets.yaml\n\n# SOPS can use KMS for key encryption\n# Configure in .sops.yaml to use KMS endpoint\n```\n\n### 2. Dynamic Secrets (Provider API Keys)\n\n```{$detected_lang}\n// Rust orchestrator can call KMS API\nlet encrypted_key = kms_client.encrypt(api_key.as_bytes(), &context).await?;\n```\n\n### 3. SSH Key Management\n\n```{$detected_lang}\n# Generate and encrypt temporal SSH keys\nssh-keygen -t ed25519 -f temp_key -N ""\nkms encrypt-file temp_key --context "infra=prod,purpose=deployment"\n```\n\n### 4. Orchestrator (Workflow Data)\n\n```{$detected_lang}\n// Encrypt sensitive workflow parameters\nlet encrypted_params = kms_service\n .encrypt(params_json.as_bytes(), &workflow_context)\n .await?;\n```\n\n### 5. Control Center (Audit Logs)\n\n- All KMS operations are logged\n- Audit trail for compliance\n- Integration with control center UI\n\n## Testing\n\n### Unit Tests\n\n```{$detected_lang}\ncargo test\n```\n\n### Integration Tests\n\n```{$detected_lang}\n# Age backend tests (no external dependencies)\ncargo test age\n\n# Cosmian backend tests (requires Cosmian KMS server)\nexport COSMIAN_KMS_URL=http://localhost:9999\nexport COSMIAN_API_KEY=test-key\ncargo test cosmian -- --ignored\n```\n\n## Deployment\n\n### Docker\n\n```{$detected_lang}\nFROM rust:1.70 as builder\nWORKDIR /app\nCOPY . 
.\nRUN cargo build --release\n\nFROM debian:bookworm-slim\nRUN apt-get update && \\n apt-get install -y ca-certificates && \\n rm -rf /var/lib/apt/lists/*\nCOPY --from=builder /app/target/release/kms-service /usr/local/bin/\nENTRYPOINT ["kms-service"]\n```\n\n### Kubernetes (Production with Cosmian)\n\n```{$detected_lang}\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: kms-service\nspec:\n replicas: 2\n template:\n spec:\n containers:\n - name: kms-service\n image: provisioning/kms-service:latest\n env:\n - name: PROVISIONING_ENV\n value: "prod"\n - name: COSMIAN_KMS_URL\n value: "https://kms.example.com"\n - name: COSMIAN_API_KEY\n valueFrom:\n secretKeyRef:\n name: cosmian-api-key\n key: api-key\n ports:\n - containerPort: 8082\n```\n\n### systemd Service\n\n```{$detected_lang}\n[Unit]\nDescription=KMS Service\nAfter=network.target\n\n[Service]\nType=simple\nUser=kms-service\nEnvironment="PROVISIONING_ENV=prod"\nEnvironment="COSMIAN_KMS_URL=https://kms.example.com"\nEnvironment="COSMIAN_API_KEY=your-api-key"\nExecStart=/usr/local/bin/kms-service\nRestart=always\n\n[Install]\nWantedBy=multi-user.target\n```\n\n## Security Best Practices\n\n1. **Development**: Use Age for dev/test only, never for production secrets\n2. **Production**: Always use Cosmian KMS with TLS verification enabled\n3. **API Keys**: Never hardcode Cosmian API keys, use environment variables\n4. **Key Rotation**: Enable automatic rotation in Cosmian (90 days recommended)\n5. **Context Encryption**: Always use encryption context (AAD) for additional security\n6. **Network Access**: Restrict KMS service access with firewall rules\n7. 
**Monitoring**: Enable health checks and monitor operation metrics\n\n## Migration from Vault/AWS KMS\n\nSee [KMS_SIMPLIFICATION.md](../../docs/migration/KMS_SIMPLIFICATION.md) for migration guide.\n\n## Monitoring\n\n### Metrics Endpoints\n\n```{$detected_lang}\n# Service status (includes operation count)\ncurl http://localhost:8082/api/v1/kms/status\n\n# Health check\ncurl http://localhost:8082/api/v1/kms/health\n```\n\n### Logs\n\n```{$detected_lang}\n# Set log level\nexport RUST_LOG="kms_service=debug,tower_http=debug"\n\n# View logs\njournalctl -u kms-service -f\n```\n\n## Troubleshooting\n\n### Age Backend Issues\n\n```{$detected_lang}\n# Check keys exist\nls -la ~/.config/provisioning/age/\n\n# Verify key format\ncat ~/.config/provisioning/age/public_key.txt\n# Should start with: age1...\n\n# Test encryption manually\necho "test" | age -r $(cat ~/.config/provisioning/age/public_key.txt) > test.enc\nage -d -i ~/.config/provisioning/age/private_key.txt test.enc\n```\n\n### Cosmian KMS Issues\n\n```{$detected_lang}\n# Check connectivity\ncurl https://kms.example.com/api/v1/health \\n -H "X-API-Key: $COSMIAN_API_KEY"\n\n# Verify API key\ncurl https://kms.example.com/api/v1/version \\n -H "X-API-Key: $COSMIAN_API_KEY"\n\n# Test encryption\ncurl -X POST https://kms.example.com/api/v1/encrypt \\n -H "X-API-Key: $COSMIAN_API_KEY" \\n -H "Content-Type: application/json" \\n -d '{"keyId":"master-key","data":"SGVsbG8="}'\n```\n\n## License\n\nCopyright © 2024 Provisioning Team\n\n## References\n\n- [Age Encryption](https://github.com/FiloSottile/age)\n- [Cosmian KMS](https://cosmian.com/kms/)\n- [Axum Web Framework](https://docs.rs/axum/)\n- [Confidential Computing](https://confidentialcomputing.io/) +# KMS Service - Key Management Service + +A unified Key Management Service for the Provisioning platform with support for multiple backends: **Age** (development), +**Cosmian KMS** (privacy-preserving), **RustyVault** (self-hosted), **AWS KMS** (cloud-native), and 
**HashiCorp Vault** (enterprise).
+
+## Features
+
+### Age Backend (Development)
+
+- ✅ Fast, offline encryption/decryption
+- ✅ No server required - local key files
+- ✅ Simple setup with age-keygen
+- ✅ Perfect for development and testing
+- ✅ Zero network dependency
+
+### RustyVault Backend (Self-hosted) ✨ NEW
+
+- ✅ Vault-compatible API (drop-in replacement)
+- ✅ Pure Rust implementation
+- ✅ Self-hosted secrets management
+- ✅ Apache 2.0 license (OSI-approved)
+- ✅ Transit secrets engine support
+- ✅ Embeddable or standalone
+- ✅ No vendor lock-in
+
+### Cosmian KMS Backend (Production)
+
+- ✅ Enterprise-grade key management
+- ✅ Confidential computing support (SGX/SEV)
+- ✅ Zero-knowledge architecture
+- ✅ Server-side key rotation
+- ✅ Audit logging and compliance
+- ✅ Multi-tenant support
+
+### Security Features
+
+- ✅ TLS for all communications (Cosmian)
+- ✅ Context-based encryption (AAD)
+- ✅ Automatic key rotation (Cosmian)
+- ✅ Data key generation (Cosmian)
+- ✅ Health monitoring
+- ✅ Operation metrics
+
+## Architecture
+
+```text
+┌──────────────────────────────────────────────────────────┐
+│                       KMS Service                        │
+├──────────────────────────────────────────────────────────┤
+│ REST API (Axum)                                          │
+│  ├─ /api/v1/kms/encrypt        POST                      │
+│  ├─ /api/v1/kms/decrypt        POST                      │
+│  ├─ /api/v1/kms/generate-key   POST (Cosmian only)       │
+│  ├─ /api/v1/kms/status         GET                       │
+│  └─ /api/v1/kms/health         GET                       │
+├──────────────────────────────────────────────────────────┤
+│ Unified KMS Service Interface                            │
+│  ├─ encrypt(plaintext, context) -> ciphertext            │
+│  ├─ decrypt(ciphertext, context) -> plaintext            │
+│  ├─ generate_data_key(spec) -> DataKey                   │
+│  └─ health_check() -> bool                               │
+├──────────────────────────────────────────────────────────┤
+│ Backend Implementations                                  │
+│  ├─ Age Client                                           │
+│  │   ├─ X25519 encryption                                │
+│  │   ├─ Local key files                                  │
+│  │   └─ Offline operation                                │
+│  └─ Cosmian KMS Client                                   │
+│      ├─ REST API integration                             │
+│      ├─ Zero-knowledge encryption                        │
+│      └─ Confidential computing                           │
+└──────────────────────────────────────────────────────────┘
+```
+
+## Installation
+
+### Prerequisites
+
+- Rust 1.70+ (for building)
+- Age 1.2+ (for development backend)
+- Cosmian KMS server (for production backend)
+- Nushell 0.107+ (for CLI integration)
+
+### Build from Source
+
+```bash
+cd provisioning/platform/kms-service
+cargo build --release
+
+# Binary will be at: target/release/kms-service
+```
+
+## Configuration
+
+### Configuration File
+
+Create `provisioning/config/kms.toml`:
+
+```toml
+[kms]
+dev_backend = "age"
+prod_backend = "cosmian"
+environment = "${PROVISIONING_ENV:-dev}"
+
+[kms.age]
+public_key_path = "~/.config/provisioning/age/public_key.txt"
+private_key_path = "~/.config/provisioning/age/private_key.txt"
+
+[kms.cosmian]
+server_url = "${COSMIAN_KMS_URL:-https://kms.example.com}"
+api_key = "${COSMIAN_API_KEY}"
+default_key_id = "provisioning-master-key"
+tls_verify = true
+```
+
+### Environment Variables
+
+```bash
+# Development with Age
+export PROVISIONING_ENV=dev
+# Age keys will be read from paths in config
+
+# Production with Cosmian
+export PROVISIONING_ENV=prod
+export COSMIAN_KMS_URL="https://kms.example.com"
+export COSMIAN_API_KEY="your-api-key"
+```
+
+## Quick Start
+
+### Development Setup (Age)
+
+```bash
+# 1. Generate Age keys
+mkdir -p ~/.config/provisioning/age
+age-keygen -o ~/.config/provisioning/age/private_key.txt
+age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt
+
+# 2. Set environment
+export PROVISIONING_ENV=dev
+
+# 3. Start KMS service
+cargo run --bin kms-service
+```
+
+### Production Setup (Cosmian)
+
+```bash
+# 1. Set up Cosmian KMS server (or use hosted service)
+
+# 2. Create master key in Cosmian KMS
+# (Use Cosmian KMS CLI or web interface)
+
+# 3. Set environment variables
+export PROVISIONING_ENV=prod
+export COSMIAN_KMS_URL=https://your-kms.example.com
+export COSMIAN_API_KEY=your-api-key-here
+
+# 4. 
Start KMS service
+cargo run --bin kms-service
+```
+
+## Usage
+
+### REST API Examples
+
+#### Encrypt Data
+
+```bash
+curl -X POST http://localhost:8082/api/v1/kms/encrypt \
+  -H "Content-Type: application/json" \
+  -d '{
+    "plaintext": "SGVsbG8sIFdvcmxkIQ==",
+    "context": "env=prod,service=api"
+  }'
+```
+
+#### Decrypt Data
+
+```bash
+curl -X POST http://localhost:8082/api/v1/kms/decrypt \
+  -H "Content-Type: application/json" \
+  -d '{
+    "ciphertext": "...",
+    "context": "env=prod,service=api"
+  }'
+```
+
+#### Generate Data Key (Cosmian only)
+
+```bash
+curl -X POST http://localhost:8082/api/v1/kms/generate-key \
+  -H "Content-Type: application/json" \
+  -d '{
+    "key_spec": "AES_256"
+  }'
+```
+
+#### Health Check
+
+```bash
+curl http://localhost:8082/api/v1/kms/health
+```
+
+### Nushell CLI Integration
+
+```nushell
+# Load the KMS module
+use provisioning/core/nulib/kms
+
+# Set service URL
+$env.KMS_SERVICE_URL = "http://localhost:8082"
+
+# Encrypt data
+"secret-data" | kms encrypt
+"api-key" | kms encrypt --context "env=prod,service=api"
+
+# Decrypt data
+$ciphertext | kms decrypt
+$ciphertext | kms decrypt --context "env=prod,service=api"
+
+# Generate data key (Cosmian only)
+kms generate-key
+kms generate-key --key-spec AES_128
+
+# Check service status
+kms status
+kms health
+
+# Encrypt/decrypt files
+kms encrypt-file config.yaml
+kms encrypt-file secrets.json --output secrets.enc --context "env=prod"
+
+kms decrypt-file config.yaml.enc
+kms decrypt-file secrets.enc --output secrets.json --context "env=prod"
+```
+
+## Backend Comparison
+
+| Feature | Age | RustyVault | Cosmian KMS | AWS KMS | Vault |
+| --------- | ----- | ------------ | ------------- | --------- | ------- |
+| **Setup** | Simple | Self-hosted | Server setup | AWS account | Enterprise |
+| **Speed** | Very fast | Fast | Fast | Fast | Fast |
+| **Network** | No | Yes | Yes | Yes | Yes |
+| **Key Rotation** | Manual | Automatic | Automatic | Automatic | Automatic |
+| **Data Keys** | No | Yes | Yes | Yes | Yes |
+| **Audit Logging** | No | Yes | Full | Full | Full |
+| **Confidential** | No | No | Yes (SGX/SEV) | No | No |
+| **Multi-tenant** | No | Yes | Yes | Yes | Yes |
+| **License** | MIT | Apache 2.0 | Proprietary | Proprietary | BSL/Enterprise |
+| **Cost** | Free | Free | Paid | Paid | Paid |
+| **Use Case** | Dev/Test | Self-hosted | Privacy | AWS Cloud | Enterprise |
+
+## Integration Points
+
+### 1. Config Encryption (SOPS Integration)
+
+```bash
+# Encrypt configuration files
+kms encrypt-file workspace/config/secrets.yaml
+
+# SOPS can use KMS for key encryption
+# Configure in .sops.yaml to use KMS endpoint
+```
+
+### 2. Dynamic Secrets (Provider API Keys)
+
+```rust
+// Rust orchestrator can call KMS API
+let encrypted_key = kms_client.encrypt(api_key.as_bytes(), &context).await?;
+```
+
+### 3. SSH Key Management
+
+```bash
+# Generate and encrypt temporal SSH keys
+ssh-keygen -t ed25519 -f temp_key -N ""
+kms encrypt-file temp_key --context "infra=prod,purpose=deployment"
+```
+
+### 4. Orchestrator (Workflow Data)
+
+```rust
+// Encrypt sensitive workflow parameters
+let encrypted_params = kms_service
+    .encrypt(params_json.as_bytes(), &workflow_context)
+    .await?;
+```
+
+### 5. Control Center (Audit Logs)
+
+- All KMS operations are logged
+- Audit trail for compliance
+- Integration with control center UI
+
+## Testing
+
+### Unit Tests
+
+```bash
+cargo test
+```
+
+### Integration Tests
+
+```bash
+# Age backend tests (no external dependencies)
+cargo test age
+
+# Cosmian backend tests (requires Cosmian KMS server)
+export COSMIAN_KMS_URL=http://localhost:9999
+export COSMIAN_API_KEY=test-key
+cargo test cosmian -- --ignored
+```
+
+## Deployment
+
+### Docker
+
+```dockerfile
+FROM rust:1.70 as builder
+WORKDIR /app
+COPY . .
+RUN cargo build --release
+
+FROM debian:bookworm-slim
+RUN apt-get update && \
+    apt-get install -y ca-certificates && \
+    rm -rf /var/lib/apt/lists/*
+COPY --from=builder /app/target/release/kms-service /usr/local/bin/
+ENTRYPOINT ["kms-service"]
+```
+
+### Kubernetes (Production with Cosmian)
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: kms-service
+spec:
+  replicas: 2
+  template:
+    spec:
+      containers:
+        - name: kms-service
+          image: provisioning/kms-service:latest
+          env:
+            - name: PROVISIONING_ENV
+              value: "prod"
+            - name: COSMIAN_KMS_URL
+              value: "https://kms.example.com"
+            - name: COSMIAN_API_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: cosmian-api-key
+                  key: api-key
+          ports:
+            - containerPort: 8082
+```
+
+### systemd Service
+
+```ini
+[Unit]
+Description=KMS Service
+After=network.target
+
+[Service]
+Type=simple
+User=kms-service
+Environment="PROVISIONING_ENV=prod"
+Environment="COSMIAN_KMS_URL=https://kms.example.com"
+Environment="COSMIAN_API_KEY=your-api-key"
+ExecStart=/usr/local/bin/kms-service
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+```
+
+## Security Best Practices
+
+1. **Development**: Use Age for dev/test only, never for production secrets
+2. **Production**: Always use Cosmian KMS with TLS verification enabled
+3. **API Keys**: Never hardcode Cosmian API keys, use environment variables
+4. **Key Rotation**: Enable automatic rotation in Cosmian (90 days recommended)
+5. **Context Encryption**: Always use encryption context (AAD) for additional security
+6. **Network Access**: Restrict KMS service access with firewall rules
+7. **Monitoring**: Enable health checks and monitor operation metrics
+
+## Migration from Vault/AWS KMS
+
+See [KMS_SIMPLIFICATION.md](../../docs/migration/KMS_SIMPLIFICATION.md) for migration guide.
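Client code that calls the endpoints above mostly needs to get two details right: base64-encoding the payload and passing the same context string at encrypt and decrypt time. A minimal request-building sketch in Python (the field names `plaintext`, `ciphertext`, and `context` follow the REST API examples in the Usage section; `make_context` and its key ordering are illustrative assumptions, not part of the service):

```python
import base64
import json

def make_context(**pairs: str) -> str:
    """Hypothetical helper: serialize AAD pairs as 'k=v,...' (sorted for stability)."""
    return ",".join(f"{k}={v}" for k, v in sorted(pairs.items()))

def encrypt_request(plaintext: bytes, context: str) -> str:
    """JSON body for POST /api/v1/kms/encrypt; the plaintext must be base64-encoded."""
    return json.dumps({
        "plaintext": base64.b64encode(plaintext).decode("ascii"),
        "context": context,
    })

def decrypt_request(ciphertext: str, context: str) -> str:
    """JSON body for POST /api/v1/kms/decrypt; the context must match the one
    used at encrypt time or AAD verification fails."""
    return json.dumps({"ciphertext": ciphertext, "context": context})

ctx = make_context(env="prod", service="api")   # -> "env=prod,service=api"
body = encrypt_request(b"Hello, World!", ctx)   # matches the curl example payload
```

The `body` string can be passed as-is to `curl -d` against `/api/v1/kms/encrypt`.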
+
+## Monitoring
+
+### Metrics Endpoints
+
+```bash
+# Service status (includes operation count)
+curl http://localhost:8082/api/v1/kms/status
+
+# Health check
+curl http://localhost:8082/api/v1/kms/health
+```
+
+### Logs
+
+```bash
+# Set log level
+export RUST_LOG="kms_service=debug,tower_http=debug"
+
+# View logs
+journalctl -u kms-service -f
+```
+
+## Troubleshooting
+
+### Age Backend Issues
+
+```bash
+# Check keys exist
+ls -la ~/.config/provisioning/age/
+
+# Verify key format
+cat ~/.config/provisioning/age/public_key.txt
+# Should start with: age1...
+
+# Test encryption manually
+echo "test" | age -r $(cat ~/.config/provisioning/age/public_key.txt) > test.enc
+age -d -i ~/.config/provisioning/age/private_key.txt test.enc
+```
+
+### Cosmian KMS Issues
+
+```bash
+# Check connectivity
+curl https://kms.example.com/api/v1/health \
+  -H "X-API-Key: $COSMIAN_API_KEY"
+
+# Verify API key
+curl https://kms.example.com/api/v1/version \
+  -H "X-API-Key: $COSMIAN_API_KEY"
+
+# Test encryption
+curl -X POST https://kms.example.com/api/v1/encrypt \
+  -H "X-API-Key: $COSMIAN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"keyId":"master-key","data":"SGVsbG8="}'
+```
+
+## License
+
+Copyright © 2024 Provisioning Team
+
+## References
+
+- [Age Encryption](https://github.com/FiloSottile/age)
+- [Cosmian KMS](https://cosmian.com/kms/)
+- [Axum Web Framework](https://docs.rs/axum/)
+- [Confidential Computing](https://confidentialcomputing.io/)
\ No newline at end of file
diff --git a/docs/deployment/deployment-guide.md b/docs/deployment/deployment-guide.md
index d93a513..f4bc791 100644
--- a/docs/deployment/deployment-guide.md
+++ b/docs/deployment/deployment-guide.md
@@ -1 +1,757 @@
-# Provisioning Platform Deployment Guide\n\n**Version**: 3.0.0\n**Date**: 2025-10-06\n**Deployment Modes**: Solo, Multi-User, CI/CD, Enterprise\n\n---\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Prerequisites](#prerequisites)\n3. 
[Deployment Modes](#deployment-modes)\n4. [Quick Start](#quick-start)\n5. [Configuration](#configuration)\n6. [Deployment Methods](#deployment-methods)\n7. [Post-Deployment](#post-deployment)\n8. [Troubleshooting](#troubleshooting)\n\n---\n\n## Overview\n\nThe Provisioning Platform is a comprehensive infrastructure automation system that can be deployed in four modes:\n\n- **Solo**: Single-user local development (minimal services)\n- **Multi-User**: Team collaboration with source control\n- **CI/CD**: Automated deployment pipelines\n- **Enterprise**: Full production with monitoring, KMS, and audit logging\n\n### Architecture Components\n\n| Component | Solo | Multi-User | CI/CD | Enterprise |\n| ----------- | ------ | ------------ | ------- | ------------ |\n| Orchestrator | ✓ | ✓ | ✓ | ✓ |\n| Control Center | ✓ | ✓ | ✓ | ✓ |\n| CoreDNS | ✓ | ✓ | ✓ | ✓ |\n| OCI Registry (Zot) | ✓ | ✓ | ✓ | ---- |\n| Extension Registry | ✓ | ✓ | ✓ | ✓ |\n| Gitea | ---- | ✓ | ✓ | ✓ |\n| PostgreSQL | ---- | ✓ | ✓ | ✓ |\n| API Server | ---- | - | ✓ | ✓ |\n| Harbor | ---- | - | ---- | ✓ |\n| Cosmian KMS | ---- | - | ---- | ✓ |\n| Prometheus | ---- | - | ---- | ✓ |\n| Grafana | ---- | - | ---- | ✓ |\n| Loki + Promtail | ---- | - | ---- | ✓ |\n| Elasticsearch + Kibana | ---- | - | ---- | ✓ |\n| Nginx Reverse Proxy | ---- | - | ---- | ✓ |\n\n---\n\n## Prerequisites\n\n### Required Software\n\n1. **Docker** (version 20.10+)\n\n ```bash\n docker --version\n # Docker version 20.10.0 or higher\n ```\n\n2. **Docker Compose** (version 2.0+)\n\n ```bash\n docker-compose --version\n # Docker Compose version 2.0.0 or higher\n ```\n\n3. 
**Nushell** (version 0.107.1+ for automation scripts)\n\n ```bash\n nu --version\n # 0.107.1 or higher\n ```\n\n### System Requirements\n\n#### Solo Mode\n\n- **CPU**: 2 cores\n- **Memory**: 4GB RAM\n- **Disk**: 20GB free space\n- **Network**: Internet connection for pulling images\n\n#### Multi-User Mode\n\n- **CPU**: 4 cores\n- **Memory**: 8GB RAM\n- **Disk**: 50GB free space\n- **Network**: Internet connection + internal network\n\n#### CI/CD Mode\n\n- **CPU**: 8 cores\n- **Memory**: 16GB RAM\n- **Disk**: 100GB free space\n- **Network**: Internet + dedicated CI/CD network\n\n#### Enterprise Mode\n\n- **CPU**: 16 cores\n- **Memory**: 32GB RAM\n- **Disk**: 500GB free space (SSD recommended)\n- **Network**: High-bandwidth, low-latency network\n\n### Optional Tools\n\n- **OpenSSL** (for generating secrets)\n- **kubectl** (for Kubernetes deployment)\n- **Helm** (for Kubernetes package management)\n\n---\n\n## Deployment Modes\n\n### Solo Mode\n\n**Use Case**: Local development, testing, personal use\n\n**Features**:\n\n- Minimal resource usage\n- No authentication required\n- SQLite databases\n- Local file storage\n\n**Limitations**:\n\n- Single user only\n- No version control integration\n- No audit logging\n\n### Multi-User Mode\n\n**Use Case**: Small team collaboration\n\n**Features**:\n\n- Multi-user authentication\n- Gitea for source control\n- PostgreSQL shared database\n- User management\n\n**Limitations**:\n\n- No automated pipelines\n- No advanced monitoring\n\n### CI/CD Mode\n\n**Use Case**: Automated deployment pipelines\n\n**Features**:\n\n- All Multi-User features\n- Provisioning API Server\n- Webhook support\n- Jenkins/GitLab Runner integration\n\n**Limitations**:\n\n- Basic monitoring only\n\n### Enterprise Mode\n\n**Use Case**: Production deployments, compliance requirements\n\n**Features**:\n\n- All CI/CD features\n- Harbor registry (enterprise OCI)\n- Cosmian KMS (secret management)\n- Full monitoring stack (Prometheus, Grafana)\n- Log aggregation 
(Loki, Elasticsearch)\n- Audit logging\n- TLS/SSL encryption\n- Nginx reverse proxy\n\n---\n\n## Quick Start\n\n### 1. Clone Repository\n\n```{$detected_lang}\ncd /opt\ngit clone https://github.com/your-org/project-provisioning.git\ncd project-provisioning/provisioning/platform\n```\n\n### 2. Generate Secrets\n\n```{$detected_lang}\n# Generate .env file with random secrets\n./scripts/generate-secrets.nu\n\n# Or copy and edit manually\ncp .env.example .env\nnano .env\n```\n\n### 3. Choose Deployment Mode and Deploy\n\n#### Solo Mode\n\n```{$detected_lang}\n./scripts/deploy-platform.nu --mode solo\n```\n\n#### Multi-User Mode\n\n```{$detected_lang}\n# Generate secrets first\n./scripts/generate-secrets.nu\n\n# Deploy\n./scripts/deploy-platform.nu --mode multi-user\n```\n\n#### CI/CD Mode\n\n```{$detected_lang}\n./scripts/deploy-platform.nu --mode cicd --build\n```\n\n#### Enterprise Mode\n\n```{$detected_lang}\n# Full production deployment\n./scripts/deploy-platform.nu --mode enterprise --build --wait 600\n```\n\n### 4. Verify Deployment\n\n```{$detected_lang}\n# Check all services\n./scripts/health-check.nu\n\n# View logs\ndocker-compose logs -f\n```\n\n### 5. Access Services\n\n- **Orchestrator**: \n- **Control Center**: \n- **OCI Registry**: \n- **Gitea** (Multi-User+): \n- **Grafana** (Enterprise): \n\n---\n\n## Configuration\n\n### Environment Variables\n\nThe `.env` file controls all deployment settings. 
Key variables:\n\n#### Platform Configuration\n\n```{$detected_lang}\nPROVISIONING_MODE=solo # solo, multi-user, cicd, enterprise\nPLATFORM_ENVIRONMENT=development # development, staging, production\n```\n\n#### Service Ports\n\n```{$detected_lang}\nORCHESTRATOR_PORT=8080\nCONTROL_CENTER_PORT=8081\nGITEA_HTTP_PORT=3000\nOCI_REGISTRY_PORT=5000\n```\n\n#### Security Settings\n\n```{$detected_lang}\n# Generate with: openssl rand -base64 32\nCONTROL_CENTER_JWT_SECRET=\nAPI_SERVER_JWT_SECRET=\nPOSTGRES_PASSWORD=\n```\n\n#### Resource Limits\n\n```{$detected_lang}\nORCHESTRATOR_CPU_LIMIT=2000m\nORCHESTRATOR_MEMORY_LIMIT=2048M\n```\n\n### Configuration Files\n\n#### Docker Compose\n\n- **Main**: `docker-compose.yaml` (base services)\n- **Solo**: `infrastructure/docker/docker-compose.solo.yaml`\n- **Multi-User**: `infrastructure/docker/docker-compose.multi-user.yaml`\n- **CI/CD**: `infrastructure/docker/docker-compose.cicd.yaml`\n- **Enterprise**: `infrastructure/docker/docker-compose.enterprise.yaml`\n\n#### Service Configurations\n\n- **Orchestrator**: `orchestrator/config.defaults.toml`\n- **Control Center**: `control-center/config.defaults.toml`\n- **CoreDNS**: `config/coredns/Corefile`\n- **OCI Registry**: `infrastructure/oci-registry/config.json`\n- **Nginx**: `infrastructure/nginx/nginx.conf`\n- **Prometheus**: `infrastructure/monitoring/prometheus/prometheus.yml`\n\n---\n\n## Deployment Methods\n\n### Method 1: Docker Compose (Recommended)\n\n#### Deploy\n\n```{$detected_lang}\n# Solo mode\ndocker-compose -f docker-compose.yaml \\n -f infrastructure/docker/docker-compose.solo.yaml \\n up -d\n\n# Multi-user mode\ndocker-compose -f docker-compose.yaml \\n -f infrastructure/docker/docker-compose.multi-user.yaml \\n up -d\n\n# CI/CD mode\ndocker-compose -f docker-compose.yaml \\n -f infrastructure/docker/docker-compose.multi-user.yaml \\n -f infrastructure/docker/docker-compose.cicd.yaml \\n up -d\n\n# Enterprise mode\ndocker-compose -f docker-compose.yaml \\n -f 
infrastructure/docker/docker-compose.multi-user.yaml \\n -f infrastructure/docker/docker-compose.cicd.yaml \\n -f infrastructure/docker/docker-compose.enterprise.yaml \\n up -d\n```\n\n#### Manage Services\n\n```{$detected_lang}\n# View logs\ndocker-compose logs -f [service-name]\n\n# Restart service\ndocker-compose restart orchestrator\n\n# Stop all services\ndocker-compose down\n\n# Stop and remove volumes (WARNING: data loss)\ndocker-compose down --volumes\n```\n\n### Method 2: Systemd (Linux Production)\n\n#### Install Services\n\n```{$detected_lang}\ncd systemd\nsudo ./install-services.sh\n```\n\n#### Manage via systemd\n\n```{$detected_lang}\n# Start platform\nsudo systemctl start provisioning-platform\n\n# Enable auto-start on boot\nsudo systemctl enable provisioning-platform\n\n# Check status\nsudo systemctl status provisioning-platform\n\n# View logs\nsudo journalctl -u provisioning-platform -f\n\n# Restart\nsudo systemctl restart provisioning-platform\n\n# Stop\nsudo systemctl stop provisioning-platform\n```\n\n### Method 3: Kubernetes\n\nSee [KUBERNETES_DEPLOYMENT.md](./KUBERNETES_DEPLOYMENT.md) for detailed instructions.\n\n#### Quick Deploy\n\n```{$detected_lang}\n# Create namespace\nkubectl apply -f k8s/base/namespace.yaml\n\n# Deploy services\nkubectl apply -f k8s/deployments/\nkubectl apply -f k8s/services/\nkubectl apply -f k8s/ingress/\n\n# Check status\nkubectl get pods -n provisioning\n```\n\n### Method 4: Automation Script (Nushell)\n\n```{$detected_lang}\n# Deploy with options\n./scripts/deploy-platform.nu --mode enterprise \\n --build \\n --wait 300\n\n# Health check\n./scripts/health-check.nu\n\n# Dry run (show what would be deployed)\n./scripts/deploy-platform.nu --mode enterprise --dry-run\n```\n\n---\n\n## Post-Deployment\n\n### 1. 
Verify Services\n\n```{$detected_lang}\n# Quick health check\n./scripts/health-check.nu\n\n# Detailed Docker status\ndocker-compose ps\n\n# Check individual service\ncurl http://localhost:9090/health\n```\n\n### 2. Initial Configuration\n\n#### Create Admin User (Multi-User+)\n\nAccess Gitea at and complete setup wizard.\n\n#### Configure DNS (Optional)\n\nAdd to `/etc/hosts` or configure local DNS:\n\n```{$detected_lang}\n127.0.0.1 provisioning.local\n127.0.0.1 gitea.provisioning.local\n127.0.0.1 grafana.provisioning.local\n```\n\n#### Configure Monitoring (Enterprise)\n\n1. Access Grafana: \n2. Login with credentials from `.env`:\n - Username: `admin`\n - Password: `${GRAFANA_ADMIN_PASSWORD}`\n3. Dashboards are auto-provisioned from `infrastructure/monitoring/grafana/dashboards/`\n\n### 3. Load Extensions\n\n```{$detected_lang}\n# List available extensions\ncurl http://localhost:8082/api/v1/extensions\n\n# Upload extension (example)\ncurl -X POST http://localhost:8082/api/v1/extensions/upload \\n -F "file=@my-extension.tar.gz"\n```\n\n### 4. Test Workflows\n\n```{$detected_lang}\n# Create test server (via orchestrator API)\ncurl -X POST http://localhost:9090/workflows/servers/create \\n -H "Content-Type: application/json" \\n -d '{"name": "test-server", "plan": "1xCPU-2GB"}'\n\n# Check workflow status\ncurl http://localhost:9090/tasks/\n```\n\n---\n\n## Troubleshooting\n\n### Common Issues\n\n#### Services Not Starting\n\n**Symptom**: `docker-compose up` fails or services crash\n\n**Solutions**:\n\n1. Check Docker daemon:\n\n ```bash\n systemctl status docker\n ```\n\n1. Check logs:\n\n ```bash\n docker-compose logs orchestrator\n ```\n\n2. Check resource limits:\n\n ```bash\n docker stats\n ```\n\n3. Increase Docker resources in Docker Desktop settings\n\n#### Port Conflicts\n\n**Symptom**: `Error: port is already allocated`\n\n**Solutions**:\n\n1. Find conflicting process:\n\n ```bash\n lsof -i :9090\n ```\n\n2. 
Change port in `.env`:\n\n ```bash\n ORCHESTRATOR_PORT=9080\n ```\n\n3. Restart deployment:\n\n ```bash\n docker-compose down\n docker-compose up -d\n ```\n\n#### Health Checks Failing\n\n**Symptom**: Health check script reports unhealthy services\n\n**Solutions**:\n\n1. Check service logs:\n\n ```bash\n docker-compose logs -f \n ```\n\n2. Verify network connectivity:\n\n ```bash\n docker network inspect provisioning-net\n ```\n\n3. Check firewall rules:\n\n ```bash\n sudo ufw status\n ```\n\n4. Wait longer for services to start:\n\n ```bash\n ./scripts/deploy-platform.nu --wait 600\n ```\n\n#### Database Connection Errors\n\n**Symptom**: PostgreSQL connection refused\n\n**Solutions**:\n\n1. Check PostgreSQL health:\n\n ```bash\n docker exec provisioning-postgres pg_isready\n ```\n\n2. Verify credentials in `.env`:\n\n ```bash\n grep POSTGRES_ .env\n ```\n\n3. Check PostgreSQL logs:\n\n ```bash\n docker-compose logs postgres\n ```\n\n4. Recreate database:\n\n ```bash\n docker-compose down\n docker volume rm provisioning_postgres-data\n docker-compose up -d\n ```\n\n#### Out of Disk Space\n\n**Symptom**: No space left on device\n\n**Solutions**:\n\n1. Clean Docker volumes:\n\n ```bash\n docker volume prune\n ```\n\n2. Clean Docker images:\n\n ```bash\n docker image prune -a\n ```\n\n3. Check volume sizes:\n\n ```bash\n docker system df -v\n ```\n\n### Getting Help\n\n- **Logs**: Always check logs first: `docker-compose logs -f`\n- **Health**: Run health check: `./scripts/health-check.nu --json`\n- **Documentation**: See `docs/` directory\n- **Issues**: File bug reports at GitHub repository\n\n---\n\n## Security Best Practices\n\n### 1. Secret Management\n\n- **Never commit** `.env` files to version control\n- Use `./scripts/generate-secrets.nu` to generate strong secrets\n- Rotate secrets regularly\n- Use KMS in enterprise mode\n\n### 2. 
Network Security\n\n- Use TLS/SSL in production (enterprise mode)\n- Configure firewall rules:\n\n ```bash\n sudo ufw allow 80/tcp\n sudo ufw allow 443/tcp\n sudo ufw enable\n ```\n\n- Use private networks for backend services\n\n### 3. Access Control\n\n- Enable authentication in multi-user mode\n- Use strong passwords (16+ characters)\n- Configure API keys for CI/CD access\n- Enable audit logging in enterprise mode\n\n### 4. Regular Updates\n\n```{$detected_lang}\n# Pull latest images\ndocker-compose pull\n\n# Rebuild with updates\n./scripts/deploy-platform.nu --pull --build\n```\n\n---\n\n## Backup and Recovery\n\n### Backup\n\n```{$detected_lang}\n# Backup volumes\ndocker run --rm -v provisioning_orchestrator-data:/data \\n -v $(pwd)/backup:/backup \\n alpine tar czf /backup/orchestrator-data.tar.gz -C /data .\n\n# Backup PostgreSQL\ndocker exec provisioning-postgres pg_dumpall -U provisioning > backup/postgres-backup.sql\n```\n\n### Restore\n\n```{$detected_lang}\n# Restore volume\ndocker run --rm -v provisioning_orchestrator-data:/data \\n -v $(pwd)/backup:/backup \\n alpine tar xzf /backup/orchestrator-data.tar.gz -C /data\n\n# Restore PostgreSQL\ndocker exec -i provisioning-postgres psql -U provisioning < backup/postgres-backup.sql\n```\n\n---\n\n## Maintenance\n\n### Updates\n\n```{$detected_lang}\n# Pull latest images\ndocker-compose pull\n\n# Recreate containers\ndocker-compose up -d --force-recreate\n\n# Remove old images\ndocker image prune\n```\n\n### Monitoring\n\n- **Prometheus**: \n- **Grafana**: \n- **Logs**: `docker-compose logs -f`\n\n### Health Checks\n\n```{$detected_lang}\n# Automated health check\n./scripts/health-check.nu\n\n# Manual checks\ncurl http://localhost:9090/health\ncurl http://localhost:8081/health\n```\n\n---\n\n## Next Steps\n\n- [Production Deployment Guide](./PRODUCTION_DEPLOYMENT.md)\n- [Kubernetes Deployment Guide](./KUBERNETES_DEPLOYMENT.md)\n- [Docker Compose Reference](./DOCKER_COMPOSE_REFERENCE.md)\n- [Monitoring 
Setup](./MONITORING_SETUP.md)\n- [Security Hardening](./SECURITY_HARDENING.md)\n\n---\n\n**Documentation Version**: 1.0.0\n**Last Updated**: 2025-10-06\n**Maintained By**: Platform Team
+# Provisioning Platform Deployment Guide
+
+**Version**: 3.0.0
+**Date**: 2025-10-06
+**Deployment Modes**: Solo, Multi-User, CI/CD, Enterprise
+
+---
+
+## Table of Contents
+
+1. [Overview](#overview)
+2. [Prerequisites](#prerequisites)
+3. [Deployment Modes](#deployment-modes)
+4. [Quick Start](#quick-start)
+5. [Configuration](#configuration)
+6. [Deployment Methods](#deployment-methods)
+7. [Post-Deployment](#post-deployment)
+8. [Troubleshooting](#troubleshooting)
+
+---
+
+## Overview
+
+The Provisioning Platform is a comprehensive infrastructure automation system that can be deployed in four modes:
+
+- **Solo**: Single-user local development (minimal services)
+- **Multi-User**: Team collaboration with source control
+- **CI/CD**: Automated deployment pipelines
+- **Enterprise**: Full production with monitoring, KMS, and audit logging
+
+### Architecture Components
+
+| Component | Solo | Multi-User | CI/CD | Enterprise |
+| ----------- | ------ | ------------ | ------- | ------------ |
+| Orchestrator | ✓ | ✓ | ✓ | ✓ |
+| Control Center | ✓ | ✓ | ✓ | ✓ |
+| CoreDNS | ✓ | ✓ | ✓ | ✓ |
+| OCI Registry (Zot) | ✓ | ✓ | ✓ | - |
+| Extension Registry | ✓ | ✓ | ✓ | ✓ |
+| Gitea | - | ✓ | ✓ | ✓ |
+| PostgreSQL | - | ✓ | ✓ | ✓ |
+| API Server | - | - | ✓ | ✓ |
+| Harbor | - | - | - | ✓ |
+| Cosmian KMS | - | - | - | ✓ |
+| Prometheus | - | - | - | ✓ |
+| Grafana | - | - | - | ✓ |
+| Loki + Promtail | - | - | - | ✓ |
+| Elasticsearch + Kibana | - | - | - | ✓ |
+| Nginx Reverse Proxy | - | - | - | ✓ |
+
+---
+
+## Prerequisites
+
+### Required Software
+
+1. **Docker** (version 20.10+)
+
+   ```bash
+   docker --version
+   # Docker version 20.10.0 or higher
+   ```
+
+2. 
**Docker Compose** (version 2.0+) + + ```bash + docker-compose --version + # Docker Compose version 2.0.0 or higher + ``` + +3. **Nushell** (version 0.107.1+ for automation scripts) + + ```bash + nu --version + # 0.107.1 or higher + ``` + +### System Requirements + +#### Solo Mode + +- **CPU**: 2 cores +- **Memory**: 4GB RAM +- **Disk**: 20GB free space +- **Network**: Internet connection for pulling images + +#### Multi-User Mode + +- **CPU**: 4 cores +- **Memory**: 8GB RAM +- **Disk**: 50GB free space +- **Network**: Internet connection + internal network + +#### CI/CD Mode + +- **CPU**: 8 cores +- **Memory**: 16GB RAM +- **Disk**: 100GB free space +- **Network**: Internet + dedicated CI/CD network + +#### Enterprise Mode + +- **CPU**: 16 cores +- **Memory**: 32GB RAM +- **Disk**: 500GB free space (SSD recommended) +- **Network**: High-bandwidth, low-latency network + +### Optional Tools + +- **OpenSSL** (for generating secrets) +- **kubectl** (for Kubernetes deployment) +- **Helm** (for Kubernetes package management) + +--- + +## Deployment Modes + +### Solo Mode + +**Use Case**: Local development, testing, personal use + +**Features**: + +- Minimal resource usage +- No authentication required +- SQLite databases +- Local file storage + +**Limitations**: + +- Single user only +- No version control integration +- No audit logging + +### Multi-User Mode + +**Use Case**: Small team collaboration + +**Features**: + +- Multi-user authentication +- Gitea for source control +- PostgreSQL shared database +- User management + +**Limitations**: + +- No automated pipelines +- No advanced monitoring + +### CI/CD Mode + +**Use Case**: Automated deployment pipelines + +**Features**: + +- All Multi-User features +- Provisioning API Server +- Webhook support +- Jenkins/GitLab Runner integration + +**Limitations**: + +- Basic monitoring only + +### Enterprise Mode + +**Use Case**: Production deployments, compliance requirements + +**Features**: + +- All CI/CD features +- Harbor 
registry (enterprise OCI) +- Cosmian KMS (secret management) +- Full monitoring stack (Prometheus, Grafana) +- Log aggregation (Loki, Elasticsearch) +- Audit logging +- TLS/SSL encryption +- Nginx reverse proxy + +--- + +## Quick Start + +### 1. Clone Repository + +```bash +cd /opt +git clone https://github.com/your-org/project-provisioning.git +cd project-provisioning/provisioning/platform +``` + +### 2. Generate Secrets + +```bash +# Generate .env file with random secrets +./scripts/generate-secrets.nu + +# Or copy and edit manually +cp .env.example .env +nano .env +``` + +### 3. Choose Deployment Mode and Deploy + +#### Solo Mode + +```bash +./scripts/deploy-platform.nu --mode solo +``` + +#### Multi-User Mode + +```bash +# Generate secrets first +./scripts/generate-secrets.nu + +# Deploy +./scripts/deploy-platform.nu --mode multi-user +``` + +#### CI/CD Mode + +```bash +./scripts/deploy-platform.nu --mode cicd --build +``` + +#### Enterprise Mode + +```bash +# Full production deployment +./scripts/deploy-platform.nu --mode enterprise --build --wait 600 +``` + +### 4. Verify Deployment + +```bash +# Check all services +./scripts/health-check.nu + +# View logs +docker-compose logs -f +``` + +### 5. Access Services + +- **Orchestrator**: +- **Control Center**: +- **OCI Registry**: +- **Gitea** (Multi-User+): +- **Grafana** (Enterprise): + +--- + +## Configuration + +### Environment Variables + +The `.env` file controls all deployment settings. 
Key variables: + +#### Platform Configuration + +```bash +PROVISIONING_MODE=solo # solo, multi-user, cicd, enterprise +PLATFORM_ENVIRONMENT=development # development, staging, production +``` + +#### Service Ports + +```bash +ORCHESTRATOR_PORT=8080 +CONTROL_CENTER_PORT=8081 +GITEA_HTTP_PORT=3000 +OCI_REGISTRY_PORT=5000 +``` + +#### Security Settings + +```bash +# Generate with: openssl rand -base64 32 +CONTROL_CENTER_JWT_SECRET= +API_SERVER_JWT_SECRET= +POSTGRES_PASSWORD= +``` + +#### Resource Limits + +```bash +ORCHESTRATOR_CPU_LIMIT=2000m +ORCHESTRATOR_MEMORY_LIMIT=2048M +``` + +### Configuration Files + +#### Docker Compose + +- **Main**: `docker-compose.yaml` (base services) +- **Solo**: `infrastructure/docker/docker-compose.solo.yaml` +- **Multi-User**: `infrastructure/docker/docker-compose.multi-user.yaml` +- **CI/CD**: `infrastructure/docker/docker-compose.cicd.yaml` +- **Enterprise**: `infrastructure/docker/docker-compose.enterprise.yaml` + +#### Service Configurations + +- **Orchestrator**: `orchestrator/config.defaults.toml` +- **Control Center**: `control-center/config.defaults.toml` +- **CoreDNS**: `config/coredns/Corefile` +- **OCI Registry**: `infrastructure/oci-registry/config.json` +- **Nginx**: `infrastructure/nginx/nginx.conf` +- **Prometheus**: `infrastructure/monitoring/prometheus/prometheus.yml` + +--- + +## Deployment Methods + +### Method 1: Docker Compose (Recommended) + +#### Deploy + +```bash +# Solo mode +docker-compose -f docker-compose.yaml \ + -f infrastructure/docker/docker-compose.solo.yaml \ + up -d + +# Multi-user mode +docker-compose -f docker-compose.yaml \ + -f infrastructure/docker/docker-compose.multi-user.yaml \ + up -d + +# CI/CD mode +docker-compose -f docker-compose.yaml \ + -f infrastructure/docker/docker-compose.multi-user.yaml \ + -f infrastructure/docker/docker-compose.cicd.yaml \ + up -d + +# Enterprise mode +docker-compose -f docker-compose.yaml \ + -f infrastructure/docker/docker-compose.multi-user.yaml \ + -f infrastructure/docker/docker-compose.cicd.yaml \ + -f infrastructure/docker/docker-compose.enterprise.yaml \ + up -d +``` + +#### Manage Services + +```bash +# View logs +docker-compose logs -f [service-name] + +# Restart service +docker-compose restart orchestrator + +# Stop all services +docker-compose down + +# Stop and remove volumes (WARNING: data loss) +docker-compose down --volumes +``` + +### Method 2: Systemd (Linux Production) + +#### Install Services + +```bash +cd systemd +sudo ./install-services.sh +``` + +#### Manage via systemd + +```bash +# Start platform +sudo systemctl start provisioning-platform + +# Enable auto-start on boot +sudo systemctl enable provisioning-platform + +# Check status +sudo systemctl status provisioning-platform + +# View logs +sudo journalctl -u provisioning-platform -f + +# Restart +sudo systemctl restart provisioning-platform + +# Stop +sudo systemctl stop provisioning-platform +``` + +### Method 3: Kubernetes + +See [KUBERNETES_DEPLOYMENT.md](./KUBERNETES_DEPLOYMENT.md) for detailed instructions. + +#### Quick Deploy + +```bash +# Create namespace +kubectl apply -f k8s/base/namespace.yaml + +# Deploy services +kubectl apply -f k8s/deployments/ +kubectl apply -f k8s/services/ +kubectl apply -f k8s/ingress/ + +# Check status +kubectl get pods -n provisioning +``` + +### Method 4: Automation Script (Nushell) + +```nushell +# Deploy with options +./scripts/deploy-platform.nu --mode enterprise --build --wait 300 + +# Health check +./scripts/health-check.nu + +# Dry run (show what would be deployed) +./scripts/deploy-platform.nu --mode enterprise --dry-run +``` + +--- + +## Post-Deployment + +### 1. Verify Services + +```bash +# Quick health check +./scripts/health-check.nu + +# Detailed Docker status +docker-compose ps + +# Check individual service +curl http://localhost:9090/health +``` + +### 2. Initial Configuration + +#### Create Admin User (Multi-User+) + +Access Gitea at and complete the setup wizard. 
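Health endpoints polled during verification may not answer immediately after `docker-compose up`, so scripts can race containers that are still starting; a small retry wrapper avoids that. This is a minimal POSIX-sh sketch (the `wait_for` helper name and retry count are illustrative, not part of the platform scripts):

```bash
# wait_for RETRIES CMD...: re-run a probe until it succeeds or retries run out.
wait_for() {
  retries=$1
  shift
  i=0
  while [ "$i" -lt "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example: block until the orchestrator health endpoint answers.
# wait_for 30 curl -fsS http://localhost:9090/health
wait_for 3 true && echo "service is up"
```

The same wrapper works for any of the `curl .../health` checks in this guide; pass the whole command as arguments.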
+ +#### Configure DNS (Optional) + +Add to `/etc/hosts` or configure local DNS: + +```text +127.0.0.1 provisioning.local +127.0.0.1 gitea.provisioning.local +127.0.0.1 grafana.provisioning.local +``` + +#### Configure Monitoring (Enterprise) + +1. Access Grafana: +2. Login with credentials from `.env`: + - Username: `admin` + - Password: `${GRAFANA_ADMIN_PASSWORD}` +3. Dashboards are auto-provisioned from `infrastructure/monitoring/grafana/dashboards/` + +### 3. Load Extensions + +```bash +# List available extensions +curl http://localhost:8082/api/v1/extensions + +# Upload extension (example) +curl -X POST http://localhost:8082/api/v1/extensions/upload \ + -F "file=@my-extension.tar.gz" +``` + +### 4. Test Workflows + +```bash +# Create test server (via orchestrator API) +curl -X POST http://localhost:9090/workflows/servers/create \ + -H "Content-Type: application/json" \ + -d '{"name": "test-server", "plan": "1xCPU-2GB"}' + +# Check workflow status +curl http://localhost:9090/tasks/ +``` + +--- + +## Troubleshooting + +### Common Issues + +#### Services Not Starting + +**Symptom**: `docker-compose up` fails or services crash + +**Solutions**: + +1. Check Docker daemon: + + ```bash + systemctl status docker + ``` + +2. Check logs: + + ```bash + docker-compose logs orchestrator + ``` + +3. Check resource limits: + + ```bash + docker stats + ``` + +4. Increase Docker resources in Docker Desktop settings + +#### Port Conflicts + +**Symptom**: `Error: port is already allocated` + +**Solutions**: + +1. Find conflicting process: + + ```bash + lsof -i :9090 + ``` + +2. Change port in `.env`: + + ```bash + ORCHESTRATOR_PORT=9080 + ``` + +3. Restart deployment: + + ```bash + docker-compose down + docker-compose up -d + ``` + +#### Health Checks Failing + +**Symptom**: Health check script reports unhealthy services + +**Solutions**: + +1. Check service logs: + + ```bash + docker-compose logs -f + ``` + +2. 
Verify network connectivity: + + ```bash + docker network inspect provisioning-net + ``` + +3. Check firewall rules: + + ```bash + sudo ufw status + ``` + +4. Wait longer for services to start: + + ```bash + ./scripts/deploy-platform.nu --wait 600 + ``` + +#### Database Connection Errors + +**Symptom**: PostgreSQL connection refused + +**Solutions**: + +1. Check PostgreSQL health: + + ```bash + docker exec provisioning-postgres pg_isready + ``` + +2. Verify credentials in `.env`: + + ```bash + grep POSTGRES_ .env + ``` + +3. Check PostgreSQL logs: + + ```bash + docker-compose logs postgres + ``` + +4. Recreate database: + + ```bash + docker-compose down + docker volume rm provisioning_postgres-data + docker-compose up -d + ``` + +#### Out of Disk Space + +**Symptom**: No space left on device + +**Solutions**: + +1. Clean Docker volumes: + + ```bash + docker volume prune + ``` + +2. Clean Docker images: + + ```bash + docker image prune -a + ``` + +3. Check volume sizes: + + ```bash + docker system df -v + ``` + +### Getting Help + +- **Logs**: Always check logs first: `docker-compose logs -f` +- **Health**: Run health check: `./scripts/health-check.nu --json` +- **Documentation**: See `docs/` directory +- **Issues**: File bug reports at GitHub repository + +--- + +## Security Best Practices + +### 1. Secret Management + +- **Never commit** `.env` files to version control +- Use `./scripts/generate-secrets.nu` to generate strong secrets +- Rotate secrets regularly +- Use KMS in enterprise mode + +### 2. Network Security + +- Use TLS/SSL in production (enterprise mode) +- Configure firewall rules: + + ```bash + sudo ufw allow 80/tcp + sudo ufw allow 443/tcp + sudo ufw enable + ``` + +- Use private networks for backend services + +### 3. Access Control + +- Enable authentication in multi-user mode +- Use strong passwords (16+ characters) +- Configure API keys for CI/CD access +- Enable audit logging in enterprise mode + +### 4. 
Regular Updates + +```bash +# Pull latest images +docker-compose pull + +# Rebuild with updates +./scripts/deploy-platform.nu --pull --build +``` + +--- + +## Backup and Recovery + +### Backup + +```bash +# Backup volumes +docker run --rm -v provisioning_orchestrator-data:/data \ + -v $(pwd)/backup:/backup \ + alpine tar czf /backup/orchestrator-data.tar.gz -C /data . + +# Backup PostgreSQL +docker exec provisioning-postgres pg_dumpall -U provisioning > backup/postgres-backup.sql +``` + +### Restore + +```bash +# Restore volume +docker run --rm -v provisioning_orchestrator-data:/data \ + -v $(pwd)/backup:/backup \ + alpine tar xzf /backup/orchestrator-data.tar.gz -C /data + +# Restore PostgreSQL +docker exec -i provisioning-postgres psql -U provisioning < backup/postgres-backup.sql +``` + +--- + +## Maintenance + +### Updates + +```bash +# Pull latest images +docker-compose pull + +# Recreate containers +docker-compose up -d --force-recreate + +# Remove old images +docker image prune +``` + +### Monitoring + +- **Prometheus**: +- **Grafana**: +- **Logs**: `docker-compose logs -f` + +### Health Checks + +```bash +# Automated health check +./scripts/health-check.nu + +# Manual checks +curl http://localhost:9090/health +curl http://localhost:8081/health +``` + +--- + +## Next Steps + +- [Production Deployment Guide](./PRODUCTION_DEPLOYMENT.md) +- [Kubernetes Deployment Guide](./KUBERNETES_DEPLOYMENT.md) +- [Docker Compose Reference](./DOCKER_COMPOSE_REFERENCE.md) +- [Monitoring Setup](./MONITORING_SETUP.md) +- [Security Hardening](./SECURITY_HARDENING.md) + +--- + +**Documentation Version**: 1.0.0 +**Last Updated**: 2025-10-06 +**Maintained By**: Platform Team \ No newline at end of file diff --git a/docs/deployment/guide.md b/docs/deployment/guide.md index 4e8ef1e..5db97cf 100644 --- a/docs/deployment/guide.md +++ b/docs/deployment/guide.md @@ -1 +1,468 @@ -# Provisioning Platform - Deployment Guide\n\n**Last Updated**: 2025-10-07\n**Platform**: macOS (Apple Silicon / Intel) 
+ OrbStack/Docker\n\n---\n\n## ✅ Fixed: Docker Builds\n\nDocker builds have been **fixed** to properly handle the Rust workspace structure. Both deployment methods (Native and Docker) are now fully \nsupported.\n\n**Note**: Docker builds use Rust nightly to support edition2024 (required by async-graphql 7.x from surrealdb).\nRocksDB has been replaced with SurrealDB in-memory backend (kv-mem) to simplify Docker builds (no libclang requirement).\n\n---\n\n## 📦 Quick Start\n\n### Prerequisites\n\n**For Native Deployment:**\n\n- Rust 1.75+: `brew install rust`\n- Nushell 0.107+: `brew install nushell`\n\n**For Docker Deployment:**\n\n- OrbStack (recommended): \n- Or Docker Desktop: `brew install --cask docker`\n\n---\n\n## 🚀 Deployment Methods\n\n### Method 1: Native Execution (Recommended for Development)\n\n**Fastest startup, easiest debugging, direct access to logs**\n\n```{$detected_lang}\ncd provisioning/platform/scripts\n\n# 1. Build all services\nnu run-native.nu build\n\n# 2. Start all services in background\nnu run-native.nu start-all --background\n\n# 3. Check status\nnu run-native.nu status\n\n# 4. View logs\nnu run-native.nu logs orchestrator --follow\n\n# 5. Stop all\nnu run-native.nu stop-all\n```\n\n**Services will run on:**\n\n- Orchestrator: \n- Control Center: \n\n**Data stored in:**\n\n- `~/.provisioning-platform/data/`\n- `~/.provisioning-platform/logs/`\n\n---\n\n### Method 2: Docker Execution (Recommended for Production-like Testing)\n\n**Isolated environments, easy cleanup, supports all deployment modes**\n\n```{$detected_lang}\ncd provisioning/platform/scripts\n\n# 1. Build Docker images (Solo mode)\nnu run-docker.nu build solo\n\n# 2. Start services in background\nnu run-docker.nu start solo --detach\n\n# 3. Check status\nnu run-docker.nu status\n\n# 4. View logs\nnu run-docker.nu logs orchestrator --follow\n\n# 5. 
Stop all\nnu run-docker.nu stop\n```\n\n**Deployment Modes:**\n\n- `solo` - 2 CPU / 4GB RAM (dev/test)\n- `multiuser` - 4 CPU / 8GB RAM (team)\n- `cicd` - 8 CPU / 16GB RAM (automation)\n- `enterprise` - 16 CPU / 32GB RAM (production + KMS)\n\n---\n\n## 📋 Complete Command Reference\n\n### Native Execution (`run-native.nu`)\n\n| Command | Description |\n| --------- | ------------- |\n| `build` | Build all services |\n| `start ` | Start orchestrator or control_center |\n| `start-all` | Start all services |\n| `stop ` | Stop a specific service |\n| `stop-all` | Stop all services |\n| `status` | Show service status |\n| `logs ` | Show logs (add `--follow`) |\n| `health` | Check service health |\n\n**Examples:**\n\n```{$detected_lang}\nnu run-native.nu build\nnu run-native.nu start orchestrator --background\nnu run-native.nu start control_center --background\nnu run-native.nu logs orchestrator --follow\nnu run-native.nu health\nnu run-native.nu stop-all\n```\n\n---\n\n### Docker Execution (`run-docker.nu`)\n\n| Command | Description |\n| --------- | ------------- |\n| `build [mode]` | Build Docker images |\n| `start [mode]` | Start services (add `--detach`) |\n| `stop` | Stop all services (add `--volumes` to delete data) |\n| `restart [mode]` | Restart services |\n| `status` | Show container status |\n| `logs ` | Show logs (add `--follow`) |\n| `exec ` | Execute command in container |\n| `stats` | Show resource usage |\n| `health` | Check service health |\n| `config [mode]` | Show docker-compose config |\n| `clean` | Remove containers (add `--all` for images too) |\n\n**Examples:**\n\n```{$detected_lang}\n# Solo mode (fastest)\nnu run-docker.nu build solo\nnu run-docker.nu start solo --detach\n\n# Enterprise mode (with KMS)\nnu run-docker.nu build enterprise\nnu run-docker.nu start enterprise --detach\n\n# Operations\nnu run-docker.nu status\nnu run-docker.nu logs control-center --follow\nnu run-docker.nu exec orchestrator bash\nnu run-docker.nu stats\nnu run-docker.nu 
stop\n```\n\n---\n\n## 🗄️ Database Information\n\n### Control-Center Database\n\n**Type**: SurrealDB with in-memory backend (kv-mem)\n**Location**: In-memory (data persisted during container/process lifetime)\n**Production Alternative**: SurrealDB with remote WebSocket connection for persistent storage\n\n**No separate database server required** - SurrealDB in-memory backend is embedded in the control-center process.\n\n### Orchestrator Storage\n\n**Type**: Filesystem queue (default)\n**Location**:\n\n- Native: `~/.provisioning-platform/data/orchestrator/queue.rkvs`\n- Docker: `/data/queue.rkvs` (inside container)\n\n**Production Option**: Switch to SurrealDB via config for distributed deployments.\n\n---\n\n## ⚙️ Configuration Loading\n\nServices load configuration in this order (priority: low → high):\n\n1. **System Defaults** - `provisioning/config/config.defaults.toml`\n2. **Service Defaults** - `provisioning/platform/{service}/config.defaults.toml`\n3. **Workspace Config** - `workspace/{name}/config/provisioning.yaml`\n4. **User Config** - `~/Library/Application Support/provisioning/user_config.yaml`\n5. **Environment Variables** - `CONTROL_CENTER_*`, `ORCHESTRATOR_*`\n6. 
**Runtime Overrides** - `--config` flag\n\n**See full documentation**: `docs/architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md`\n\n---\n\n## 🐛 Troubleshooting\n\n### Native Deployment Issues\n\n**Build fails:**\n\n```{$detected_lang}\n# Clean and rebuild\ncd provisioning/platform\ncargo clean\ncargo build --release\n```\n\n**Port already in use:**\n\n```{$detected_lang}\n# Check what's using the port\nlsof -i :8080\nlsof -i :8081\n\n# Kill the process or use different ports via environment variables\nexport ORCHESTRATOR_SERVER_PORT=8090\nexport CONTROL_CENTER_SERVER_PORT=8091\n```\n\n**Service won't start:**\n\n```{$detected_lang}\n# Check logs for errors\nnu run-native.nu logs orchestrator\n\n# Run in foreground to see output\nnu run-native.nu start orchestrator\n```\n\n---\n\n### Docker Deployment Issues\n\n**Build fails with workspace errors:**\n\n- **Fixed!** Dockerfiles now properly handle workspace structure\n- If still failing: `nu run-docker.nu build solo --no-cache`\n\n**Containers won't start:**\n\n```{$detected_lang}\n# Check container logs\nnu run-docker.nu logs orchestrator\n\n# Check Docker daemon\ndocker ps\ndocker info\n\n# Restart Docker/OrbStack\n```\n\n**Port conflicts:**\n\n```{$detected_lang}\n# Check what's using ports\nlsof -i :8080\nlsof -i :8081\n\n# Stop conflicting services or modify docker-compose.yaml ports\n```\n\n**Out of resources:**\n\n```{$detected_lang}\n# Check current usage\nnu run-docker.nu stats\n\n# Clean up unused containers/images\ndocker system prune -a\n\n# Or use the script\nnu run-docker.nu clean --all\n```\n\n---\n\n## 🔐 KMS Integration (Enterprise Mode)\n\nEnterprise mode includes Cosmian KMS for production-grade secret management.\n\n**Start with KMS:**\n\n```{$detected_lang}\nnu run-docker.nu build enterprise\nnu run-docker.nu start enterprise --detach\n```\n\n**Access KMS:**\n\n- KMS API: \n- KMS Health: \n\n**KMS Features:**\n\n- SSL certificate lifecycle management\n- SSH private key rotation\n- Cloud credential 
auto-refresh\n- Audit trails\n- Automatic key rotation\n\n**See full KMS documentation**: `provisioning/platform/control-center/src/kms/README.md`\n\n---\n\n## 📊 Monitoring\n\n### Health Checks\n\n**Native:**\n\n```{$detected_lang}\nnu run-native.nu health\n```\n\n**Docker:**\n\n```{$detected_lang}\nnu run-docker.nu health\n```\n\n**Manual:**\n\n```{$detected_lang}\ncurl http://localhost:8080/health # Orchestrator\ncurl http://localhost:8081/health # Control Center\ncurl http://localhost:9998/health # KMS (enterprise only)\n```\n\n### Resource Usage\n\n**Docker:**\n\n```{$detected_lang}\nnu run-docker.nu stats\n```\n\n**Native:**\n\n```{$detected_lang}\nps aux | grep -E "provisioning-orchestrator|control-center"\ntop -pid \n```\n\n---\n\n## 🧪 Testing Both Methods\n\n### Test Native Deployment\n\n```{$detected_lang}\ncd provisioning/platform/scripts\n\n# 1. Build\nnu run-native.nu build\n\n# 2. Start services\nnu run-native.nu start-all --background\n\n# 3. Verify\nnu run-native.nu status\nnu run-native.nu health\n\n# 4. Test API\ncurl http://localhost:8080/health\ncurl http://localhost:8081/health\n\n# 5. Clean up\nnu run-native.nu stop-all\n```\n\n### Test Docker Deployment\n\n```{$detected_lang}\ncd provisioning/platform/scripts\n\n# 1. Build\nnu run-docker.nu build solo\n\n# 2. Start services\nnu run-docker.nu start solo --detach\n\n# 3. Verify\nnu run-docker.nu status\nnu run-docker.nu health\n\n# 4. Test API\ncurl http://localhost:8080/health\ncurl http://localhost:8081/health\n\n# 5. Clean up\nnu run-docker.nu stop --volumes\n```\n\n---\n\n## 🎯 Best Practices\n\n### Development Workflow\n\n1. **Use Native for Active Development**\n - Faster iteration (no Docker rebuild)\n - Direct log access\n - Easy debugging with IDE\n\n2. **Use Docker for Integration Testing**\n - Test deployment configurations\n - Verify Docker builds\n - Simulate production environment\n\n### Production Deployment\n\n1. 
**Use Docker/Kubernetes**\n - Isolated environments\n - Easy scaling\n - Standard deployment\n\n2. **Use Enterprise Mode**\n - KMS for secret management\n - Full monitoring stack\n - High availability\n\n---\n\n## 📚 Related Documentation\n\n- **Database Architecture**: `docs/architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md`\n- **KMS Integration**: `provisioning/platform/control-center/src/kms/README.md`\n- **Configuration System**: `.claude/features/configuration-system.md`\n- **Workspace Switching**: `.claude/features/workspace-switching.md`\n- **Orchestrator Architecture**: `.claude/features/orchestrator-architecture.md`\n\n---\n\n## ✅ Summary\n\n### Native Execution\n\n- ✅ **Fixed**: Workspace builds work correctly\n- ✅ **Fast**: No container overhead\n- ✅ **Simple**: Direct binary execution\n- ✅ **Best for**: Development, debugging\n\n### Docker Execution\n\n- ✅ **Fixed**: Dockerfiles now workspace-aware\n- ✅ **Isolated**: Clean environments\n- ✅ **Flexible**: Multiple deployment modes\n- ✅ **Best for**: Testing, production-like deployments\n\n**Both methods fully supported and tested!**\n\n---\n\n**Quick Links:**\n\n- Native Script: `provisioning/platform/scripts/run-native.nu`\n- Docker Script: `provisioning/platform/scripts/run-docker.nu`\n- Docker Files: `provisioning/platform/docker-compose.yaml` + mode-specific overrides +# Provisioning Platform - Deployment Guide + +**Last Updated**: 2025-10-07 +**Platform**: macOS (Apple Silicon / Intel) + OrbStack/Docker + +--- + +## ✅ Fixed: Docker Builds + +Docker builds have been **fixed** to properly handle the Rust workspace structure. Both deployment methods (Native and Docker) are now fully +supported. + +**Note**: Docker builds use Rust nightly to support edition2024 (required by async-graphql 7.x from surrealdb). +RocksDB has been replaced with SurrealDB in-memory backend (kv-mem) to simplify Docker builds (no libclang requirement). 
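Before starting either deployment method, it is worth confirming the required tools are actually on `PATH`, which saves a run that fails partway through. A minimal sketch (the `check` helper is illustrative, not part of the platform scripts):

```bash
# check CMD: report whether a required tool is installed.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check rustc   # native builds (stable Rust is fine for the native path)
check nu      # Nushell, used by the run scripts
check docker  # only needed for the Docker path
```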
+ +--- + +## 📦 Quick Start + +### Prerequisites + +**For Native Deployment:** + +- Rust 1.75+: `brew install rust` +- Nushell 0.107+: `brew install nushell` + +**For Docker Deployment:** + +- OrbStack (recommended): +- Or Docker Desktop: `brew install --cask docker` + +--- + +## 🚀 Deployment Methods + +### Method 1: Native Execution (Recommended for Development) + +**Fastest startup, easiest debugging, direct access to logs** + +```bash +cd provisioning/platform/scripts + +# 1. Build all services +nu run-native.nu build + +# 2. Start all services in background +nu run-native.nu start-all --background + +# 3. Check status +nu run-native.nu status + +# 4. View logs +nu run-native.nu logs orchestrator --follow + +# 5. Stop all +nu run-native.nu stop-all +``` + +**Services will run on:** + +- Orchestrator: +- Control Center: + +**Data stored in:** + +- `~/.provisioning-platform/data/` +- `~/.provisioning-platform/logs/` + +--- + +### Method 2: Docker Execution (Recommended for Production-like Testing) + +**Isolated environments, easy cleanup, supports all deployment modes** + +```bash +cd provisioning/platform/scripts + +# 1. Build Docker images (Solo mode) +nu run-docker.nu build solo + +# 2. Start services in background +nu run-docker.nu start solo --detach + +# 3. Check status +nu run-docker.nu status + +# 4. View logs +nu run-docker.nu logs orchestrator --follow + +# 5. 
Stop all +nu run-docker.nu stop +``` + +**Deployment Modes:** + +- `solo` - 2 CPU / 4GB RAM (dev/test) +- `multiuser` - 4 CPU / 8GB RAM (team) +- `cicd` - 8 CPU / 16GB RAM (automation) +- `enterprise` - 16 CPU / 32GB RAM (production + KMS) + +--- + +## 📋 Complete Command Reference + +### Native Execution (`run-native.nu`) + +| Command | Description | +| --------- | ------------- | +| `build` | Build all services | +| `start <service>` | Start orchestrator or control_center | +| `start-all` | Start all services | +| `stop <service>` | Stop a specific service | +| `stop-all` | Stop all services | +| `status` | Show service status | +| `logs <service>` | Show logs (add `--follow`) | +| `health` | Check service health | + +**Examples:** + +```nushell +nu run-native.nu build +nu run-native.nu start orchestrator --background +nu run-native.nu start control_center --background +nu run-native.nu logs orchestrator --follow +nu run-native.nu health +nu run-native.nu stop-all +``` + +--- + +### Docker Execution (`run-docker.nu`) + +| Command | Description | +| --------- | ------------- | +| `build [mode]` | Build Docker images | +| `start [mode]` | Start services (add `--detach`) | +| `stop` | Stop all services (add `--volumes` to delete data) | +| `restart [mode]` | Restart services | +| `status` | Show container status | +| `logs <service>` | Show logs (add `--follow`) | +| `exec <service> <command>` | Execute command in container | +| `stats` | Show resource usage | +| `health` | Check service health | +| `config [mode]` | Show docker-compose config | +| `clean` | Remove containers (add `--all` for images too) | + +**Examples:** + +```bash +# Solo mode (fastest) +nu run-docker.nu build solo +nu run-docker.nu start solo --detach + +# Enterprise mode (with KMS) +nu run-docker.nu build enterprise +nu run-docker.nu start enterprise --detach + +# Operations +nu run-docker.nu status +nu run-docker.nu logs control-center --follow +nu run-docker.nu exec orchestrator bash +nu run-docker.nu stats +nu run-docker.nu stop +``` + +--- + +## 
🗄️ Database Information + +### Control-Center Database + +**Type**: SurrealDB with in-memory backend (kv-mem) +**Location**: In-memory (data persisted during container/process lifetime) +**Production Alternative**: SurrealDB with remote WebSocket connection for persistent storage + +**No separate database server required** - SurrealDB in-memory backend is embedded in the control-center process. + +### Orchestrator Storage + +**Type**: Filesystem queue (default) +**Location**: + +- Native: `~/.provisioning-platform/data/orchestrator/queue.rkvs` +- Docker: `/data/queue.rkvs` (inside container) + +**Production Option**: Switch to SurrealDB via config for distributed deployments. + +--- + +## ⚙️ Configuration Loading + +Services load configuration in this order (priority: low → high): + +1. **System Defaults** - `provisioning/config/config.defaults.toml` +2. **Service Defaults** - `provisioning/platform/{service}/config.defaults.toml` +3. **Workspace Config** - `workspace/{name}/config/provisioning.yaml` +4. **User Config** - `~/Library/Application Support/provisioning/user_config.yaml` +5. **Environment Variables** - `CONTROL_CENTER_*`, `ORCHESTRATOR_*` +6. 
**Runtime Overrides** - `--config` flag + +**See full documentation**: `docs/architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md` + +--- + +## 🐛 Troubleshooting + +### Native Deployment Issues + +**Build fails:** + +```bash +# Clean and rebuild +cd provisioning/platform +cargo clean +cargo build --release +``` + +**Port already in use:** + +```bash +# Check what's using the port +lsof -i :8080 +lsof -i :8081 + +# Kill the process or use different ports via environment variables +export ORCHESTRATOR_SERVER_PORT=8090 +export CONTROL_CENTER_SERVER_PORT=8091 +``` + +**Service won't start:** + +```bash +# Check logs for errors +nu run-native.nu logs orchestrator + +# Run in foreground to see output +nu run-native.nu start orchestrator +``` + +--- + +### Docker Deployment Issues + +**Build fails with workspace errors:** + +- **Fixed!** Dockerfiles now properly handle workspace structure +- If still failing: `nu run-docker.nu build solo --no-cache` + +**Containers won't start:** + +```bash +# Check container logs +nu run-docker.nu logs orchestrator + +# Check Docker daemon +docker ps +docker info + +# Restart Docker/OrbStack +``` + +**Port conflicts:** + +```bash +# Check what's using ports +lsof -i :8080 +lsof -i :8081 + +# Stop conflicting services or modify docker-compose.yaml ports +``` + +**Out of resources:** + +```bash +# Check current usage +nu run-docker.nu stats + +# Clean up unused containers/images +docker system prune -a + +# Or use the script +nu run-docker.nu clean --all +``` + +--- + +## 🔐 KMS Integration (Enterprise Mode) + +Enterprise mode includes Cosmian KMS for production-grade secret management. 
+ +**Start with KMS:** + +```nushell +nu run-docker.nu build enterprise +nu run-docker.nu start enterprise --detach +``` + +**Access KMS:** + +- KMS API: +- KMS Health: + +**KMS Features:** + +- SSL certificate lifecycle management +- SSH private key rotation +- Cloud credential auto-refresh +- Audit trails +- Automatic key rotation + +**See full KMS documentation**: `provisioning/platform/control-center/src/kms/README.md` + +--- + +## 📊 Monitoring + +### Health Checks + +**Native:** + +```nushell +nu run-native.nu health +``` + +**Docker:** + +```nushell +nu run-docker.nu health +``` + +**Manual:** + +```bash +curl http://localhost:8080/health # Orchestrator +curl http://localhost:8081/health # Control Center +curl http://localhost:9998/health # KMS (enterprise only) +``` + +### Resource Usage + +**Docker:** + +```nushell +nu run-docker.nu stats +``` + +**Native:** + +```bash +ps aux | grep -E "provisioning-orchestrator|control-center" +top -pid +``` + +--- + +## 🧪 Testing Both Methods + +### Test Native Deployment + +```bash +cd provisioning/platform/scripts + +# 1. Build +nu run-native.nu build + +# 2. Start services +nu run-native.nu start-all --background + +# 3. Verify +nu run-native.nu status +nu run-native.nu health + +# 4. Test API +curl http://localhost:8080/health +curl http://localhost:8081/health + +# 5. Clean up +nu run-native.nu stop-all +``` + +### Test Docker Deployment + +```bash +cd provisioning/platform/scripts + +# 1. Build +nu run-docker.nu build solo + +# 2. Start services +nu run-docker.nu start solo --detach + +# 3. Verify +nu run-docker.nu status +nu run-docker.nu health + +# 4. Test API +curl http://localhost:8080/health +curl http://localhost:8081/health + +# 5. Clean up +nu run-docker.nu stop --volumes +``` + +--- + +## 🎯 Best Practices + +### Development Workflow + +1. **Use Native for Active Development** + - Faster iteration (no Docker rebuild) + - Direct log access + - Easy debugging with IDE + +2. 
**Use Docker for Integration Testing** + - Test deployment configurations + - Verify Docker builds + - Simulate production environment + +### Production Deployment + +1. **Use Docker/Kubernetes** + - Isolated environments + - Easy scaling + - Standard deployment + +2. **Use Enterprise Mode** + - KMS for secret management + - Full monitoring stack + - High availability + +--- + +## 📚 Related Documentation + +- **Database Architecture**: `docs/architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md` +- **KMS Integration**: `provisioning/platform/control-center/src/kms/README.md` +- **Configuration System**: `.claude/features/configuration-system.md` +- **Workspace Switching**: `.claude/features/workspace-switching.md` +- **Orchestrator Architecture**: `.claude/features/orchestrator-architecture.md` + +--- + +## ✅ Summary + +### Native Execution + +- ✅ **Fixed**: Workspace builds work correctly +- ✅ **Fast**: No container overhead +- ✅ **Simple**: Direct binary execution +- ✅ **Best for**: Development, debugging + +### Docker Execution + +- ✅ **Fixed**: Dockerfiles now workspace-aware +- ✅ **Isolated**: Clean environments +- ✅ **Flexible**: Multiple deployment modes +- ✅ **Best for**: Testing, production-like deployments + +**Both methods fully supported and tested!** + +--- + +**Quick Links:** + +- Native Script: `provisioning/platform/scripts/run-native.nu` +- Docker Script: `provisioning/platform/scripts/run-docker.nu` +- Docker Files: `provisioning/platform/docker-compose.yaml` + mode-specific overrides \ No newline at end of file diff --git a/docs/deployment/known-issues.md b/docs/deployment/known-issues.md index 6c91c93..c58c2c6 100644 --- a/docs/deployment/known-issues.md +++ b/docs/deployment/known-issues.md @@ -1 +1,97 @@ -# Known Issues - Provisioning Platform\n\n## Control-Center Requires Rust Nightly (Edition 2024 Dependency)\n\n**Status**: Resolved (using nightly)\n**Severity**: Low\n**Affects**: Docker deployment only\n**Date Reported**: 2025-10-07\n**Date 
Resolved**: 2025-10-07\n\n### Issue\n\nControl-center Docker builds fail with the following error:\n\n```{$detected_lang}\nfeature 'edition2024' is required\nthis Cargo does not support nightly features, but if you\nswitch to nightly channel you can add\n`cargo-features = ["edition2024"]` to enable this feature\n```\n\n### Root Cause\n\nDependency chain:\n\n```{$detected_lang}\ncontrol-center → surrealdb 2.3.10 → surrealdb-core 2.3.10 → async-graphql 7.0.17\n```\n\nThe `async-graphql-value` crate v7.0.17 requires Rust edition 2024, which is not yet stable in Rust 1.82.\nEdition 2024 is currently only available in Rust nightly builds.\n\n### Resolution\n\n**Updated Dockerfiles to use Rust nightly** (2025-10-07):\n\nBoth `orchestrator/Dockerfile` and `control-center/Dockerfile` now use:\n\n```{$detected_lang}\nFROM rustlang/rust:nightly-bookworm AS builder\n```\n\nThis provides edition2024 support required by the surrealdb dependency chain.\n\n### Production Considerations\n\n**Rust Nightly Stability**:\n\n- Nightly builds are generally stable for compilation\n- The compiled binaries are production-ready\n- Runtime behavior is not affected by nightly vs stable compilation\n- Consider pinning to a specific nightly date for reproducible builds\n\n**Alternative**: Use native deployment with stable Rust if nightly is a concern:\n\n```{$detected_lang}\ncd provisioning/platform/scripts\nnu run-native.nu build\nnu run-native.nu start-all --background\n```\n\n### Timeline\n\n- **Rust 1.85** (estimated Feb 2025): Expected to stabilize edition 2024\n- **SurrealDB 3.x**: May drop async-graphql dependency\n\n### Tracking\n\n- Rust Edition 2024 RFC: \n- SurrealDB Issue: \n- async-graphql Issue: \n\n### Related Files\n\n- `provisioning/platform/control-center/Dockerfile`\n- `provisioning/platform/Cargo.toml` (workspace dependencies)\n- `provisioning/platform/control-center/Cargo.toml`\n\n---\n\n## RocksDB Build Takes Long Time\n\n**Status**: Known Limitation\n**Severity**: 
Low\n**Affects**: All builds\n\n### Issue\n\nRocksDB compilation takes 30-60 seconds during builds.\n\n### Workaround\n\nUse cached Docker layers or native builds with incremental compilation.\n\n---\n\n**Last Updated**: 2025-10-07 +# Known Issues - Provisioning Platform + +## Control-Center Requires Rust Nightly (Edition 2024 Dependency) + +**Status**: Resolved (using nightly) +**Severity**: Low +**Affects**: Docker deployment only +**Date Reported**: 2025-10-07 +**Date Resolved**: 2025-10-07 + +### Issue + +Control-center Docker builds fail with the following error: + +```bash +feature 'edition2024' is required +this Cargo does not support nightly features, but if you +switch to nightly channel you can add +`cargo-features = ["edition2024"]` to enable this feature +``` + +### Root Cause + +Dependency chain: + +```bash +control-center → surrealdb 2.3.10 → surrealdb-core 2.3.10 → async-graphql 7.0.17 +``` + +The `async-graphql-value` crate v7.0.17 requires Rust edition 2024, which is not yet stable in Rust 1.82. +Edition 2024 is currently only available in Rust nightly builds. + +### Resolution + +**Updated Dockerfiles to use Rust nightly** (2025-10-07): + +Both `orchestrator/Dockerfile` and `control-center/Dockerfile` now use: + +```bash +FROM rustlang/rust:nightly-bookworm AS builder +``` + +This provides edition2024 support required by the surrealdb dependency chain. 
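The `nightly-bookworm` tag floats to whatever nightly was last published, so image rebuilds can pick up a different compiler. If reproducibility matters, the toolchain can be pinned to a dated nightly inside the builder stage; a sketch (the date below is illustrative — pick one verified to compile this workspace):

```dockerfile
FROM rustlang/rust:nightly-bookworm AS builder

# Pin to a dated nightly so rebuilds are reproducible.
RUN rustup toolchain install nightly-2025-10-01 && \
    rustup default nightly-2025-10-01
```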
+ +### Production Considerations + +**Rust Nightly Stability**: + +- Nightly builds are generally stable for compilation +- The compiled binaries are production-ready +- Runtime behavior is not affected by nightly vs stable compilation +- Consider pinning to a specific nightly date for reproducible builds + +**Alternative**: Use native deployment with stable Rust if nightly is a concern: + +```bash +cd provisioning/platform/scripts +nu run-native.nu build +nu run-native.nu start-all --background +``` + +### Timeline + +- **Rust 1.85** (estimated Feb 2025): Expected to stabilize edition 2024 + +- **SurrealDB 3.x**: May drop async-graphql dependency + +### Tracking + +- Rust Edition 2024 RFC: +- SurrealDB Issue: +- async-graphql Issue: + +### Related Files + +- `provisioning/platform/control-center/Dockerfile` +- `provisioning/platform/Cargo.toml` (workspace dependencies) +- `provisioning/platform/control-center/Cargo.toml` + +--- + +## RocksDB Build Takes Long Time + +**Status**: Known Limitation +**Severity**: Low +**Affects**: All builds + +### Issue + +RocksDB compilation takes 30-60 seconds during builds. + +### Workaround + +Use cached Docker layers or native builds with incremental compilation. + +--- + +**Last Updated**: 2025-10-07 \ No newline at end of file diff --git a/docs/guides/quick-start.md index c377b62..268d370 100644 --- a/docs/guides/quick-start.md +++ b/docs/guides/quick-start.md @@ -1 +1,282 @@ -# Provisioning Platform - Quick Start\n\nFast deployment guide for all modes.\n\n---\n\n## Prerequisites\n\n```{$detected_lang}\n# Verify Docker is installed and running\ndocker --version # 20.10+\ndocker-compose --version # 2.0+\ndocker ps # Should work without errors\n```\n\n---\n\n## 1. 
Solo Mode (Local Development)\n\n**Services**: Orchestrator, Control Center, CoreDNS, OCI Registry, Extension Registry\n\n**Resources**: 2 CPU cores, 4GB RAM, 20GB disk\n\n```{$detected_lang}\ncd /Users/Akasha/project-provisioning/provisioning/platform\n\n# Generate secrets\n./scripts/generate-secrets.nu\n\n# Deploy\n./scripts/deploy-platform.nu --mode solo\n\n# Verify\n./scripts/health-check.nu\n\n# Access\nopen http://localhost:8080 # Orchestrator\nopen http://localhost:8081 # Control Center\n```\n\n**Stop**:\n\n```{$detected_lang}\ndocker-compose down\n```\n\n---\n\n## 2. Multi-User Mode (Team Collaboration)\n\n**Services**: Solo + Gitea, PostgreSQL\n\n**Resources**: 4 CPU cores, 8GB RAM, 50GB disk\n\n```{$detected_lang}\ncd /Users/Akasha/project-provisioning/provisioning/platform\n\n# Generate secrets\n./scripts/generate-secrets.nu\n\n# Deploy\n./scripts/deploy-platform.nu --mode multi-user\n\n# Verify\n./scripts/health-check.nu\n\n# Access\nopen http://localhost:3000 # Gitea\nopen http://localhost:8081 # Control Center\n```\n\n**Configure Gitea**:\n\n1. Visit \n2. Complete initial setup wizard\n3. Create admin account\n\n---\n\n## 3. CI/CD Mode (Automated Pipelines)\n\n**Services**: Multi-User + API Server, Jenkins (optional), GitLab Runner (optional)\n\n**Resources**: 8 CPU cores, 16GB RAM, 100GB disk\n\n```{$detected_lang}\ncd /Users/Akasha/project-provisioning/provisioning/platform\n\n# Generate secrets\n./scripts/generate-secrets.nu\n\n# Deploy\n./scripts/deploy-platform.nu --mode cicd --build\n\n# Verify\n./scripts/health-check.nu\n\n# Access\nopen http://localhost:8083 # API Server\n```\n\n---\n\n## 4. 
Enterprise Mode (Production)\n\n**Services**: Full stack (15+ services)\n\n**Resources**: 16 CPU cores, 32GB RAM, 500GB disk\n\n```{$detected_lang}\ncd /Users/Akasha/project-provisioning/provisioning/platform\n\n# Generate production secrets\n./scripts/generate-secrets.nu --output .env.production\n\n# Review and customize\nnano .env.production\n\n# Deploy with build\n./scripts/deploy-platform.nu --mode enterprise \\n --env-file .env.production \\n --build \\n --wait 600\n\n# Verify\n./scripts/health-check.nu\n\n# Access\nopen http://localhost:3001 # Grafana (admin / password from .env)\nopen http://localhost:9090 # Prometheus\nopen http://localhost:5601 # Kibana\n```\n\n---\n\n## Common Commands\n\n### View Logs\n\n```{$detected_lang}\ndocker-compose logs -f\ndocker-compose logs -f orchestrator\ndocker-compose logs --tail=100 orchestrator\n```\n\n### Restart Services\n\n```{$detected_lang}\ndocker-compose restart orchestrator\ndocker-compose restart\n```\n\n### Update Platform\n\n```{$detected_lang}\ndocker-compose pull\n./scripts/deploy-platform.nu --mode --pull\n```\n\n### Stop Platform\n\n```{$detected_lang}\ndocker-compose down\n```\n\n### Clean Everything (WARNING: data loss)\n\n```{$detected_lang}\ndocker-compose down --volumes\n```\n\n---\n\n## Systemd (Linux Production)\n\n```{$detected_lang}\n# Install services\ncd systemd\nsudo ./install-services.sh\n\n# Enable and start\nsudo systemctl enable --now provisioning-platform\n\n# Check status\nsudo systemctl status provisioning-platform\n\n# View logs\nsudo journalctl -u provisioning-platform -f\n\n# Restart\nsudo systemctl restart provisioning-platform\n\n# Stop\nsudo systemctl stop provisioning-platform\n```\n\n---\n\n## Troubleshooting\n\n### Services not starting\n\n```{$detected_lang}\n# Check Docker\nsystemctl status docker\n\n# Check logs\ndocker-compose logs orchestrator\n\n# Check resources\ndocker stats\n```\n\n### Port conflicts\n\n```{$detected_lang}\n# Find what's using port\nlsof -i :8080\n\n# 
Change port in .env\nnano .env\n# Set ORCHESTRATOR_PORT=9080\n\n# Restart\ndocker-compose down && docker-compose up -d\n```\n\n### Health checks failing\n\n```{$detected_lang}\n# Check individual service\ncurl http://localhost:8080/health\n\n# Wait longer\n./scripts/deploy-platform.nu --wait 600\n\n# Check networks\ndocker network inspect provisioning-net\n```\n\n---\n\n## Access URLs\n\n### Solo Mode\n\n- Orchestrator: \n- Control Center: \n- OCI Registry: \n\n### Multi-User Mode\n\n- Gitea: \n- PostgreSQL: localhost:5432\n\n### CI/CD Mode\n\n- API Server: \n\n### Enterprise Mode\n\n- Prometheus: \n- Grafana: \n- Kibana: \n- Nginx: \n\n---\n\n## Next Steps\n\n- **Full Guide**: See `docs/deployment/deployment-guide.md`\n- **Configuration**: Edit `.env` file for customization\n- **Monitoring**: Access Grafana dashboards (enterprise mode)\n- **API**: Use API Server for automation (CI/CD mode)\n\n---\n\n**Need Help?**\n\n- Health Check: `./scripts/health-check.nu`\n- Logs: `docker-compose logs -f`\n- Documentation: `docs/deployment/` +# Provisioning Platform - Quick Start + +Fast deployment guide for all modes. + +--- + +## Prerequisites + +```bash +# Verify Docker is installed and running +docker --version # 20.10+ +docker-compose --version # 2.0+ +docker ps # Should work without errors +``` + +--- + +## 1. Solo Mode (Local Development) + +**Services**: Orchestrator, Control Center, CoreDNS, OCI Registry, Extension Registry + +**Resources**: 2 CPU cores, 4GB RAM, 20GB disk + +```bash +cd /Users/Akasha/project-provisioning/provisioning/platform + +# Generate secrets +./scripts/generate-secrets.nu + +# Deploy +./scripts/deploy-platform.nu --mode solo + +# Verify +./scripts/health-check.nu + +# Access +open http://localhost:8080 # Orchestrator +open http://localhost:8081 # Control Center +``` + +**Stop**: + +```bash +docker-compose down +``` + +--- + +## 2. 
Multi-User Mode (Team Collaboration) + +**Services**: Solo + Gitea, PostgreSQL + +**Resources**: 4 CPU cores, 8GB RAM, 50GB disk + +```bash +cd /Users/Akasha/project-provisioning/provisioning/platform + +# Generate secrets +./scripts/generate-secrets.nu + +# Deploy +./scripts/deploy-platform.nu --mode multi-user + +# Verify +./scripts/health-check.nu + +# Access +open http://localhost:3000 # Gitea +open http://localhost:8081 # Control Center +``` + +**Configure Gitea**: + +1. Visit +2. Complete initial setup wizard +3. Create admin account + +--- + +## 3. CI/CD Mode (Automated Pipelines) + +**Services**: Multi-User + API Server, Jenkins (optional), GitLab Runner (optional) + +**Resources**: 8 CPU cores, 16GB RAM, 100GB disk + +```bash +cd /Users/Akasha/project-provisioning/provisioning/platform + +# Generate secrets +./scripts/generate-secrets.nu + +# Deploy +./scripts/deploy-platform.nu --mode cicd --build + +# Verify +./scripts/health-check.nu + +# Access +open http://localhost:8083 # API Server +``` + +--- + +## 4. 
Enterprise Mode (Production) + +**Services**: Full stack (15+ services) + +**Resources**: 16 CPU cores, 32GB RAM, 500GB disk + +```bash +cd /Users/Akasha/project-provisioning/provisioning/platform + +# Generate production secrets +./scripts/generate-secrets.nu --output .env.production + +# Review and customize +nano .env.production + +# Deploy with build +./scripts/deploy-platform.nu --mode enterprise \ +  --env-file .env.production \ +  --build \ +  --wait 600 + +# Verify +./scripts/health-check.nu + +# Access +open http://localhost:3001 # Grafana (admin / password from .env) +open http://localhost:9090 # Prometheus +open http://localhost:5601 # Kibana +``` + +--- + +## Common Commands + +### View Logs + +```bash +docker-compose logs -f +docker-compose logs -f orchestrator +docker-compose logs --tail=100 orchestrator +``` + +### Restart Services + +```bash +docker-compose restart orchestrator +docker-compose restart +``` + +### Update Platform + +```bash +docker-compose pull +./scripts/deploy-platform.nu --mode <mode> --pull +``` + +### Stop Platform + +```bash +docker-compose down +``` + +### Clean Everything (WARNING: data loss) + +```bash +docker-compose down --volumes +``` + +--- + +## Systemd (Linux Production) + +```bash +# Install services +cd systemd +sudo ./install-services.sh + +# Enable and start +sudo systemctl enable --now provisioning-platform + +# Check status +sudo systemctl status provisioning-platform + +# View logs +sudo journalctl -u provisioning-platform -f + +# Restart +sudo systemctl restart provisioning-platform + +# Stop +sudo systemctl stop provisioning-platform +``` + +--- + +## Troubleshooting + +### Services not starting + +```bash +# Check Docker +systemctl status docker + +# Check logs +docker-compose logs orchestrator + +# Check resources +docker stats +``` + +### Port conflicts + +```bash +# Find what's using port +lsof -i :8080 + +# Change port in .env +nano .env +# Set ORCHESTRATOR_PORT=9080 + +# Restart +docker-compose down && docker-compose up 
-d +``` + +### Health checks failing + +```bash +# Check individual service +curl http://localhost:8080/health + +# Wait longer +./scripts/deploy-platform.nu --wait 600 + +# Check networks +docker network inspect provisioning-net +``` + +--- + +## Access URLs + +### Solo Mode + +- Orchestrator: +- Control Center: +- OCI Registry: + +### Multi-User Mode + +- Gitea: +- PostgreSQL: localhost:5432 + +### CI/CD Mode + +- API Server: + +### Enterprise Mode + +- Prometheus: +- Grafana: +- Kibana: +- Nginx: + +--- + +## Next Steps + +- **Full Guide**: See `docs/deployment/deployment-guide.md` +- **Configuration**: Edit `.env` file for customization +- **Monitoring**: Access Grafana dashboards (enterprise mode) +- **API**: Use API Server for automation (CI/CD mode) + +--- + +**Need Help?** + +- Health Check: `./scripts/health-check.nu` +- Logs: `docker-compose logs -f` +- Documentation: `docs/deployment/` \ No newline at end of file