# Metadata-Driven Authentication System - Implementation Guide\n\n**Status**: ✅ Complete and Production-Ready\n**Version**: 1.0.0\n**Last Updated**: 2025-12-10\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Architecture](#architecture)\n3. [Installation](#installation)\n4. [Usage Guide](#usage-guide)\n5. [Migration Path](#migration-path)\n6. [Developer Guide](#developer-guide)\n7. [Testing](#testing)\n8. [Troubleshooting](#troubleshooting)\n\n## Overview\n\nThis guide describes the metadata-driven authentication system implemented over 5 weeks across 14 command handlers and 12 major systems. The system provides:\n\n- **Centralized Metadata**: All command definitions in Nickel with runtime validation\n- **Automatic Auth Checks**: Pre-execution validation before handler logic\n- **Performance Optimization**: 40-100x faster through metadata caching\n- **Flexible Deployment**: Works with orchestrator, batch workflows, and direct CLI\n\n## Architecture\n\n### System Components\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ User Command │\n└────────────────────────────────┬──────────────────────────────┘\n │\n ┌────────────▼─────────────┐\n │ CLI Dispatcher │\n │ (main_provisioning) │\n └────────────┬─────────────┘\n │\n ┌────────────▼─────────────┐\n │ Metadata Loading │\n │ (cached via traits.nu) │\n └────────────┬─────────────┘\n │\n ┌────────────▼─────────────────────┐\n │ Pre-Execution Validation │\n │ - Auth checks │\n │ - Permission validation │\n │ - Operation type mapping │\n └────────────┬─────────────────────┘\n │\n ┌────────────▼─────────────────────┐\n │ Command Handler Execution │\n │ - infrastructure.nu │\n │ - orchestration.nu │\n │ - workspace.nu │\n └────────────┬─────────────────────┘\n │\n ┌────────────▼─────────────┐\n │ Result/Response │\n └─────────────────────────┘\n```\n\n### Data Flow\n\n1. **User Command** → CLI Dispatcher\n2. **Dispatcher** → Load cached metadata (or parse Nickel)\n3. **Validate** → Check auth, operation type, permissions\n4. **Execute** → Call appropriate handler\n5. **Return** → Result to user\n\n### Metadata Caching\n\n- **Location**: `~/.cache/provisioning/command_metadata.json`\n- **Format**: Serialized JSON (pre-parsed for speed)\n- **TTL**: 1 hour (configurable via `PROVISIONING_METADATA_TTL`)\n- **Invalidation**: Automatic on `main.ncl` modification\n- **Performance**: 40-100x faster than Nickel parsing\n\n## Installation\n\n### Prerequisites\n\n- Nushell 0.109.0+\n- Nickel 1.15.0+\n- SOPS 3.10.2 (for encrypted configs)\n- Age 1.2.1 (for encryption)\n\n### Installation Steps\n\n```\n# 1. Clone or update repository\ngit clone https://github.com/your-org/project-provisioning.git\ncd project-provisioning\n\n# 2. Initialize workspace\n./provisioning/core/cli/provisioning workspace init\n\n# 3. Validate system\n./provisioning/core/cli/provisioning validate config\n\n# 4. Run system checks\n./provisioning/core/cli/provisioning health\n\n# 5. Run test suites\nnu tests/test-fase5-e2e.nu\nnu tests/test-security-audit-day20.nu\nnu tests/test-metadata-cache-benchmark.nu\n```\n\n## Usage Guide\n\n### Basic Commands\n\n```\n# Initialize authentication\nprovisioning login\n\n# Enroll in MFA\nprovisioning mfa totp enroll\n\n# Create infrastructure\nprovisioning server create --name web-01 --plan 1xCPU-2 GB\n\n# Deploy with orchestrator\nprovisioning workflow submit workflows/deployment.ncl --orchestrated\n\n# Batch operations\nprovisioning batch submit workflows/batch-deploy.ncl\n\n# Check without executing\nprovisioning server create --name test --check\n```\n\n### Authentication Flow\n\n```\n# 1. Login (required for production operations)\n$ provisioning login\nUsername: alice@example.com\nPassword: ****\n\n# 2. Optional: Setup MFA\n$ provisioning mfa totp enroll\nScan QR code with authenticator app\nVerify code: 123456\n\n# 3. Use commands (auth checks happen automatically)\n$ provisioning server delete --name old-server --infra production\nAuth check: Check auth for production (delete operation)\nAre you sure? [yes/no] yes\n✓ Server deleted\n\n# 4. All destructive operations require auth\n$ provisioning taskserv delete postgres web-01\nAuth check: Check auth for destructive operation\n✓ Taskserv deleted\n```\n\n### Check Mode (Bypass Auth for Testing)\n\n```\n# Dry-run without auth checks\nprovisioning server create --name test --check\n\n# Output: Shows what would happen, no auth checks\nDry-run mode - no changes will be made\n✓ Would create server: test\n✓ Would deploy taskservs: []\n```\n\n### Non-Interactive CI/CD Mode\n\n```\n# Automated mode - skip confirmations\nprovisioning server create --name web-01 --yes\n\n# Batch operations\nprovisioning batch submit workflows/batch.ncl --yes --check\n\n# With environment variable\nPROVISIONING_NON_INTERACTIVE=1 provisioning server create --name web-02 --yes\n```\n\n## Migration Path\n\n### Phase 1: From Old `input` to Metadata\n\n**Old Pattern** (Before Fase 5):\n\n```\n# Hardcoded auth check\nlet response = (input "Delete server? (yes/no): ")\nif $response != "yes" { exit 1 }\n\n# No metadata - auth unknown\nexport def delete-server [name: string, --yes] {\n if not $yes { ... manual confirmation ... }\n # ... deletion logic ...\n}\n```\n\n**New Pattern** (After Fase 5):\n\n```\n# Metadata header\n# [command]\n# name = "server delete"\n# group = "infrastructure"\n# tags = ["server", "delete", "destructive"]\n# version = "1.0.0"\n\n# Automatic auth check from metadata\nexport def delete-server [name: string, --yes] {\n # Pre-execution check happens in dispatcher\n # Auth enforcement via metadata\n # Operation type: "delete" automatically detected\n # ... deletion logic ...\n}\n```\n\n### Phase 2: Adding Metadata Headers\n\n**For each script that was migrated:**\n\n1. Add metadata header after shebang:\n\n```\n#!/usr/bin/env nu\n# [command]\n# name = "server create"\n# group = "infrastructure"\n# tags = ["server", "create", "interactive"]\n# version = "1.0.0"\n\nexport def create-server [name: string] {\n # Logic here\n}\n```\n\n1. Register in `provisioning/schemas/main.ncl`:\n\n```\nlet server_create = {\n name = "server create",\n domain = "infrastructure",\n description = "Create a new server",\n requirements = {\n interactive = false,\n requires_auth = true,\n auth_type = "jwt",\n side_effect_type = "create",\n min_permission = "write",\n },\n} in\nserver_create\n```\n\n1. Handler integration (happens in dispatcher):\n\n```\n# Dispatcher automatically:\n# 1. Loads metadata for "server create"\n# 2. Validates auth based on requirements\n# 3. Checks permission levels\n# 4. Calls handler if validation passes\n```\n\n### Phase 3: Validating Migration\n\n```\n# Validate metadata headers\nnu utils/validate-metadata-headers.nu\n\n# Find scripts by tag\nnu utils/search-scripts.nu by-tag destructive\n\n# Find all scripts in group\nnu utils/search-scripts.nu by-group infrastructure\n\n# Find scripts with multiple tags\nnu utils/search-scripts.nu by-tags server delete\n\n# List all migrated scripts\nnu utils/search-scripts.nu list\n```\n\n## Developer Guide\n\n### Adding New Commands with Metadata\n\n**Step 1: Create metadata in main.ncl**\n\n```\nlet new_feature_command = {\n name = "feature command",\n domain = "infrastructure",\n description = "My new feature",\n requirements = {\n interactive = false,\n requires_auth = true,\n auth_type = "jwt",\n side_effect_type = "create",\n min_permission = "write",\n },\n} in\nnew_feature_command\n```\n\n**Step 2: Add metadata header to script**\n\n```\n#!/usr/bin/env nu\n# [command]\n# name = "feature command"\n# group = "infrastructure"\n# tags = ["feature", "create"]\n# version = "1.0.0"\n\nexport def feature-command [param: string] {\n # Implementation\n}\n```\n\n**Step 3: Implement handler function**\n\n```\n# Handler registered in dispatcher\nexport def handle-feature-command [\n action: string\n --flags\n]: nothing -> nothing {\n # Dispatcher handles:\n # 1. Metadata validation\n # 2. Auth checks\n # 3. Permission validation\n\n # Your logic here\n}\n```\n\n**Step 4: Test with check mode**\n\n```\n# Dry-run without auth\nprovisioning feature command --check\n\n# Full execution\nprovisioning feature command --yes\n```\n\n### Metadata Field Reference\n\n| Field | Type | Required | Description |\n| ------- | ------ | ---------- | ------------- |\n| name | string | Yes | Command canonical name |\n| domain | string | Yes | Command category (infrastructure, orchestration, etc.) |\n| description | string | Yes | Human-readable description |\n| requires_auth | bool | Yes | Whether auth is required |\n| auth_type | enum | Yes | "none", "jwt", "mfa", "cedar" |\n| side_effect_type | enum | Yes | "none", "create", "update", "delete", "deploy" |\n| min_permission | enum | Yes | "read", "write", "admin", "superadmin" |\n| interactive | bool | No | Whether command requires user input |\n| slow_operation | bool | No | Whether operation takes >60 seconds |\n\n### Standard Tags\n\n**Groups**:\n\n- infrastructure - Server, taskserv, cluster operations\n- orchestration - Workflow, batch operations\n- workspace - Workspace management\n- authentication - Auth, MFA, tokens\n- utilities - Helper commands\n\n**Operations**:\n\n- create, read, update, delete - CRUD operations\n- destructive - Irreversible operations\n- interactive - Requires user input\n\n**Performance**:\n\n- slow - Operation >60 seconds\n- optimizable - Candidate for optimization\n\n### Performance Optimization Patterns\n\n**Pattern 1: For Long Operations**\n\n```\n# Use orchestrator for operations >2 seconds\nif (get-operation-duration "my-operation") > 2000 {\n submit-to-orchestrator $operation\n return "Operation submitted in background"\n}\n```\n\n**Pattern 2: For Batch Operations**\n\n```\n# Use batch workflows for multiple operations\nnu -c "\nuse core/nulib/workflows/batch.nu *\nbatch submit workflows/batch-deploy.ncl --parallel-limit 5\n"\n```\n\n**Pattern 3: For Metadata Overhead**\n\n```\n# Cache hit rate optimization\n# Current: 40-100x faster with warm cache\n# Target: >95% cache hit rate\n# Achieved: Metadata stays in cache for 1 hour (TTL)\n```\n\n## Testing\n\n### Running Tests\n\n```\n# End-to-End Integration Tests\nnu tests/test-fase5-e2e.nu\n\n# Security Audit\nnu tests/test-security-audit-day20.nu\n\n# Performance Benchmarks\nnu tests/test-metadata-cache-benchmark.nu\n\n# Run all tests\nfor test in tests/test-*.nu { nu $test }\n```\n\n### Test Coverage\n\n| Test Suite | Category | Coverage |\n| ----------- | ---------- | ---------- |\n| E2E Tests | Integration | 7 test groups, 40+ checks |\n| Security Audit | Auth | 5 audit categories, 100% pass |\n| Benchmarks | Performance | 6 benchmark categories |\n\n### Expected Results\n\n✅ All tests pass\n✅ No Nushell syntax violations\n✅ Cache hit rate >95%\n✅ Auth enforcement 100%\n✅ Performance baselines met\n\n## Troubleshooting\n\n### Issue: Command not found\n\n**Solution**: Ensure metadata is registered in `main.ncl`\n\n```\n# Check if command is in metadata\ngrep "command_name" provisioning/schemas/main.ncl\n```\n\n### Issue: Auth check failing\n\n**Solution**: Verify user has required permission level\n\n```\n# Check current user permissions\nprovisioning auth whoami\n\n# Check command requirements\nnu -c "\nuse core/nulib/lib_provisioning/commands/traits.nu *\nget-command-metadata 'server create'\n"\n```\n\n### Issue: Slow command execution\n\n**Solution**: Check cache status\n\n```\n# Force cache reload\nrm ~/.cache/provisioning/command_metadata.json\n\n# Check cache hit rate\nnu tests/test-metadata-cache-benchmark.nu\n```\n\n### Issue: Nushell syntax error\n\n**Solution**: Run compliance check\n\n```\n# Validate Nushell compliance\nnu --ide-check 100 \n\n# Check for common issues\ngrep "try {" # Should be empty\ngrep "let mut" # Should be empty\n```\n\n## Performance Characteristics\n\n### Baseline Metrics\n\n| Operation | Cold | Warm | Improvement |\n| ----------- | ------ | ------ | ------------- |\n| Metadata Load | 200 ms | 2-5 ms | 40-100x |\n| Auth Check | <5 ms | <5 ms | Same |\n| Command Dispatch | <10 ms | <10 ms | Same |\n| Total Command | ~210 ms | ~10 ms | 21x |\n\n### Real-World Impact\n\n```\nScenario: 20 sequential commands\n Without cache: 20 × 200 ms = 4 seconds\n With cache: 1 × 200 ms + 19 × 5 ms = 295 ms\n Speedup: ~13.5x faster\n```\n\n## Next Steps\n\n1. **Deploy**: Use installer to deploy to production\n2. **Monitor**: Watch cache hit rates (target >95%)\n3. **Extend**: Add new commands following migration pattern\n4. **Optimize**: Use profiling to identify slow operations\n5. **Maintain**: Run validation scripts regularly\n\n---\n\n**For Support**: See `docs/troubleshooting-guide.md`\n**For Architecture**: See `docs/architecture/`\n**For User Guide**: See `docs/user/AUTHENTICATION_LAYER_GUIDE.md`