Design Principles
Core principles guiding Provisioning architecture and development.
1. Workspace-First Design
Principle: Workspaces are the default organizational unit for ALL infrastructure work.
Why:
- Explicit project isolation
- Prevent accidental cross-project modifications
- Independent credential management
- Clear configuration boundaries
- Team collaboration enablement
Application:
- Every workspace has independent state
- Workspace switching is atomic
- Configuration per workspace
- Extensions inherited from platform
Code Example:
# Workspace-enforced workflow
provisioning workspace init my-project
provisioning workspace switch my-project
# This command requires an active workspace
provisioning server create --name web-01
Impact: All commands validate active workspace before execution.
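The workspace check described above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: `WorkspaceRegistry`, `NoActiveWorkspaceError`, and `create_server` are hypothetical names, and state is held in memory rather than persisted.

```python
from dataclasses import dataclass, field
from typing import Optional


class NoActiveWorkspaceError(RuntimeError):
    """Raised when a command runs without an active workspace."""


@dataclass
class WorkspaceRegistry:
    # Hypothetical in-memory registry; a real tool would persist this state.
    workspaces: dict = field(default_factory=dict)
    active: Optional[str] = None

    def init(self, name: str) -> None:
        # Each workspace gets its own independent state.
        self.workspaces.setdefault(name, {"servers": []})

    def switch(self, name: str) -> None:
        # Validate first, then assign once, so a failed switch leaves
        # the previously active workspace untouched.
        if name not in self.workspaces:
            raise KeyError(f"unknown workspace: {name}")
        self.active = name

    def require_active(self) -> dict:
        if self.active is None:
            raise NoActiveWorkspaceError("no active workspace; switch to one first")
        return self.workspaces[self.active]


def create_server(registry: WorkspaceRegistry, name: str) -> None:
    # Every command resolves the active workspace before doing any work.
    state = registry.require_active()
    state["servers"].append(name)
```

Calling `create_server` before `switch` raises `NoActiveWorkspaceError`; after `switch`, the server is recorded only in the active workspace's state.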
2. Type-Safety Mandatory
Principle: ALL configurations MUST be type-safe. Validation is NEVER optional.
Why:
- Catch errors at configuration time
- Prevent runtime failures
- Enable IDE support (LSP)
- Enforce consistency
- Reduce deployment risk
Application:
- Nickel is source of truth (NOT TOML)
- Type contracts on ALL schemas
- Gradual typing not allowed
- Validation in ALL profiles (dev, prod, cicd)
- Static analysis before deployment
Code Example:
# Type-safe infrastructure definition
{
  name : String = "server-01",
  plan : [| 'small, 'medium, 'large |] = 'medium,
  zone : String = "de-fra1",
  backup_enabled : Bool = false,
} | ServerContract
Impact: Type errors caught before infrastructure changes.
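For readers less familiar with Nickel contracts, the same idea can be approximated in Python: reject a definition at construction time, before any deployment step sees it. This is only an analogy under stated assumptions. The `Server` class and `ALLOWED_PLANS` are hypothetical stand-ins, and the checks run at runtime rather than through Nickel's type system.

```python
from dataclasses import dataclass

# Mirrors the enum of plan sizes in the Nickel example above.
ALLOWED_PLANS = {"small", "medium", "large"}


@dataclass(frozen=True)
class Server:
    name: str
    plan: str = "medium"
    zone: str = "de-fra1"
    backup_enabled: bool = False

    def __post_init__(self) -> None:
        # Reject wrong types and unknown enum values when the object is
        # built, before any infrastructure call could be made.
        if not isinstance(self.name, str) or not self.name:
            raise TypeError("name must be a non-empty string")
        if not isinstance(self.plan, str):
            raise TypeError("plan must be a string")
        if self.plan not in ALLOWED_PLANS:
            raise ValueError(
                f"plan must be one of {sorted(ALLOWED_PLANS)}, got {self.plan!r}"
            )
        if not isinstance(self.zone, str):
            raise TypeError("zone must be a string")
        if not isinstance(self.backup_enabled, bool):
            raise TypeError("backup_enabled must be a bool")
```

`Server(name="server-01")` succeeds with the defaults, while `Server(name="server-01", plan="huge")` raises `ValueError` immediately.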
3. Configuration-Driven, Never Hardcoded
Principle: Configuration is the source of truth. Hardcoded values are forbidden.
Why:
- Enable environment-specific behavior
- Support multiple deployment modes
- Allow runtime reconfiguration
- Audit configuration changes
- Team collaboration
Application:
- 5-layer configuration hierarchy
- 476+ configuration accessors
- Variable interpolation
- Environment-specific overrides
- Schema validation
Code Example:
# Configuration drives behavior
provisioning server create --plan $(config.server.default_plan)
# Environment-specific configs
PROVISIONING_ENV=prod provisioning server create
Forbidden:
# ❌ WRONG - Hardcoded values
let server_plan = "medium"
# ✅ RIGHT - Configuration-driven
let server_plan = (config.server.plan)
Impact: Single codebase supports all environments.
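A layered hierarchy with interpolation can be sketched as a deep merge in which later layers override earlier ones. The layer contents, the dotted-path accessor `get`, and the `{{path}}` placeholder syntax below are all assumptions made for illustration; the source does not specify how the five layers are named or how interpolation is written.

```python
import re


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers: list) -> dict:
    """Fold the layers from lowest to highest priority."""
    config: dict = {}
    for layer in layers:
        config = deep_merge(config, layer)
    return config


def get(config: dict, path: str):
    """Dotted-path accessor, e.g. get(cfg, "server.plan")."""
    node = config
    for part in path.split("."):
        node = node[part]
    return node


# Hypothetical placeholder syntax: {{dotted.path}}
_PLACEHOLDER = re.compile(r"\{\{([\w.]+)\}\}")


def interpolate(config: dict, text: str) -> str:
    """Replace each {{path}} in `text` with the resolved config value."""
    return _PLACEHOLDER.sub(lambda m: str(get(config, m.group(1))), text)
```

With five layers passed in priority order, a key set in the last layer overrides the same key in the first, while keys that are set only in lower layers are preserved.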
4. Multi-Cloud Abstraction
Principle: Provider-agnostic interfaces enable multi-cloud deployments.
Why:
- Avoid vendor lock-in
- Reuse infrastructure code
- Support multiple cloud strategies
- Easy provider switching
Application:
- Unified provider interface
- Abstract resource definitions
- Provider-specific implementation
- Automatic provider selection
Code Example:
# Provider-agnostic configuration
{
  servers = [
    {
      name = "web-01",
      plan = "medium",      # Abstract plan size
      provider = "upcloud", # Swappable provider
    }
  ]
}
Impact: Same Nickel schema deploys to UpCloud, AWS, or Hetzner.
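One way to picture the unified provider interface is an abstract base class that translates the abstract plan size into a provider-specific size. The sketch below is illustrative only: the class names, the `deploy` helper, and the size identifiers (`uc-size-m`, `hz-size-m`, and so on) are placeholders, not real provider SKUs or the project's actual interface.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    # Maps abstract plan sizes to provider-specific identifiers.
    plan_map: dict = {}

    def resolve_plan(self, plan: str) -> str:
        try:
            return self.plan_map[plan]
        except KeyError:
            raise ValueError(
                f"{type(self).__name__} has no mapping for plan {plan!r}"
            ) from None

    @abstractmethod
    def create_server(self, name: str, plan: str) -> dict:
        """Return a provider-specific request for the given abstract server."""


class UpCloud(Provider):
    # Placeholder size names, not real UpCloud plans.
    plan_map = {"small": "uc-size-s", "medium": "uc-size-m", "large": "uc-size-l"}

    def create_server(self, name: str, plan: str) -> dict:
        return {"provider": "upcloud", "name": name, "size": self.resolve_plan(plan)}


class Hetzner(Provider):
    # Placeholder size names, not real Hetzner plans.
    plan_map = {"small": "hz-size-s", "medium": "hz-size-m", "large": "hz-size-l"}

    def create_server(self, name: str, plan: str) -> dict:
        return {"provider": "hetzner", "name": name, "size": self.resolve_plan(plan)}


PROVIDERS = {"upcloud": UpCloud, "hetzner": Hetzner}


def deploy(server: dict) -> dict:
    # The caller only changes the `provider` field; the rest stays the same.
    provider = PROVIDERS[server["provider"]]()
    return provider.create_server(server["name"], server["plan"])
```

Switching `provider` from `"upcloud"` to `"hetzner"` in the input changes only the resolved size and provider fields of the result.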
5. Modular, Extensible Architecture
Principle: Components are loosely coupled, independently deployable.
Why:
- Easy to add features
- Support custom extensions
- Avoid monolithic growth
- Enable community contributions
- Flexible deployment options
Application:
- 54 core Nushell libraries
- 111+ CLI commands in 7 domains
- 50+ task services
- 5 cloud providers
- 9 cluster templates
- Pluggable provider interface
Impact: Add features without modifying core system.
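The "add features without modifying core" property is commonly achieved with a registry that extensions write into. The following is a generic sketch of that pattern, not the project's extension mechanism; `task_service`, `run`, and the `example-service` handler are hypothetical names.

```python
from typing import Callable, Dict

# Core registry: extensions add entries, core code only reads them.
_REGISTRY: Dict[str, Callable[[dict], str]] = {}


def task_service(name: str):
    """Decorator that registers a handler under `name`."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        if name in _REGISTRY:
            raise ValueError(f"task service already registered: {name}")
        _REGISTRY[name] = func
        return func
    return decorator


def run(name: str, settings: dict) -> str:
    """Core dispatcher; it never needs editing when a service is added."""
    try:
        handler = _REGISTRY[name]
    except KeyError:
        raise LookupError(f"no task service named {name!r}") from None
    return handler(settings)


# An extension lives in its own module and only touches the decorator.
@task_service("example-service")
def install_example(settings: dict) -> str:
    return f"example-service {settings.get('version', 'latest')}"
```

Adding another service means writing one more decorated function; `run` and the registry stay unchanged.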
6. Hybrid Rust + Nushell
Principle: Rust for performance-critical components, Nushell for orchestration.
Why:
- Rust: Type safety, zero-cost abstractions, performance
- Nushell: Structured data, productivity, easy automation
- Hybrid: Best of both worlds
Application:
- Core CLI: Bash wrapper → Nushell dispatcher
- Orchestrator: Rust scheduler + Nushell task execution
- Libraries: Nushell for business logic
- Performance: Rust plugins for 10-50x speedup
Impact: Fast, type-safe, productive infrastructure automation.
7. State Management via Graph Database
Principle: Infrastructure relationships tracked via SurrealDB graph.
Why:
- Model complex infrastructure relationships
- Query relationships efficiently
- Track dependencies
- Support rollback via state history
- Audit trail
Application:
- SurrealDB for relationship queries
- File-based persistence for queue
- Event-driven state updates
- Checkpoint-based recovery
Example Relationships:
Server → Network (connected to)
Server → Storage (mounts)
Cluster → Service (runs)
Workflow → Dependency (depends on)
Impact: Complex infrastructure relationships are modeled explicitly and can be queried directly.
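To show why a graph model helps, the sketch below stores relationships like those listed above as edges and answers a typical question: which resources are affected if a given node changes? It is a plain in-memory stand-in, not a SurrealDB query, and the node names (`web-01`, `net-a`, and so on) are invented for illustration.

```python
from collections import defaultdict, deque

# (source, relation, target), following the example relationships above.
EDGES = [
    ("web-01", "connected_to", "net-a"),
    ("web-01", "mounts", "vol-1"),
    ("cluster-1", "runs", "svc-api"),
    ("deploy-flow", "depends_on", "web-01"),
]


def related(node: str, relation: str) -> list:
    """Direct targets of `node` for one relation type."""
    return [t for s, r, t in EDGES if s == node and r == relation]


def impacted_by(node: str) -> list:
    """Everything that directly or transitively points at `node`."""
    reverse = defaultdict(list)
    for source, _, target in EDGES:
        reverse[target].append(source)

    seen: list = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in reverse[current]:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen
```

Here `impacted_by("net-a")` walks the edges backwards, finding `web-01` first and then `deploy-flow`, which depends on it.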
8. Security-First Design
Principle: Security is built-in, not bolted-on.
Why:
- Enterprise compliance
- Data protection
- Access control
- Audit trails
- Threat detection
Application:
- 4-layer security model (auth, authz, encryption, audit)
- JWT authentication
- Cedar policy enforcement
- AES-256-GCM encryption
- 7-year audit retention
- MFA support (TOTP, WebAuthn)
Impact: Enterprise-grade security by default.
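Of the mechanisms listed, TOTP is compact enough to sketch with the standard library alone. The code below follows RFC 6238 with HMAC-SHA1, a 30-second step, and 6 digits by default; it is an illustration of the algorithm, not the project's MFA code, and the function names are chosen for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, at: Optional[float] = None, step: int = 30) -> bool:
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(totp(secret, at, step), candidate)
```

A production verifier would usually also accept the adjacent time step to tolerate clock drift and would rate-limit attempts; both are omitted here for brevity.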
9. Progressive Disclosure
Principle: Simple for common cases, powerful for advanced use cases.
Why:
- Low barrier to entry
- Professional productivity
- Advanced features available
- Avoid overwhelming users
- Gradual learning curve
Application:
- Simple: Interactive TUI installer
- Productive: CLI with 80+ shortcuts
- Powerful: Batch workflows, policies
- Advanced: Custom extensions, hooks
Impact: All skill levels supported.
10. Fail-Fast, Recover Gracefully
Principle: Detect issues early, provide recovery mechanisms.
Why:
- Prevent invalid deployments
- Enable safe recovery
- Minimize blast radius
- Audit failures for learning
Application:
- Validation before execution
- Checkpoint-based recovery
- Automatic rollback on failure
- Detailed error messages
- Retry with exponential backoff
Code Example:
# Validate before deployment
provisioning validate config --strict
# Dry-run to check impact
provisioning --check server create
# Safe rollback on failure
provisioning workflow rollback --to-checkpoint
Impact: Safe infrastructure changes with confidence.
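Retry with exponential backoff and rollback on failure can be combined in one small runner. This is a sketch under assumptions: each step is modeled as a hypothetical `(apply, undo)` pair of callables, and the delay parameters (`base`, `factor`, `cap`) are illustrative defaults rather than values taken from the tool.

```python
import time


def backoff_delays(retries: int, base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> list:
    """Delays before each retry: base, base*factor, ... limited by cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def run_with_recovery(steps, retries: int = 3, sleep=time.sleep) -> None:
    """Run (apply, undo) steps in order; on final failure undo completed steps.

    Each step is attempted once plus up to `retries` retries.
    """
    completed = []  # undo callables for steps that succeeded
    for apply, undo in steps:
        # The trailing None marks the last attempt, after which we give up.
        for delay in backoff_delays(retries) + [None]:
            try:
                apply()
                completed.append(undo)
                break
            except Exception:
                if delay is None:
                    # Roll back in reverse order, then surface the error.
                    for rollback in reversed(completed):
                        rollback()
                    raise
                sleep(delay)
```

Passing a custom `sleep` (for example a list's `append`) makes the runner easy to test without waiting for the real delays.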
11. Observable & Auditable
Principle: All operations traceable, all changes auditable.
Why:
- Compliance & regulation
- Troubleshooting
- Security investigation
- Team accountability
- Historical analysis
Application:
- Comprehensive audit logging
- 5 export formats (JSON, YAML, CSV, syslog, CloudWatch)
- Structured log entries
- Operation tracing
- Resource change tracking
Impact: Complete visibility into infrastructure changes.
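A structured audit entry that can be rendered in more than one export format might look like the sketch below. The field names in `AuditEntry` are assumptions chosen for illustration, and only two of the listed formats (JSON and CSV) are shown.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class AuditEntry:
    # Hypothetical field set; the real schema is not given in this document.
    timestamp: str
    actor: str
    operation: str
    resource: str
    outcome: str


def to_json(entries: list) -> str:
    """One JSON object per line, with stable key order for diffing."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def to_csv(entries: list) -> str:
    """CSV with a header row derived from the dataclass fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(AuditEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()
```

Because both exporters derive their output from the same dataclass, adding a field to `AuditEntry` updates every format at once.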
12. No Shortcuts on Reliability
Principle: Reliability features are standard, not optional.
Why:
- Production requirements
- Minimize downtime
- Data protection
- Business continuity
- Trust & confidence
Application:
- Checkpoint recovery
- Automatic rollback
- Health monitoring
- Backup & restore
- Multi-node deployment
- Service redundancy
Impact: Enterprise-grade reliability as standard.
Architectural Decision Records (ADRs)
Key decisions and their rationale:

| ADR | Decision | Rationale |
| --- | --- | --- |
| ADR-011 | Nickel Migration | Type-safety over KCL flexibility |
| ADR-010 | Config Strategy | 5-layer hierarchy over flat config |
| ADR-009 | SurrealDB | Graph relationships over relational |
| ADR-008 | Modular CLI | 80+ shortcuts over verbose commands |
| ADR-007 | Workspace-First | Isolation over global state |
| ADR-006 | Hybrid Architecture | Rust + Nushell for best of both |
Design Trade-offs
| Decision | Gain | Cost |
| --- | --- | --- |
| Type-Safety | Fewer errors | Learning curve |
| Config Hierarchy | Flexibility | Complexity |
| Workspace Isolation | Safety | Duplication |
| Modular CLI | Discoverability | No single command |
| SurrealDB | Relationships | Resource overhead |
| Strict Validation | Safety | Slower iteration |