Vapora/docs/adrs/README.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00

# VAPORA Architecture Decision Records (ADRs)
Documentation of the key architectural decisions of the VAPORA project.
**Status**: Complete (27 ADRs documented)
**Last Updated**: January 12, 2026
**Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)
---
## 📑 ADRs by Category
---
## 🗄️ Database & Persistence (1 ADR)
Decisions about data storage and persistence.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [004](./0004-surrealdb-database.md) | SurrealDB as the Single Database | SurrealDB 2.3 multi-model (relational + graph + document) | ✅ Accepted |
---
## 🏗️ Core Architecture (6 ADRs)
Fundamental decisions about the technology stack and the base structure of the project.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [001](./0001-cargo-workspace.md) | Cargo Workspace with 13 Crates | Monorepo with a Cargo workspace | ✅ Accepted |
| [002](./0002-axum-backend.md) | Axum as Backend Framework | Axum 0.8.6 REST API + composable middleware | ✅ Accepted |
| [003](./0003-leptos-frontend.md) | Leptos CSR-Only Frontend | Leptos 0.8.12 WASM (Client-Side Rendering) | ✅ Accepted |
| [006](./0006-rig-framework.md) | Rig Framework for LLM Agents | rig-core 0.15 for agent orchestration | ✅ Accepted |
| [008](./0008-tokio-runtime.md) | Tokio Multi-Threaded Runtime | Tokio async runtime with default configuration | ✅ Accepted |
| [013](./0013-knowledge-graph.md) | Knowledge Graph Temporal | SurrealDB temporal KG + learning curves | ✅ Accepted |
---
## 🔄 Agent Coordination & Messaging (2 ADRs)
Decisions about coordination between agents and message communication.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
---
## ☁️ Infrastructure & Security (4 ADRs)
Decisions about Kubernetes infrastructure, security, and secrets management.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [009](./0009-istio-service-mesh.md) | Istio Service Mesh | Istio for mTLS + traffic management + observability | ✅ Accepted |
| [010](./0010-cedar-authorization.md) | Cedar Policy Engine | Cedar policies for declarative RBAC | ✅ Accepted |
| [011](./0011-secretumvault.md) | SecretumVault Secrets Management | Post-quantum crypto for secrets management | ✅ Accepted |
| [012](./0012-llm-routing-tiers.md) | Three-Tier LLM Routing | Rules-based + Dynamic + Manual Override | ✅ Accepted |
---
## 🚀 VAPORA Innovations (8 ADRs)
Decisions unique to VAPORA that set it apart from other multi-agent orchestration platforms.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [014](./0014-learning-profiles.md) | Learning Profiles with Recency Bias | Exponential recency weighting (3× for the last 7 days) | ✅ Accepted |
| [015](./0015-budget-enforcement.md) | Three-Tier Budget Enforcement | Monthly + weekly limits with auto-fallback to Ollama | ✅ Accepted |
| [016](./0016-cost-efficiency-ranking.md) | Cost Efficiency Ranking | Formula: `(quality_score * 100) / (cost_cents + 1)` | ✅ Accepted |
| [017](./0017-confidence-weighting.md) | Confidence Weighting | `min(1.0, executions/20)` prevents lucky streaks | ✅ Accepted |
| [018](./0018-swarm-load-balancing.md) | Swarm Load-Balanced Assignment | `assignment_score = success_rate / (1 + load)` | ✅ Accepted |
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | `tokio::sync::broadcast` for efficient pub/sub | ✅ Accepted |
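
The three formulas quoted for ADRs 016-018 are small enough to sketch directly. The Rust functions below are a minimal illustration, not the project's implementation: the function names, the `f64`/`u32` types, and treating `load` as a non-negative number are assumptions made here.

```rust
/// Cost efficiency as quoted for ADR-016: (quality_score * 100) / (cost_cents + 1).
/// The `+ 1` keeps the ratio finite when a provider's cost is zero.
/// Assumption: quality_score is a float and cost_cents an unsigned integer.
pub fn cost_efficiency(quality_score: f64, cost_cents: u32) -> f64 {
    (quality_score * 100.0) / (f64::from(cost_cents) + 1.0)
}

/// Confidence as quoted for ADR-017: min(1.0, executions / 20).
/// An agent reaches full confidence only after 20 recorded executions.
pub fn confidence(executions: u32) -> f64 {
    (f64::from(executions) / 20.0).min(1.0)
}

/// Assignment score as quoted for ADR-018: success_rate / (1 + load).
/// Assumption: `load` is a non-negative measure of current work (units not specified here).
pub fn assignment_score(success_rate: f64, load: f64) -> f64 {
    success_rate / (1.0 + load)
}
```

With these definitions, `cost_efficiency(0.5, 4)` evaluates to `10.0`, and an agent with 10 executions gets a confidence of `0.5`. How the scores are combined is left to the individual ADRs.
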
---
## 🔧 Development Patterns (6 ADRs)
Development and architecture patterns used throughout the codebase.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [022](./0022-error-handling.md) | Two-Tier Error Handling | thiserror domain errors + ApiError HTTP wrapper | ✅ Accepted |
| [023](./0023-testing-strategy.md) | Multi-Layer Testing Strategy | Unit tests (inline) + Integration (tests/) + Real DB | ✅ Accepted |
| [024](./0024-service-architecture.md) | Service-Oriented Architecture | API layer (thin) + Services layer (thick business logic) | ✅ Accepted |
| [025](./0025-multi-tenancy.md) | SurrealDB Scope-Based Multi-Tenancy | `tenant_id` fields + database scopes for defense-in-depth | ✅ Accepted |
| [026](./0026-shared-state.md) | Arc-Based Shared State | `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy | ✅ Accepted |
| [027](./0027-documentation-layers.md) | Three-Layer Documentation System | .coder/ (session) + .claude/ (operational) + docs/ (product) | ✅ Accepted |
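
The two-tier error handling pattern (ADR-022) can be sketched as a domain error that converts into an HTTP-facing wrapper. This is an illustration under stated assumptions: the variants, the status-code mapping, and the `find_task`/`get_task_handler` functions are hypothetical, and `Display` is implemented by hand with the standard library only, where the ADR names `thiserror` for deriving it.

```rust
use std::fmt;

/// Domain-level error. The variants are hypothetical, not taken from the VAPORA codebase.
#[derive(Debug)]
pub enum VaporaError {
    NotFound(String),
    InvalidInput(String),
    Database(String),
}

impl fmt::Display for VaporaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VaporaError::NotFound(what) => write!(f, "{} not found", what),
            VaporaError::InvalidInput(msg) => write!(f, "invalid input: {}", msg),
            VaporaError::Database(msg) => write!(f, "database error: {}", msg),
        }
    }
}

impl std::error::Error for VaporaError {}

/// HTTP-facing wrapper. Only this tier knows about status codes.
#[derive(Debug, PartialEq)]
pub struct ApiError {
    pub status: u16,
    pub message: String,
}

impl From<VaporaError> for ApiError {
    fn from(err: VaporaError) -> Self {
        // Assumed mapping from domain errors to HTTP status codes.
        let status = match &err {
            VaporaError::NotFound(_) => 404,
            VaporaError::InvalidInput(_) => 400,
            VaporaError::Database(_) => 500,
        };
        // Do not leak internal details for server-side failures.
        let message = if status == 500 {
            "internal server error".to_string()
        } else {
            err.to_string()
        };
        ApiError { status, message }
    }
}

/// Service-layer function: returns the domain error, with no HTTP knowledge.
pub fn find_task(id: &str) -> Result<String, VaporaError> {
    if id == "task-1" {
        Ok("demo task".to_string())
    } else {
        Err(VaporaError::NotFound(format!("task {}", id)))
    }
}

/// Handler-style function: `?` converts VaporaError into ApiError via `From`.
pub fn get_task_handler(id: &str) -> Result<String, ApiError> {
    Ok(find_task(id)?)
}
```

Keeping the conversion in a single `From` impl means service code can be reused outside HTTP handlers without carrying status codes around.
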
---
## Documentation by Category
### 🗄️ Database & Persistence
- **SurrealDB**: Multi-model database (relational + graph + document) unifies all VAPORA data needs with native multi-tenancy support via scopes
### 🏗️ Core Architecture
- **Workspace**: Monorepo structure with 13 specialized crates enables independent testing, parallel development, code reuse
- **Backend**: Axum provides composable middleware, type-safe routing, direct Tokio ecosystem integration
- **Frontend**: Leptos CSR enables fine-grained reactivity and WASM performance (no SEO needed for platform)
- **LLM Framework**: Rig enables tool calling and streaming with minimal abstraction
- **Runtime**: Tokio multi-threaded optimized for I/O-heavy workloads (API, DB, LLM calls)
- **Knowledge Graph**: Temporal history with learning curves enables collective agent learning via SurrealDB
### 🔄 Agent Coordination & Messaging
- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
- **Multi-Provider LLM**: Supports four providers (Claude, OpenAI, Gemini, Ollama) with an automatic fallback chain
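
An automatic fallback chain of the kind described for Multi-Provider LLM can be sketched as "try each provider in order, return the first success". The `LlmProvider` trait, `StubProvider`, and `complete_with_fallback` below are hypothetical names for illustration; the real VAPORA abstraction and the rig-core API are not shown here and will differ.

```rust
/// Hypothetical provider abstraction used only for this sketch.
pub trait LlmProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in provider that either answers or fails, so the chain can be exercised offline.
pub struct StubProvider {
    pub name: &'static str,
    pub healthy: bool,
}

impl LlmProvider for StubProvider {
    fn name(&self) -> &str {
        self.name
    }

    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.healthy {
            Ok(format!("[{}] reply to: {}", self.name, prompt))
        } else {
            Err(format!("{} unavailable", self.name))
        }
    }
}

/// Tries each provider in order and returns `(provider_name, completion)` for the
/// first one that succeeds. If every provider fails, all collected errors are returned.
pub fn complete_with_fallback(
    chain: &[Box<dyn LlmProvider>],
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for provider in chain {
        match provider.complete(prompt) {
            Ok(text) => return Ok((provider.name().to_string(), text)),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

Returning the name of the provider that answered makes it possible to record which tier of the chain actually served a request.
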
### ☁️ Infrastructure & Security
- **Istio Service Mesh**: Provides zero-trust security (mTLS), traffic management, observability for inter-service communication
- **Cedar Authorization**: Declarative, auditable RBAC policies for fine-grained access control
- **SecretumVault**: Post-quantum cryptography future-proofs API key and credential storage
- **Three-Tier LLM Routing**: Balances predictability (rules-based) with flexibility (dynamic scoring) and manual override capability
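
One way to picture the three routing tiers is as a precedence chain. The sketch below assumes that a manual override wins, then a static rule for the task type, then the highest dynamic score; that ordering, the function name `route_provider`, and all parameter types are assumptions for illustration and are not confirmed by this index.

```rust
use std::collections::HashMap;

/// Picks a provider name using three tiers, in an assumed order of precedence:
/// 1. manual override, 2. rules keyed by task type, 3. best dynamic score.
/// Returns `None` when no tier yields a candidate.
pub fn route_provider(
    task_type: &str,
    manual_override: Option<&str>,
    rules: &HashMap<String, String>,
    dynamic_scores: &[(String, f64)],
) -> Option<String> {
    // Tier 1: an explicit override always wins.
    if let Some(provider) = manual_override {
        return Some(provider.to_string());
    }
    // Tier 2: predictable, rules-based routing by task type.
    if let Some(provider) = rules.get(task_type) {
        return Some(provider.clone());
    }
    // Tier 3: fall back to the provider with the highest dynamic score.
    dynamic_scores
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(name, _)| name.clone())
}
```
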
### 🚀 Innovations Unique to VAPORA
- **Learning Profiles**: Recency-biased expertise tracking (3× weight for last 7 days) adapts agent selection to current capability
- **Budget Enforcement**: Dual time windows (monthly + weekly) with three enforcement states + auto-fallback prevent both long-term and short-term overspend
- **Cost Efficiency Ranking**: Quality-to-cost formula `(quality_score * 100) / (cost_cents + 1)` prevents overfitting to cheap providers
- **Confidence Weighting**: `min(1.0, executions/20)` prevents new agents from being selected on lucky streaks
- **Swarm Load Balancing**: `success_rate / (1 + load)` balances agent expertise with availability
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
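
The recency bias behind Learning Profiles can be illustrated with a weighted success rate. This is a deliberately simplified sketch: the ADR describes the weighting as exponential, whereas the code below uses a plain step function (3× inside the 7-day window, 1× outside). The `Execution` struct, its fields, and the function names are hypothetical.

```rust
/// One past execution of an agent. Hypothetical type, not taken from the VAPORA codebase.
pub struct Execution {
    pub age_days: u32,
    pub success: bool,
}

/// Step-function stand-in for the recency weighting: executions from the
/// last 7 days count three times as much as older ones.
pub fn recency_weight(age_days: u32) -> f64 {
    if age_days < 7 {
        3.0
    } else {
        1.0
    }
}

/// Success rate in which each execution contributes its recency weight.
/// Returns `None` for an empty history, where no rate is defined.
pub fn weighted_success_rate(history: &[Execution]) -> Option<f64> {
    let mut total_weight = 0.0;
    let mut success_weight = 0.0;
    for execution in history {
        let weight = recency_weight(execution.age_days);
        total_weight += weight;
        if execution.success {
            success_weight += weight;
        }
    }
    if total_weight > 0.0 {
        Some(success_weight / total_weight)
    } else {
        None
    }
}
```

For one recent success and two old failures, the unweighted rate is 1/3, while the weighted rate is 3/5: the recent result dominates, which is the adaptation to current capability that the bullet describes.
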
### 🔧 Development Patterns
- **Two-Tier Error Handling**: Domain errors (`VaporaError`) separate from HTTP responses (`ApiError`) for reusability
- **Multi-Layer Testing**: Unit tests (inline) + Integration tests (tests/ dir) + Real database connections = 218+ tests
- **Service-Oriented Architecture**: Thin API layer delegates to thick services layer containing business logic
- **Scope-Based Multi-Tenancy**: `tenant_id` fields + SurrealDB scopes provide defense-in-depth tenant isolation
- **Arc-Based Shared State**: `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy state management
- **Three-Layer Documentation**: `.coder/` (session) + `.claude/` (operational) + `docs/` (product) separates concerns
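
The split between `Arc<RwLock<>>` and `Arc<Mutex<>>` can be shown with the standard library alone. The sketch below uses `std::sync` and OS threads to stay self-contained; an async codebase may use the Tokio equivalents instead, and the function names and the `default_provider` key are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Read-heavy state: many readers can hold the `RwLock` at the same time.
/// Each of the four threads reads one value and returns its length.
pub fn read_heavy_demo() -> usize {
    let config: Arc<RwLock<HashMap<String, String>>> = Arc::new(RwLock::new(HashMap::new()));
    config
        .write()
        .unwrap()
        .insert("default_provider".to_string(), "ollama".to_string());

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let config = Arc::clone(&config);
            thread::spawn(move || {
                let guard = config.read().unwrap();
                let len = guard.get("default_provider").map(|v| v.len()).unwrap_or(0);
                len
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

/// Write-heavy state: every access mutates, so a plain `Mutex` is the simpler fit.
/// Four threads each increment the shared counter 1000 times.
pub fn write_heavy_demo() -> u64 {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```
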
---
## How to Use These ADRs
### For Team Members
1. **Understanding Architecture**: Start with the foundational ADRs (001-013: Database, Core Architecture, Agent Coordination, Infrastructure & Security) to understand technology choices
2. **Learning VAPORA's Unique Features**: Read Innovations ADRs (014-021) to understand what makes VAPORA different
3. **Writing New Code**: Reference relevant ADRs in Patterns section (022-027) when implementing features
### For New Hires
1. Read the foundational ADRs (001-013) first: about 30 minutes to understand the stack
2. Read Innovations (014-021) - ~45 minutes to understand VAPORA's differentiators
3. Reference Patterns (022-027) as you write your first contributions
### For Architectural Decisions
When making new architectural decisions:
1. Check existing ADRs to understand previous choices and trade-offs
2. Create a new ADR following the Custom VAPORA format
3. Reference existing ADRs that influenced your decision
4. Get team review before implementation
### For Troubleshooting
When debugging or optimizing:
1. Find the ADR for the relevant component
2. Review the "Implementation" section for key files
3. Check "Verification" for testing commands
4. Review "Consequences" for known limitations
---
## Format
Each ADR follows the Custom VAPORA format:
```markdown
# ADR-XXX: [Title]
**Status**: Accepted | Implemented
**Date**: YYYY-MM-DD
**Deciders**: [Team/Role]
**Technical Story**: [Context/Issue]
---
## Decision
[Clear description of the decision]
## Rationale
[Why this decision was made]
## Alternatives Considered
[Options evaluated and why they were rejected]
## Trade-offs
**Pros**: [Benefits]
**Cons**: [Costs]
## Implementation
[Where it is implemented, key files, code examples]
## Verification
[How to verify that the decision is correctly implemented]
## Consequences
[Long-term impact, dependencies, maintenance]
## References
[Links to docs, code, issues]
```
---
## Integration with Project Documentation
- **docs/operations/**: Deployment, disaster recovery, operational runbooks
- **docs/disaster-recovery/**: Backup strategy, recovery procedures, business continuity
- **.claude/guidelines/**: Development conventions (Rust, Nushell, Nickel)
- **.claude/CLAUDE.md**: Project-specific constraints and patterns
---
## Maintenance
### When to Update ADRs
- ❌ Do NOT create new ADRs for minor code changes
- ✅ DO create ADRs for significant architectural decisions (framework changes, new patterns, major refactoring)
- ✅ DO update ADRs if a decision changes (mark as "Superseded" and create new ADR)
### Review Process
- ADRs should be reviewed before major architectural changes
- Use ADRs as reference during code reviews to ensure consistency
- Update ADRs if they don't reflect current reality (source of truth = code)
### Quarterly Review
- Review all ADRs quarterly to ensure they're still accurate
- Update "Date" field if reviewed and still valid
- Mark as "Superseded" if implementation has changed
---
## Statistics
- **Total ADRs**: 27
- **Foundation (001-013: Database, Core Architecture, Agent Coordination, Infrastructure & Security)**: 13 (48%)
- **Innovations**: 8 (30%)
- **Patterns**: 6 (22%)
- **Production Status**: All Accepted and Implemented
---
## Related Resources
- [VAPORA Architecture Overview](../README.md#architecture)
- [Development Guidelines](./../.claude/guidelines/rust.md)
- [Deployment Guide](./operations/deployment-runbook.md)
- [Disaster Recovery](./disaster-recovery/README.md)
---
**Generated**: January 12, 2026
**Status**: Production-Ready
**Last Reviewed**: January 12, 2026