# VAPORA Architecture Decision Records (ADRs)
Documentation of the key architectural decisions of the VAPORA project.
**Status**: Complete (27 ADRs documented)
**Last Updated**: January 12, 2026
**Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)
---
## 📑 ADRs by Category
---
## 🗄️ Database & Persistence (1 ADR)
Decisions on data storage and persistence.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [004](./0004-surrealdb-database.md) | SurrealDB as the Single Database | SurrealDB 2.3 multi-model (relational + graph + document) | ✅ Accepted |
---
## 🏗️ Core Architecture (6 ADRs)
Fundamental decisions on the technology stack and the base structure of the project.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [001](./0001-cargo-workspace.md) | Cargo Workspace with 13 Crates | Monorepo with a Cargo workspace | ✅ Accepted |
| [002](./0002-axum-backend.md) | Axum as Backend Framework | Axum 0.8.6 REST API + composable middleware | ✅ Accepted |
| [003](./0003-leptos-frontend.md) | Leptos CSR-Only Frontend | Leptos 0.8.12 WASM (Client-Side Rendering) | ✅ Accepted |
| [006](./0006-rig-framework.md) | Rig Framework for LLM Agents | rig-core 0.15 for agent orchestration | ✅ Accepted |
| [008](./0008-tokio-runtime.md) | Tokio Multi-Threaded Runtime | Tokio async runtime with default configuration | ✅ Accepted |
| [013](./0013-knowledge-graph.md) | Knowledge Graph Temporal | SurrealDB temporal KG + learning curves | ✅ Accepted |
---
## 🔄 Agent Coordination & Messaging (2 ADRs)
Decisions on coordination between agents and message communication.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
---
## ☁️ Infrastructure & Security (4 ADRs)
Decisions on Kubernetes infrastructure, security, and secrets management.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [009](./0009-istio-service-mesh.md) | Istio Service Mesh | Istio for mTLS + traffic management + observability | ✅ Accepted |
| [010](./0010-cedar-authorization.md) | Cedar Policy Engine | Cedar policies for declarative RBAC | ✅ Accepted |
| [011](./0011-secretumvault.md) | SecretumVault Secrets Management | Post-quantum crypto for secrets management | ✅ Accepted |
| [012](./0012-llm-routing-tiers.md) | Three-Tier LLM Routing | Rules-based + Dynamic + Manual Override | ✅ Accepted |
---
## 🚀 VAPORA Innovations (8 ADRs)
Unique decisions that differentiate VAPORA from other multi-agent orchestration platforms.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [014](./0014-learning-profiles.md) | Learning Profiles with Recency Bias | Exponential recency weighting (3× for the last 7 days) | ✅ Accepted |
| [015](./0015-budget-enforcement.md) | Three-Tier Budget Enforcement | Monthly + weekly limits with auto-fallback to Ollama | ✅ Accepted |
| [016](./0016-cost-efficiency-ranking.md) | Cost Efficiency Ranking | Formula: `(quality_score * 100) / (cost_cents + 1)` | ✅ Accepted |
| [017](./0017-confidence-weighting.md) | Confidence Weighting | `min(1.0, executions/20)` prevents lucky streaks | ✅ Accepted |
| [018](./0018-swarm-load-balancing.md) | Swarm Load-Balanced Assignment | `assignment_score = success_rate / (1 + load)` | ✅ Accepted |
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | `tokio::sync::broadcast` for efficient pub/sub | ✅ Accepted |
---
## 🔧 Development Patterns (6 ADRs)
Development and architecture patterns used throughout the codebase.
| ID | Title | Decision | Status |
|----|---------| ---------|--------|
| [022](./0022-error-handling.md) | Two-Tier Error Handling | thiserror domain errors + ApiError HTTP wrapper | ✅ Accepted |
| [023](./0023-testing-strategy.md) | Multi-Layer Testing Strategy | Unit tests (inline) + Integration (tests/) + Real DB | ✅ Accepted |
| [024](./0024-service-architecture.md) | Service-Oriented Architecture | API layer (thin) + Services layer (thick business logic) | ✅ Accepted |
| [025](./0025-multi-tenancy.md) | SurrealDB Scope-Based Multi-Tenancy | tenant_id fields + database scopes for defense-in-depth | ✅ Accepted |
| [026](./0026-shared-state.md) | Arc-Based Shared State | `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy | ✅ Accepted |
| [027](./0027-documentation-layers.md) | Three-Layer Documentation System | .coder/ (session) + .claude/ (operational) + docs/ (product) | ✅ Accepted |
---
## Documentation by Category
### 🗄️ Database & Persistence
- **SurrealDB**: Multi-model database (relational + graph + document) unifies all VAPORA data needs with native multi-tenancy support via scopes
### 🏗️ Core Architecture
- **Workspace**: Monorepo structure with 13 specialized crates enables independent testing, parallel development, code reuse
- **Backend**: Axum provides composable middleware, type-safe routing, direct Tokio ecosystem integration
- **Frontend**: Leptos CSR enables fine-grained reactivity and WASM performance (no SEO needed for platform)
- **LLM Framework**: Rig enables tool calling and streaming with minimal abstraction
- **Runtime**: Tokio multi-threaded optimized for I/O-heavy workloads (API, DB, LLM calls)
- **Knowledge Graph**: Temporal history with learning curves enables collective agent learning via SurrealDB
### 🔄 Agent Coordination & Messaging
- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
- **Multi-Provider LLM**: Supports 4 providers (Claude, OpenAI, Gemini, Ollama) with an automatic fallback chain
### ☁️ Infrastructure & Security
- **Istio Service Mesh**: Provides zero-trust security (mTLS), traffic management, observability for inter-service communication
- **Cedar Authorization**: Declarative, auditable RBAC policies for fine-grained access control
- **SecretumVault**: Post-quantum cryptography future-proofs API key and credential storage
- **Three-Tier LLM Routing**: Balances predictability (rules-based) with flexibility (dynamic scoring) and manual override capability
### 🚀 Innovations Unique to VAPORA
- **Learning Profiles**: Recency-biased expertise tracking (3× weight for last 7 days) adapts agent selection to current capability
- **Budget Enforcement**: Dual time windows (monthly + weekly) with three enforcement states + auto-fallback prevent both long-term and short-term overspend
- **Cost Efficiency Ranking**: Quality-to-cost formula `(quality_score * 100) / (cost_cents + 1)` prevents overfitting to cheap providers
- **Confidence Weighting**: `min(1.0, executions/20)` prevents new agents from being selected on lucky streaks
- **Swarm Load Balancing**: `success_rate / (1 + load)` balances agent expertise with availability
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
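
The three scoring formulas quoted in this list can be written down directly. The following is a minimal sketch, not the code from the ADRs: the function names, the `f64`/`u32` types, the 0.0 to 1.0 range for `quality_score` and `success_rate`, and the reading of `load` as a count of active tasks are all assumptions made for illustration.

```rust
/// Cost efficiency: (quality_score * 100) / (cost_cents + 1).
/// The `+ 1` keeps the division defined when a provider costs nothing.
fn cost_efficiency(quality_score: f64, cost_cents: f64) -> f64 {
    (quality_score * 100.0) / (cost_cents + 1.0)
}

/// Confidence: min(1.0, executions / 20).
/// An agent with fewer than 20 recorded executions gets partial confidence.
fn confidence(executions: u32) -> f64 {
    (f64::from(executions) / 20.0).min(1.0)
}

/// Swarm assignment score: success_rate / (1 + load).
/// `load` is assumed here to be the number of tasks the agent is running.
fn assignment_score(success_rate: f64, load: f64) -> f64 {
    success_rate / (1.0 + load)
}

fn main() {
    // Quality 0.5 at 4 cents: 50 / 5.
    println!("cost efficiency: {:.2}", cost_efficiency(0.5, 4.0));
    // 5 executions out of the 20 needed for full confidence.
    println!("confidence: {:.2}", confidence(5));
    // A 90% success rate shared across 2 running tasks.
    println!("assignment score: {:.2}", assignment_score(0.9, 2.0));
}
```

How these scores are combined (for example, whether confidence multiplies a success rate before ranking) is not stated in this index; the individual ADRs are the place to check.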
### 🔧 Development Patterns
- **Two-Tier Error Handling**: Domain errors (`VaporaError`) separate from HTTP responses (`ApiError`) for reusability
- **Multi-Layer Testing**: Unit tests (inline) + Integration tests (tests/ dir) + Real database connections = 218+ tests
- **Service-Oriented Architecture**: Thin API layer delegates to thick services layer containing business logic
- **Scope-Based Multi-Tenancy**: `tenant_id` fields + SurrealDB scopes provide defense-in-depth tenant isolation
- **Arc-Based Shared State**: `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy state management
- **Three-Layer Documentation**: `.coder/` (session) + `.claude/` (operational) + `docs/` (product) separates concerns
---
## How to Use These ADRs
### For Team Members
1. **Understanding Architecture**: Start with Core Architecture ADRs (001-013) to understand technology choices
2. **Learning VAPORA's Unique Features**: Read Innovations ADRs (014-021) to understand what makes VAPORA different
3. **Writing New Code**: Reference relevant ADRs in Patterns section (022-027) when implementing features
### For New Hires
1. Read Core Architecture (001-013) first - ~30 minutes to understand the stack
2. Read Innovations (014-021) - ~45 minutes to understand VAPORA's differentiators
3. Reference Patterns (022-027) as you write your first contributions
### For Architectural Decisions
When making new architectural decisions:
1. Check existing ADRs to understand previous choices and trade-offs
2. Create a new ADR following the Custom VAPORA format
3. Reference existing ADRs that influenced your decision
4. Get team review before implementation
### For Troubleshooting
When debugging or optimizing:
1. Find the ADR for the relevant component
2. Review the "Implementation" section for key files
3. Check "Verification" for testing commands
4. Review "Consequences" for known limitations
---
## Format
Each ADR follows the Custom VAPORA format:
```markdown
# ADR-XXX: [Title]
**Status**: Accepted | Implemented
**Date**: YYYY-MM-DD
**Deciders**: [Team/Role]
**Technical Story**: [Context/Issue]
---
## Decision
[Clear description of the decision]
## Rationale
[Why this decision was made]
## Alternatives Considered
[Options evaluated and why they were rejected]
## Trade-offs
**Pros**: [Benefits]
**Cons**: [Costs]
## Implementation
[Where it is implemented, key files, code examples]
## Verification
[How to verify that the decision is correctly implemented]
## Consequences
[Long-term impact, dependencies, maintenance]
## References
[Links to docs, code, issues]
```
---
## Integration with Project Documentation
- **docs/operations/**: Deployment, disaster recovery, operational runbooks
- **docs/disaster-recovery/**: Backup strategy, recovery procedures, business continuity
- **.claude/guidelines/**: Development conventions (Rust, Nushell, Nickel)
- **.claude/CLAUDE.md**: Project-specific constraints and patterns
---
## Maintenance
### When to Update ADRs
- ❌ Do NOT create new ADRs for minor code changes
- ✅ DO create ADRs for significant architectural decisions (framework changes, new patterns, major refactoring)
- ✅ DO update ADRs if a decision changes (mark as "Superseded" and create new ADR)
### Review Process
- ADRs should be reviewed before major architectural changes
- Use ADRs as reference during code reviews to ensure consistency
- Update ADRs if they don't reflect current reality (source of truth = code)
### Quarterly Review
- Review all ADRs quarterly to ensure they're still accurate
- Update "Date" field if reviewed and still valid
- Mark as "Superseded" if implementation has changed
---
## Statistics
- **Total ADRs**: 27
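- **Core Architecture** (001-013, spanning Database & Persistence, Core Architecture, Agent Coordination & Messaging, and Infrastructure & Security): 13 (48%)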
- **Innovations**: 8 (30%)
- **Patterns**: 6 (22%)
- **Production Status**: All Accepted and Implemented
---
## Related Resources
- [VAPORA Architecture Overview](../README.md#architecture)
- [Development Guidelines](./../.claude/guidelines/rust.md)
- [Deployment Guide](./operations/deployment-runbook.md)
- [Disaster Recovery](./disaster-recovery/README.md)
---
**Generated**: January 12, 2026
**Status**: Production-Ready
**Last Reviewed**: January 12, 2026