Vapora/docs/adrs/README.md

# VAPORA Architecture Decision Records (ADRs)

Documentation of the key architectural decisions for the VAPORA project.

**Status**: Complete (32 ADRs documented)
**Last Updated**: 2026-02-17
**Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)

---

## 📑 ADRs by Category

---

## 🗄️ Database & Persistence (1 ADR)

Decisions about data storage and persistence.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [004](./0004-surrealdb-database.md) | SurrealDB as the Single Database | SurrealDB 2.3 multi-model (relational + graph + document) | ✅ Accepted |

---

## 🏗️ Core Architecture (6 ADRs)

Fundamental decisions about the technology stack and the base structure of the project.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [001](./0001-cargo-workspace.md) | Cargo Workspace with 13 Crates | Monorepo with a Cargo workspace | ✅ Accepted |
| [002](./0002-axum-backend.md) | Axum as Backend Framework | Axum 0.8.6 REST API + composable middleware | ✅ Accepted |
| [003](./0003-leptos-frontend.md) | Leptos CSR-Only Frontend | Leptos 0.8.12 WASM (Client-Side Rendering) | ✅ Accepted |
| [006](./0006-rig-framework.md) | Rig Framework for LLM Agents | rig-core 0.15 for agent orchestration | ✅ Accepted |
| [008](./0008-tokio-runtime.md) | Tokio Multi-Threaded Runtime | Tokio async runtime with default configuration | ✅ Accepted |
| [013](./0013-knowledge-graph.md) | Temporal Knowledge Graph | SurrealDB temporal KG + learning curves | ✅ Accepted |

---

## 🔄 Agent Coordination & Messaging (5 ADRs)

Decisions about coordination between agents and message delivery.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |

---

## ☁️ Infrastructure & Security (4 ADRs)

Decisions about Kubernetes infrastructure, security, and secrets management.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [009](./0009-istio-service-mesh.md) | Istio Service Mesh | Istio for mTLS + traffic management + observability | ✅ Accepted |
| [010](./0010-cedar-authorization.md) | Cedar Policy Engine | Cedar policies for declarative RBAC | ✅ Accepted |
| [011](./0011-secretumvault.md) | SecretumVault Secrets Management | Post-quantum crypto for secrets management | ✅ Accepted |
| [012](./0012-llm-routing-tiers.md) | Three-Tier LLM Routing | Rules-based + Dynamic + Manual Override | ✅ Accepted |

---

## 🚀 VAPORA Innovations (10 ADRs)

Decisions unique to VAPORA that set it apart from other multi-agent orchestration platforms.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [014](./0014-learning-profiles.md) | Learning Profiles with Recency Bias | Exponential recency weighting (3× for the last 7 days) | ✅ Accepted |
| [015](./0015-budget-enforcement.md) | Three-Tier Budget Enforcement | Monthly + weekly limits with auto-fallback to Ollama | ✅ Accepted |
| [016](./0016-cost-efficiency-ranking.md) | Cost Efficiency Ranking | Formula: `(quality_score * 100) / (cost_cents + 1)` | ✅ Accepted |
| [017](./0017-confidence-weighting.md) | Confidence Weighting | `min(1.0, executions/20)` guards against lucky streaks | ✅ Accepted |
| [018](./0018-swarm-load-balancing.md) | Swarm Load-Balanced Assignment | `assignment_score = success_rate / (1 + load)` | ✅ Accepted |
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | `tokio::sync::broadcast` for efficient pub/sub | ✅ Accepted |
| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens by 95% | ✅ Accepted |
| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |

---

## 🔧 Development Patterns (6 ADRs)

Development and architecture patterns used throughout the codebase.

| ID | Title | Decision | Status |
|----|-------|----------|--------|
| [022](./0022-error-handling.md) | Two-Tier Error Handling | thiserror domain errors + ApiError HTTP wrapper | ✅ Accepted |
| [023](./0023-testing-strategy.md) | Multi-Layer Testing Strategy | Unit tests (inline) + Integration (tests/) + Real DB | ✅ Accepted |
| [024](./0024-service-architecture.md) | Service-Oriented Architecture | API layer (thin) + Services layer (thick business logic) | ✅ Accepted |
| [025](./0025-multi-tenancy.md) | SurrealDB Scope-Based Multi-Tenancy | tenant_id fields + database scopes for defense-in-depth | ✅ Accepted |
| [026](./0026-shared-state.md) | Arc-Based Shared State | `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy | ✅ Accepted |
| [027](./0027-documentation-layers.md) | Three-Layer Documentation System | .coder/ (session) + .claude/ (operational) + docs/ (product) | ✅ Accepted |

---

## Documentation by Category

### 🗄️ Database & Persistence

- **SurrealDB**: Multi-model database (relational + graph + document) unifies all VAPORA data needs with native multi-tenancy support via scopes

### 🏗️ Core Architecture

- **Workspace**: Monorepo structure with 13 specialized crates enables independent testing, parallel development, code reuse
- **Backend**: Axum provides composable middleware, type-safe routing, direct Tokio ecosystem integration
- **Frontend**: Leptos CSR enables fine-grained reactivity and WASM performance (no SEO needed for platform)
- **LLM Framework**: Rig enables tool calling and streaming with minimal abstraction
- **Runtime**: Tokio multi-threaded optimized for I/O-heavy workloads (API, DB, LLM calls)
- **Knowledge Graph**: Temporal history with learning curves enables collective agent learning via SurrealDB

### 🔄 Agent Coordination & Messaging

- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
- **Multi-Provider LLM**: Supports four providers (Claude, OpenAI, Gemini, Ollama) with an automatic fallback chain
- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A
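
The two-layer A2A error strategy can be illustrated with a minimal, std-only sketch. The `A2aError` variants and the `JsonRpcError` struct below are hypothetical names chosen for illustration (the real crate derives its domain errors with `thiserror`, per ADR-0032). The codes `-32602` and `-32603` are defined by the JSON-RPC 2.0 specification; the use of `-32001` from the implementation-defined server-error range (`-32000` to `-32099`) is an assumption here.

```rust
use std::fmt;

/// Layer 1: hypothetical domain error (the real code uses `thiserror`).
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidParams(String),
    Internal(String),
}

impl fmt::Display for A2aError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            A2aError::TaskNotFound(id) => write!(f, "task not found: {id}"),
            A2aError::InvalidParams(why) => write!(f, "invalid params: {why}"),
            A2aError::Internal(why) => write!(f, "internal error: {why}"),
        }
    }
}

impl std::error::Error for A2aError {}

/// Layer 2: JSON-RPC 2.0 error object (`code` + `message`; optional `data` omitted).
#[derive(Debug, PartialEq)]
pub struct JsonRpcError {
    pub code: i32,
    pub message: String,
}

impl From<&A2aError> for JsonRpcError {
    fn from(err: &A2aError) -> Self {
        let code = match err {
            // Spec-defined codes.
            A2aError::InvalidParams(_) => -32602,
            A2aError::Internal(_) => -32603,
            // Assumed code in the implementation-defined server-error range.
            A2aError::TaskNotFound(_) => -32001,
        };
        JsonRpcError { code, message: err.to_string() }
    }
}
```

Keeping the conversion in a single `From` impl means handlers can return domain errors and leave protocol details to the JSON-RPC boundary.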

### ☁️ Infrastructure & Security

- **Istio Service Mesh**: Provides zero-trust security (mTLS), traffic management, observability for inter-service communication
- **Cedar Authorization**: Declarative, auditable RBAC policies for fine-grained access control
- **SecretumVault**: Post-quantum cryptography future-proofs API key and credential storage
- **Three-Tier LLM Routing**: Balances predictability (rules-based) with flexibility (dynamic scoring) and manual override capability
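
As a rough illustration of three-tier routing, the sketch below checks a manual override first, then static rules, then falls back to dynamic scoring. The precedence order, the `Provider` enum, `RoutingRequest`, and `route` are all assumptions made for this example; ADR-0012 is the authoritative description of the actual tiers.

```rust
/// Hypothetical provider list, taken from the providers named in ADR-0007.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Provider {
    Claude,
    OpenAi,
    Gemini,
    Ollama,
}

/// Hypothetical routing input.
pub struct RoutingRequest<'a> {
    pub task_type: &'a str,
    pub manual_override: Option<Provider>,
}

/// Picks a provider: manual override, then rules, then the best dynamic score.
pub fn route(
    req: &RoutingRequest<'_>,
    rules: &[(&str, Provider)],
    scores: &[(Provider, f64)],
) -> Option<Provider> {
    // Manual override (assumed to take precedence over everything else).
    if let Some(p) = req.manual_override {
        return Some(p);
    }
    // Rules-based tier: the first rule whose task type matches wins.
    if let Some((_, p)) = rules.iter().find(|(t, _)| *t == req.task_type) {
        return Some(*p);
    }
    // Dynamic tier: the highest-scoring provider, or None if there are no scores.
    scores
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(p, _)| *p)
}
```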

### 🚀 Innovations Unique to VAPORA

- **Learning Profiles**: Recency-biased expertise tracking (3× weight for last 7 days) adapts agent selection to current capability
- **Budget Enforcement**: Dual time windows (monthly + weekly) with three enforcement states + auto-fallback prevent both long-term and short-term overspend
- **Cost Efficiency Ranking**: Quality-to-cost formula `(quality_score * 100) / (cost_cents + 1)` prevents overfitting to cheap providers
- **Confidence Weighting**: `min(1.0, executions/20)` prevents new agents from being selected on lucky streaks
- **Swarm Load Balancing**: `success_rate / (1 + load)` balances agent expertise with availability
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
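
The three scoring formulas quoted above translate directly into code. This is a minimal sketch, not the project's implementation: the function names, the `f64` types, and the interpretation of `load` as a non-negative number are assumptions.

```rust
/// Cost efficiency (ADR-0016): (quality_score * 100) / (cost_cents + 1).
/// The `+ 1` keeps the ratio finite when a provider costs 0 cents.
pub fn cost_efficiency(quality_score: f64, cost_cents: f64) -> f64 {
    (quality_score * 100.0) / (cost_cents + 1.0)
}

/// Confidence (ADR-0017): min(1.0, executions / 20).
/// An agent reaches full confidence only after 20 executions.
pub fn confidence(executions: u32) -> f64 {
    (f64::from(executions) / 20.0).min(1.0)
}

/// Swarm assignment score (ADR-0018): success_rate / (1 + load).
/// `load` is assumed to be >= 0, so a busier agent always scores lower.
pub fn assignment_score(success_rate: f64, load: f64) -> f64 {
    success_rate / (1.0 + load)
}
```

For example, an agent with 5 executions gets a confidence of `0.25`, and an agent with a `0.8` success rate and a load of `1.0` gets an assignment score of `0.4`.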

### 🔧 Development Patterns

- **Two-Tier Error Handling**: Domain errors (`VaporaError`) separate from HTTP responses (`ApiError`) for reusability
- **Multi-Layer Testing**: Unit tests (inline) + Integration tests (tests/ dir) + Real database connections = 218+ tests
- **Service-Oriented Architecture**: Thin API layer delegates to thick services layer containing business logic
- **Scope-Based Multi-Tenancy**: `tenant_id` fields + SurrealDB scopes provide defense-in-depth tenant isolation
- **Arc-Based Shared State**: `Arc<RwLock<>>` for read-heavy, `Arc<Mutex<>>` for write-heavy state management
- **Three-Layer Documentation**: `.coder/` (session) + `.claude/` (operational) + `docs/` (product) separates concerns
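
The shared-state pattern can be sketched with the standard library alone. `AppState`, its fields, and `run_workers` are hypothetical names for illustration, and the sketch uses `std::sync` locks with OS threads; the actual codebase may use Tokio's async equivalents instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical application state shared across workers.
#[derive(Clone, Default)]
pub struct AppState {
    /// Read-heavy: many readers can hold the lock at once.
    pub agents: Arc<RwLock<HashMap<String, String>>>,
    /// Write-heavy: every access mutates, so a plain Mutex is enough.
    pub completed_tasks: Arc<Mutex<u64>>,
}

/// Spawns `workers` threads that each read the registry and bump the counter,
/// then returns the final counter value.
pub fn run_workers(state: &AppState, workers: u64) -> u64 {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning AppState only clones the Arc pointers, not the data.
            let state = state.clone();
            thread::spawn(move || {
                // Shared read lock: does not block other readers.
                let _known_agents = state.agents.read().unwrap().len();
                // Exclusive lock: held only for the increment.
                *state.completed_tasks.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *state.completed_tasks.lock().unwrap();
    total
}
```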

---

## How to Use These ADRs

### For Team Members

1. **Understanding Architecture**: Start with the Core Architecture ADRs (001, 002, 003, 006, 008, 013) to understand technology choices
2. **Learning VAPORA's Unique Features**: Read the Innovations ADRs (014-021, 028, 029) to understand what makes VAPORA different
3. **Writing New Code**: Reference the relevant ADRs in the Development Patterns section (022-027) when implementing features

### For New Hires

1. Read Core Architecture (001, 002, 003, 006, 008, 013) first: ~30 minutes to understand the stack
2. Read Innovations (014-021, 028, 029): ~45 minutes to understand VAPORA's differentiators
3. Reference Development Patterns (022-027) as you write your first contributions

### For Architectural Decisions

When making new architectural decisions:

1. Check existing ADRs to understand previous choices and trade-offs
2. Create a new ADR following the Custom VAPORA format
3. Reference existing ADRs that influenced your decision
4. Get team review before implementation

### For Troubleshooting

When debugging or optimizing:

1. Find the ADR for the relevant component
2. Review the "Implementation" section for key files
3. Check "Verification" for testing commands
4. Review "Consequences" for known limitations

---

## Format

Each ADR follows the Custom VAPORA format:

```markdown
# ADR-XXX: [Title]

**Status**: Accepted | Implemented | Superseded
**Date**: YYYY-MM-DD
**Deciders**: [Team/Role]
**Technical Story**: [Context/Issue]

---

## Decision
[Clear description of the decision]

## Rationale
[Why this decision was made]

## Alternatives Considered
[Options evaluated and why they were rejected]

## Trade-offs
**Pros**: [Benefits]
**Cons**: [Costs]

## Implementation
[Where it is implemented, key files, code examples]

## Verification
[How to verify that the decision is correctly implemented]

## Consequences
[Long-term impact, dependencies, maintenance]

## References
[Links to docs, code, issues]
```

---

## Integration with Project Documentation

- **docs/operations/**: Deployment, disaster recovery, operational runbooks
- **docs/disaster-recovery/**: Backup strategy, recovery procedures, business continuity
- **.claude/guidelines/**: Development conventions (Rust, Nushell, Nickel)
- **.claude/CLAUDE.md**: Project-specific constraints and patterns

---

## Maintenance

### When to Update ADRs

- ❌ Do NOT create new ADRs for minor code changes
- ✅ DO create ADRs for significant architectural decisions (framework changes, new patterns, major refactoring)
- ✅ DO update ADRs if a decision changes (mark as "Superseded" and create new ADR)

### Review Process

- ADRs should be reviewed before major architectural changes
- Use ADRs as reference during code reviews to ensure consistency
- Update ADRs if they don't reflect current reality (source of truth = code)

### Quarterly Review

- Review all ADRs quarterly to ensure they're still accurate
- Update "Date" field if reviewed and still valid
- Mark as "Superseded" if implementation has changed

---

## Statistics

- **Total ADRs**: 32
- **Database & Persistence**: 1 (3%)
- **Core Architecture**: 6 (19%)
- **Agent Coordination**: 5 (16%)
- **Infrastructure**: 4 (12%)
- **Innovations**: 10 (31%)
- **Patterns**: 6 (19%)
- **Production Status**: 30 Accepted, 2 Implemented (030, 032)

---

## Related Resources

- [VAPORA Architecture Overview](../README.md#architecture)
- [Development Guidelines](./../.claude/guidelines/rust.md)
- [Deployment Guide](./operations/deployment-runbook.md)
- [Disaster Recovery](./disaster-recovery/README.md)

---

**Generated**: January 12, 2026
**Status**: Production-Ready
**Last Reviewed**: 2026-02-17