# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines

## Status

**Accepted** - Implemented in v1.2.0

## Context

### The Problem: Excessive LLM Costs from Cache Token Accumulation

Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:

| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |

**The cost is dominated by cache tokens, not generation.**

### Root Cause: Monolithic Session Pattern

The current workflow with Claude Code follows a monolithic session pattern:

```text
Session start
├─ Message 1: context 50K → cache read 50K
├─ Message 2: context 100K → cache read 100K
├─ Message 3: context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
               ─────────────────
               ~20M cache reads per session
```

Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, with each subsequent message re-transmitting all previous context.

### Why This Matters

At current pricing (2026 rates):

- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens

With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.

## Decision

**Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.**

### Architecture: Agents with Short Lifecycles

Instead of one long session accumulating context, workflows execute as discrete stages:

```text
┌─────────────────────────────────────────────────┐
│ Task: "Implement feature X"                     │
└─────────────────────────────────────────────────┘
                        │
      ┌─────────────────┼─────────────────┐
      ▼                 ▼                 ▼
┌───────────┐     ┌───────────┐     ┌───────────┐
│ Architect │     │ Developer │     │ Reviewer  │
│  (Opus)   │     │  (Haiku)  │     │ (Sonnet)  │
├───────────┤     ├───────────┤     ├───────────┤
│ Context:  │     │ Context:  │     │ Context:  │
│   40K     │────▶│   25K     │────▶│   35K     │
│  5 msgs   │ spec│  12 msgs  │ code│  4 msgs   │
│ 200K      │     │ 300K      │     │ 140K      │
│  cache    │     │  cache    │     │  cache    │
└─────┬─────┘     └─────┬─────┘     └─────┬─────┘
      │                 │                 │
      ▼                 ▼                 ▼
 TERMINATES        TERMINATES        TERMINATES
 (context          (context          (context
  discarded)        discarded)        discarded)

Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```

### Key Principles

1. **Context isolation**: Each agent receives only what it needs (spec, relevant files), not the full conversation history
2. **Artifact passing, not conversation passing**: What flows between agents is the result (spec, code, review), not the dialogue that produced it, as sketched below
3. **Short lifecycles**: Agent completes its task → context dies → next agent starts fresh
4. **Persistent memory via Kogral**: Important decisions/patterns are stored in the knowledge base, not in session context
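The principles above can be made concrete with a minimal Rust sketch. Everything in it (`Artifact` variants, `AgentContext`, `run_stage`) is an illustrative assumption, not the actual vapora-workflow-engine API: each stage builds a fresh context, emits an artifact, and drops its dialogue before the next stage starts.

```rust
/// Illustrative artifacts; the real crate's `Artifact` covers ADR, Code,
/// TestResults, Review, and Documentation.
#[derive(Debug, Clone)]
enum Artifact {
    Task(String),
    Spec(String),
    Code(String),
    Review(String),
}

/// A short-lived agent context: it sees only the incoming artifact plus a
/// small slice of injected knowledge, never the previous agent's dialogue.
struct AgentContext {
    role: &'static str,
    model: &'static str,
    injected_knowledge: Vec<String>, // e.g. guidelines pulled from Kogral
    messages: Vec<String>,           // dialogue local to this stage only
}

impl AgentContext {
    fn new(role: &'static str, model: &'static str, knowledge: Vec<String>) -> Self {
        Self { role, model, injected_knowledge: knowledge, messages: Vec::new() }
    }

    /// Stand-in for the actual LLM call; here it only records the prompt.
    fn complete(&mut self, prompt: &str) -> String {
        self.messages.push(prompt.to_string());
        format!(
            "[{} via {}, {} knowledge items] result for: {}",
            self.role, self.model, self.injected_knowledge.len(), prompt
        )
    }
}

/// Run one stage: fresh context in, artifact out, context dropped.
fn run_stage(role: &'static str, model: &'static str, input: &Artifact) -> Artifact {
    let mut ctx = AgentContext::new(role, model, vec!["pattern:auth summary".to_string()]);
    let output = ctx.complete(&format!("{:?}", input));
    // `ctx` goes out of scope here: the dialogue dies, only the artifact survives.
    match role {
        "architect" => Artifact::Spec(output),
        "developer" => Artifact::Code(output),
        _ => Artifact::Review(output),
    }
}

fn main() {
    let task = Artifact::Task("Implement feature X".to_string());
    let spec = run_stage("architect", "opus", &task);    // ~40K context, then discarded
    let code = run_stage("developer", "haiku", &spec);   // ~25K context, then discarded
    let review = run_stage("reviewer", "sonnet", &code); // ~35K context, then discarded
    println!("{:?}", review);
}
```

In the real pipeline, agent selection happens through SwarmCoordinator bidding and artifacts are persisted to the knowledge graph; the sketch only shows the context-lifecycle contract that keeps cache reads per stage small.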
## Implementation

### Components

1. **vapora-workflow-engine** (new crate):
   - `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
   - `WorkflowInstance`: State machine tracking individual workflow execution
   - `StageState`: Manages stage execution and task assignment
   - `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation)

2. **Workflow Templates** (`config/workflows.toml`) - a config sketch follows this list:
   - `feature_development` (5 stages): architecture → implementation → testing → review → deployment
   - `bugfix` (4 stages): investigation → fix → testing → deployment
   - `documentation_update` (3 stages): content → review → publish
   - `security_audit` (4 stages): analysis → pentesting → remediation → verification

3. **REST API** (`/api/v1/workflow_orchestrator`):
   - `POST /` - Start workflow
   - `GET /` - List active workflows
   - `GET /:id` - Get workflow status
   - `POST /:id/approve` - Approve waiting stage
   - `POST /:id/cancel` - Cancel running workflow
   - `GET /templates` - List available templates

4. **CLI** (vapora-cli):
   - `vapora workflow start --template <name> --context context.json`
   - `vapora workflow list`
   - `vapora workflow status <id>`
   - `vapora workflow approve <id> --approver "Name"`
   - `vapora workflow cancel <id> --reason "Reason"`
   - `vapora workflow templates`

5. **Kogral Integration**:
   - `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
   - Filesystem-based knowledge retrieval from `.kogral/` directory
   - Configurable via `KOGRAL_PATH` environment variable
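The stage definitions above live in `config/workflows.toml`. The sketch below shows one way the orchestrator could deserialize such a template; the field names (`role`, `model`, `requires_approval`) and the TOML excerpt are assumptions for illustration, not the actual schema of the config file.

```rust
// Assumed Cargo dependencies: serde = { version = "1", features = ["derive"] }, toml = "0.8"
use serde::Deserialize;

/// Possible shape of a workflow template in `config/workflows.toml`;
/// field names are illustrative, not the crate's actual schema.
#[derive(Debug, Deserialize)]
struct WorkflowTemplate {
    name: String,
    stages: Vec<Stage>,
}

#[derive(Debug, Deserialize)]
struct Stage {
    name: String,
    role: String,  // agent role that bids on the stage via the SwarmCoordinator
    model: String, // model tier the stage is routed to (opus / sonnet / haiku)
    #[serde(default)]
    requires_approval: bool, // pause until POST /:id/approve
}

fn main() -> Result<(), toml::de::Error> {
    // Hypothetical excerpt of the `feature_development` template.
    let raw = r#"
        name = "feature_development"

        [[stages]]
        name = "architecture"
        role = "architect"
        model = "opus"
        requires_approval = true

        [[stages]]
        name = "implementation"
        role = "developer"
        model = "haiku"
    "#;

    let template: WorkflowTemplate = toml::from_str(raw)?;
    println!("{} has {} stage(s) loaded", template.name, template.stages.len());
    Ok(())
}
```

Keeping the model tier in the template per stage is what lets the orchestrator route architecture work to Opus and bulk implementation to Haiku without any per-session decisions.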
### Integration with Existing Components

| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |

## Rationale

### Why Vapora Already Has the Pieces

The current Vapora implementation includes:

| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |

**What was missing**: The orchestration layer that executes workflow templates by loading templates, creating instances, listening for task completions, advancing stages, and passing artifacts.

### Why Not Alternative Solutions

| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline, doesn't fundamentally change the pattern |
| Shorter sessions manually | Loses context continuity, user must track state |
| External tools (LiteLLM, CrewAI) | Python-based, doesn't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |

Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.

### Why Kogral Integration

Kogral provides persistent knowledge that would otherwise bloat session context:

| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |

Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).

## Consequences

### Positive

1. **~95% reduction in cache token costs**: $840/month → ~$50-100/month for the same workload
2. **Better model allocation**: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)
3. **Leverages existing investment**: Uses the SwarmCoordinator, LearningProfiles, and KGPersistence already built
4. **Audit trail**: Each agent execution persisted to the KG with tokens, cost, and duration
5. **Parallelization**: Multiple developers can work simultaneously on different parts
6. **Quality through specialization**: Each agent is optimized for its role vs one generalist session

### Negative

1. **Orchestration overhead**: Additional component to maintain
2. **Latency between stages**: Artifact passing adds delay vs continuous conversation
3. **Context loss between agents**: Agent B doesn't know what Agent A "considered but rejected"
4. **Debugging complexity**: Issues span multiple agent executions

### Mitigations

| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in the KG |

## Metrics for Success

| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |

## Cost Projection

Based on analyzed usage patterns with the optimized workflow:

| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |

**Savings: ~$730/month (87% reduction)**

## Implementation Status

- **Status**: Complete (v1.2.0)
- **Crates**: vapora-workflow-engine, vapora-cli
- **Tests**: 26 unit tests + 1 doc test passing
- **Endpoints**: 6 REST API endpoints
- **Templates**: 4 pre-configured workflows
- **CLI Commands**: 6 workflow management commands

## References

- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`

## Related ADRs

- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing

## Decision Drivers

1. **Data-driven**: 95% of cost is cache tokens from long sessions
2. **Infrastructure exists**: Vapora has all the pieces except the orchestrator
3. **Kogral synergy**: Persistent knowledge reduces context requirements
4. **Measurable outcome**: Clear before/after metrics for validation
5. **Production-ready**: Complete implementation with tests and documentation