Vapora/docs/adrs/0028-workflow-orchestrator.md
Jesús Pérez cc55b97678
2026-01-24 02:07:45 +00:00


# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines

## Status

Accepted - Implemented in v1.2.0

## Context

### The Problem: Excessive LLM Costs from Cache Token Accumulation

Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:

| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |

The cost is dominated by cache tokens, not generation.

### Root Cause: Monolithic Session Pattern

The current workflow with Claude Code follows a monolithic session pattern:

```
Session start
├─ Message 1:  context 50K   → cache read 50K
├─ Message 2:  context 100K  → cache read 100K
├─ Message 3:  context 150K  → cache read 150K
├─ ...
└─ Message 50: context 800K  → cache read 800K
                             ─────────────────
                             ~20M cache reads per session
```

Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, with each subsequent message re-transmitting all previous context.
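
Because each message re-reads everything sent before it, total cache reads grow quadratically with session length, not linearly. A minimal sketch of that arithmetic (the 16K-tokens-per-message growth rate is an illustrative assumption chosen to reproduce the ~20M figure above, not measured data):

```rust
// Model of the monolithic-session pattern: context after message i is
// i * growth tokens, and message i re-reads all of it, so the session
// total is the arithmetic series growth * (1 + 2 + ... + n).
fn cumulative_cache_reads(messages: u64, growth_per_message: u64) -> u64 {
    growth_per_message * messages * (messages + 1) / 2
}

fn main() {
    // 50 messages at ~16K tokens of context growth per message (illustrative)
    let total = cumulative_cache_reads(50, 16_000);
    println!("total cache reads: ~{}M tokens", total / 1_000_000); // ~20M

    // Doubling the session length roughly quadruples the cache reads:
    let doubled = cumulative_cache_reads(100, 16_000);
    println!("100-message session: ~{}M tokens", doubled / 1_000_000);
}
```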

### Why This Matters

At current pricing (2026 rates):

- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens

With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.
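
A back-of-envelope check of this claim, using only the rates quoted above and the Sonnet/Haiku shares from the usage data (the remaining ~11% of cache reads is not broken down by model in this ADR, so it is omitted from the sketch):

```rust
// Cost of cache reads at a given per-million-token rate.
fn cache_cost_usd(tokens: f64, rate_per_million: f64) -> f64 {
    tokens / 1_000_000.0 * rate_per_million
}

fn main() {
    let total_cache_reads = 3.82e9; // 3.82B cache read tokens (from the ADR)
    let sonnet = cache_cost_usd(total_cache_reads * 0.51, 0.30);
    let haiku = cache_cost_usd(total_cache_reads * 0.38, 0.03);
    println!("Sonnet cache reads: ${sonnet:.0}"); // ~$584
    println!("Haiku cache reads:  ${haiku:.0}");  // ~$44
    // By contrast, the 2.4M direct input tokens would cost only a few
    // dollars even at generation rates far above cache-read rates.
}
```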

## Decision

Implement a Workflow Orchestrator (`vapora-workflow-engine`) that executes multi-stage pipelines with short-lived agent contexts.

### Architecture: Agents with Short Lifecycles

Instead of one long session accumulating context, workflows execute as discrete stages:

```
┌─────────────────────────────────────────────────────────┐
│ Task: "Implement feature X"                              │
└─────────────────────────────────────────────────────────┘
                         │
    ┌────────────────────┼────────────────────┐
    ▼                    ▼                    ▼
┌─────────┐        ┌──────────┐        ┌──────────┐
│Architect│        │Developer │        │ Reviewer │
│ (Opus)  │        │ (Haiku)  │        │ (Sonnet) │
├─────────┤        ├──────────┤        ├──────────┤
│Context: │        │Context:  │        │Context:  │
│ 40K     │───────▶│ 25K      │───────▶│ 35K      │
│ 5 msgs  │ spec   │ 12 msgs  │ code   │ 4 msgs   │
│ 200K    │        │ 300K     │        │ 140K     │
│ cache   │        │ cache    │        │ cache    │
└────┬────┘        └────┬─────┘        └────┬─────┘
     │                  │                   │
     ▼                  ▼                   ▼
  TERMINATES         TERMINATES          TERMINATES
  (context           (context            (context
   discarded)         discarded)          discarded)

Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```

### Key Principles

  1. Context isolation: Each agent receives only what it needs (spec, relevant files), not full conversation history

  2. Artifact passing, not conversation passing: Between agents flows the result (spec, code, review), not the dialogue that produced it

  3. Short lifecycles: Agent completes task → context dies → next agent starts fresh

  4. Persistent memory via Kogral: Important decisions/patterns stored in knowledge base, not in session context
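
The first three principles can be sketched as a toy pipeline in which each stage sees only the artifact that reaches it, never the dialogue that produced it. This is a minimal illustration; the names are not the actual vapora-workflow-engine types:

```rust
// Toy artifact type: the only data that crosses stage boundaries.
#[derive(Debug, Clone)]
enum Artifact {
    Spec(String),
    Code(String),
    Review(String),
}

// Each stage function's signature enforces context isolation: the
// architect sees only the task brief, later stages only one artifact.
fn architect(task: &str) -> Artifact {
    Artifact::Spec(format!("spec: {task}"))
}

fn developer(input: &Artifact) -> Artifact {
    match input {
        Artifact::Spec(s) => Artifact::Code(format!("code implementing [{s}]")),
        other => panic!("developer expects a Spec, got {other:?}"),
    }
}

fn reviewer(input: &Artifact) -> Artifact {
    match input {
        Artifact::Code(c) => Artifact::Review(format!("review of [{c}]")),
        other => panic!("reviewer expects Code, got {other:?}"),
    }
}

fn main() {
    // Each intermediate value is dropped after its consumer returns —
    // the short-lifecycle principle: context dies between stages.
    let spec = architect("implement feature X");
    let code = developer(&spec);
    let review = reviewer(&code);
    println!("{review:?}");
}
```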

## Implementation

### Components

1. **vapora-workflow-engine** (new crate):
   - `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
   - `WorkflowInstance`: State machine tracking individual workflow execution
   - `StageState`: Manages stage execution and task assignment
   - `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation)
2. **Workflow Templates** (`config/workflows.toml`):
   - `feature_development` (5 stages): architecture → implementation → testing → review → deployment
   - `bugfix` (4 stages): investigation → fix → testing → deployment
   - `documentation_update` (3 stages): content → review → publish
   - `security_audit` (4 stages): analysis → pentesting → remediation → verification
3. **REST API** (`/api/v1/workflow_orchestrator`):
   - `POST /` - Start workflow
   - `GET /` - List active workflows
   - `GET /:id` - Get workflow status
   - `POST /:id/approve` - Approve waiting stage
   - `POST /:id/cancel` - Cancel running workflow
   - `GET /templates` - List available templates
4. **CLI** (`vapora-cli`):
   - `vapora workflow start --template <name> --context context.json`
   - `vapora workflow list`
   - `vapora workflow status <id>`
   - `vapora workflow approve <id> --approver "Name"`
   - `vapora workflow cancel <id> --reason "Reason"`
   - `vapora workflow templates`
5. **Kogral Integration**:
   - `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
   - Filesystem-based knowledge retrieval from the `.kogral/` directory
   - Configurable via the `KOGRAL_PATH` environment variable
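
As a rough sketch of what the orchestrator holds in memory after parsing `config/workflows.toml`, and how it resolves a template by name — the struct and function names here are illustrative assumptions; only the template names and stage lists come from this ADR:

```rust
// In-memory view of a workflow template (illustrative layout).
struct WorkflowTemplate {
    name: &'static str,
    stages: &'static [&'static str],
}

// The four templates shipped in config/workflows.toml, per this ADR.
const TEMPLATES: &[WorkflowTemplate] = &[
    WorkflowTemplate {
        name: "feature_development",
        stages: &["architecture", "implementation", "testing", "review", "deployment"],
    },
    WorkflowTemplate {
        name: "bugfix",
        stages: &["investigation", "fix", "testing", "deployment"],
    },
    WorkflowTemplate {
        name: "documentation_update",
        stages: &["content", "review", "publish"],
    },
    WorkflowTemplate {
        name: "security_audit",
        stages: &["analysis", "pentesting", "remediation", "verification"],
    },
];

// Lookup as used by e.g. `vapora workflow start --template <name>`.
fn find_template(name: &str) -> Option<&'static WorkflowTemplate> {
    TEMPLATES.iter().find(|t| t.name == name)
}

fn main() {
    let t = find_template("feature_development").expect("unknown template");
    println!("{} has {} stages", t.name, t.stages.len()); // 5 stages
}
```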

### Integration with Existing Components

| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |

## Rationale

### Why Vapora Already Has the Pieces

The current Vapora implementation already includes:

| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |

What was missing is the orchestration layer that executes these templates: loading a template, creating an instance, listening for task completions, advancing stages, and passing artifacts between them.
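
At its core, that layer reduces to a small state machine: advance one stage per task completion and carry the completed stage's artifact forward. A minimal sketch with assumed names (the real `WorkflowInstance` API may differ):

```rust
// Minimal workflow instance: an ordered stage list, a cursor, and the
// artifacts produced so far.
struct WorkflowInstance {
    template: Vec<&'static str>, // ordered stage names from the template
    current: usize,              // index of the active stage
    artifacts: Vec<String>,      // artifact produced by each finished stage
}

impl WorkflowInstance {
    fn new(template: Vec<&'static str>) -> Self {
        Self { template, current: 0, artifacts: Vec::new() }
    }

    /// Called when the active stage's task completes (e.g. on a NATS event).
    /// Records the artifact and returns the next stage name, or None when
    /// the workflow has finished.
    fn on_stage_completed(&mut self, artifact: String) -> Option<&'static str> {
        self.artifacts.push(artifact);
        self.current += 1;
        self.template.get(self.current).copied()
    }
}

fn main() {
    let mut wf = WorkflowInstance::new(vec!["architecture", "implementation", "testing"]);
    assert_eq!(wf.on_stage_completed("spec.md".into()), Some("implementation"));
    assert_eq!(wf.on_stage_completed("patch.diff".into()), Some("testing"));
    assert_eq!(wf.on_stage_completed("report.json".into()), None); // done
    println!("collected artifacts: {:?}", wf.artifacts);
}
```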

### Why Not Alternative Solutions

| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline; doesn't fundamentally change the pattern |
| Shorter sessions manually | Loses context continuity; user must track state |
| External tools (LiteLLM, CrewAI) | Python-based; don't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation on complex tasks |

Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.

### Why Kogral Integration

Kogral provides persistent knowledge that would otherwise bloat session context:

| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |

Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).
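
The stateless lookup can be pictured as a pure mapping from a query string to a file under the `.kogral/` directory. The directory layout below is an assumption for illustration; the real `enrich_context_from_kogral()` may resolve paths differently:

```rust
use std::path::PathBuf;

// Map a query like "pattern:auth" to a path under the Kogral root
// (normally supplied via KOGRAL_PATH). Returns None for unknown kinds
// or malformed queries.
fn kogral_path(root: &str, query: &str) -> Option<PathBuf> {
    let (kind, name) = query.split_once(':')?;
    let dir = match kind {
        "pattern" => "patterns",
        "adr" => "adrs",
        "guideline" => "guidelines",
        _ => return None,
    };
    Some(PathBuf::from(root).join(dir).join(format!("{name}.md")))
}

fn main() {
    let p = kogral_path(".kogral", "pattern:auth").unwrap();
    println!("{}", p.display()); // .kogral/patterns/auth.md
}
```

The point of the sketch: resolving the query needs no session state at all, so none of this knowledge has to live in (and re-transmit through) an agent's context window.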

## Consequences

### Positive

  1. ~95% reduction in cache token costs: $840/month → ~$50-100/month for same workload

  2. Better model allocation: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)

  3. Leverages existing investment: Uses SwarmCoordinator, LearningProfiles, KGPersistence already built

  4. Audit trail: Each agent execution persisted to KG with tokens, cost, duration

  5. Parallelization: Multiple developer agents can work simultaneously on different parts of a task

  6. Quality through specialization: Each agent optimized for its role vs one generalist session

### Negative

  1. Orchestration overhead: Additional component to maintain

  2. Latency between stages: Artifact passing adds delay vs continuous conversation

  3. Context loss between agents: Agent B doesn't know what Agent A "considered but rejected"

  4. Debugging complexity: Issues span multiple agent executions

### Mitigations

| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1,500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in the KG |

## Metrics for Success

| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |

## Cost Projection

Based on the analyzed usage patterns with an optimized workflow:

| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |

Savings: ~$730/month (87% reduction)
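
The projection arithmetic can be checked directly from the figures quoted in this ADR:

```rust
// Sum the per-role monthly cost estimates from the projection table.
fn monthly_total(role_costs: &[(&str, f64)]) -> f64 {
    role_costs.iter().map(|(_, c)| *c).sum()
}

fn main() {
    let roles = [
        ("Architect", 25.0),
        ("Developer", 30.0),
        ("Reviewer", 40.0),
        ("Tester", 15.0),
    ];
    let total = monthly_total(&roles);
    let baseline = 840.0; // monthly projection from the usage analysis
    let savings = baseline - total;
    println!(
        "total: ${total}, savings: ${savings} ({:.0}% reduction)",
        savings / baseline * 100.0
    ); // total: $110, savings: $730 (87% reduction)
}
```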

## Implementation Status

- Status: Complete (v1.2.0)
- Crates: `vapora-workflow-engine`, `vapora-cli`
- Tests: 26 unit tests + 1 doc test passing
- Endpoints: 6 REST API endpoints
- Templates: 4 pre-configured workflows
- CLI Commands: 6 workflow management commands

## References

- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: kogral-mcp (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`
- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing

## Decision Drivers

  1. Data-driven: 95% of cost is cache tokens from long sessions
  2. Infrastructure exists: Vapora has all pieces except orchestrator
  3. Kogral synergy: Persistent knowledge reduces context requirements
  4. Measurable outcome: Clear before/after metrics for validation
  5. Production-ready: Complete implementation with tests and documentation