# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines

## Status

**Accepted** - Implemented in v1.2.0

## Context

### The Problem: Excessive LLM Costs from Cache Token Accumulation

Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:

| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |

**The cost is dominated by cache tokens, not generation.**

### Root Cause: Monolithic Session Pattern

Current workflow with Claude Code follows a monolithic session pattern:

```text
Session start
├─ Message 1: context 50K  → cache read 50K
├─ Message 2: context 100K → cache read 100K
├─ Message 3: context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
─────────────────
~20M cache reads per session
```

Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, with each subsequent message re-transmitting all previous context.

### Why This Matters

At current pricing (2026 rates):

- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens

With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.

## Decision

**Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.**

### Architecture: Agents with Short Lifecycles

Instead of one long session accumulating context, workflows execute as discrete stages:

```text
┌─────────────────────────────────────────────────────────┐
│ Task: "Implement feature X"                             │
└─────────────────────────────────────────────────────────┘
                        │
     ┌──────────────────┼───────────────────┐
     ▼                  ▼                   ▼
┌─────────┐        ┌──────────┐        ┌──────────┐
│Architect│        │Developer │        │ Reviewer │
│ (Opus)  │        │ (Haiku)  │        │ (Sonnet) │
├─────────┤        ├──────────┤        ├──────────┤
│Context: │        │Context:  │        │Context:  │
│  40K    │───────▶│  25K     │───────▶│  35K     │
│ 5 msgs  │  spec  │ 12 msgs  │  code  │ 4 msgs   │
│  200K   │        │  300K    │        │  140K    │
│ cache   │        │ cache    │        │ cache    │
└────┬────┘        └────┬─────┘        └────┬─────┘
     │                  │                   │
     ▼                  ▼                   ▼
TERMINATES         TERMINATES          TERMINATES
 (context           (context            (context
  discarded)         discarded)          discarded)

Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```

### Key Principles

1. **Context isolation**: Each agent receives only what it needs (spec, relevant files), not the full conversation history

2. **Artifact passing, not conversation passing**: What flows between agents is the result (spec, code, review), not the dialogue that produced it

3. **Short lifecycles**: An agent completes its task → its context dies → the next agent starts fresh

4. **Persistent memory via Kogral**: Important decisions/patterns are stored in the knowledge base, not in session context

## Implementation

### Components

1. **vapora-workflow-engine** (new crate):
   - `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
   - `WorkflowInstance`: State machine tracking individual workflow execution
   - `StageState`: Manages stage execution and task assignment
   - `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation)

2. **Workflow Templates** (`config/workflows.toml`):
   - `feature_development` (5 stages): architecture → implementation → testing → review → deployment
   - `bugfix` (4 stages): investigation → fix → testing → deployment
   - `documentation_update` (3 stages): content → review → publish
   - `security_audit` (4 stages): analysis → pentesting → remediation → verification

3. **REST API** (`/api/v1/workflow_orchestrator`):
   - `POST /` - Start workflow
   - `GET /` - List active workflows
   - `GET /:id` - Get workflow status
   - `POST /:id/approve` - Approve waiting stage
   - `POST /:id/cancel` - Cancel running workflow
   - `GET /templates` - List available templates

4. **CLI** (vapora-cli):
   - `vapora workflow start --template <name> --context context.json`
   - `vapora workflow list`
   - `vapora workflow status <id>`
   - `vapora workflow approve <id> --approver "Name"`
   - `vapora workflow cancel <id> --reason "Reason"`
   - `vapora workflow templates`

5. **Kogral Integration**:
   - `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
   - Filesystem-based knowledge retrieval from `.kogral/` directory
   - Configurable via `KOGRAL_PATH` environment variable

### Integration with Existing Components

| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |

## Rationale

### Why Vapora Already Has the Pieces

Current Vapora implementation includes:

| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |

**What was missing**: The orchestration layer that executes workflow templates by loading templates, creating instances, listening for task completions, advancing stages, and passing artifacts.

### Why Not Alternative Solutions

| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline; doesn't fundamentally change the pattern |
| Shorter sessions manually | Loses context continuity; user must track state |
| External tools (LiteLLM, CrewAI) | Python-based; don't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |

Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.

### Why Kogral Integration

Kogral provides persistent knowledge that would otherwise bloat session context:

| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |

Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).

## Consequences

### Positive

1. **~95% reduction in cache token costs**: $840/month → ~$50-100/month for the same workload

2. **Better model allocation**: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)

3. **Leverages existing investment**: Uses SwarmCoordinator, LearningProfiles, KGPersistence already built

4. **Audit trail**: Each agent execution persisted to KG with tokens, cost, duration

5. **Parallelization**: Multiple developers can work simultaneously on different parts

6. **Quality through specialization**: Each agent optimized for its role vs one generalist session

### Negative

1. **Orchestration overhead**: Additional component to maintain

2. **Latency between stages**: Artifact passing adds delay vs continuous conversation

3. **Context loss between agents**: Agent B doesn't know what Agent A "considered but rejected"

4. **Debugging complexity**: Issues span multiple agent executions

### Mitigations

| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in KG |

## Metrics for Success

| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |

## Cost Projection

Based on analyzed usage patterns with optimized workflow:

| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |

**Savings: ~$730/month (87% reduction)**

## Implementation Status

- **Status**: Complete (v1.2.0)
- **Crates**: vapora-workflow-engine, vapora-cli
- **Tests**: 26 unit tests + 1 doc test passing
- **Endpoints**: 6 REST API endpoints
- **Templates**: 4 pre-configured workflows
- **CLI Commands**: 6 workflow management commands

## References

- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`

## Related ADRs

- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing

## Decision Drivers

1. **Data-driven**: 95% of cost is cache tokens from long sessions
2. **Infrastructure exists**: Vapora has all pieces except the orchestrator
3. **Kogral synergy**: Persistent knowledge reduces context requirements
4. **Measurable outcome**: Clear before/after metrics for validation
5. **Production-ready**: Complete implementation with tests and documentation