# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines
## Status
**Accepted** - Implemented in v1.2.0
## Context
### The Problem: Excessive LLM Costs from Cache Token Accumulation
Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:
| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |
**The cost is dominated by cache tokens, not generation.**
### Root Cause: Monolithic Session Pattern
Current workflow with Claude Code follows a monolithic session pattern:
```text
Session start
├─ Message 1: context 50K → cache read 50K
├─ Message 2: context 100K → cache read 100K
├─ Message 3: context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
─────────────────
~20M cache reads per session
```
Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, and every subsequent message re-reads all previous context from cache; at an average of roughly 400K tokens of accumulated context per message, 50 messages add up to the ~20M cache reads shown above.
### Why This Matters
At current pricing (2026 rates):
- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens
With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.
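As a back-of-envelope check, the split above implies the following (a rough sketch assuming the 51%/38% figures refer to cache-read tokens; the remaining ~11% is not broken out here, so the cache side is understated):
```rust
fn main() {
    let total_cache_reads = 3.82e9_f64; // cache-read tokens over the 5 weeks

    // Split from the usage analysis above; the remaining ~11% is not broken
    // out here, so this is a lower bound on the cache side.
    let sonnet_reads = total_cache_reads * 0.51;
    let haiku_reads = total_cache_reads * 0.38;

    // Cache-read prices per 1M tokens, as quoted above.
    let cache_cost = sonnet_reads / 1e6 * 0.30 + haiku_reads / 1e6 * 0.03;

    // Direct traffic priced at Sonnet-class rates ($3/1M in, $15/1M out),
    // purely for illustration.
    let direct_cost = 2.4e6 / 1e6 * 3.00 + 366e3 / 1e6 * 15.00;

    println!("cache ≈ ${cache_cost:.0}, direct ≈ ${direct_cost:.0}");
    // Prints roughly: cache ≈ $628, direct ≈ $13
}
```
Even this lower bound puts cache reads at roughly fifty times the cost of the direct input/output traffic.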
## Decision
**Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.**
### Architecture: Agents with Short Lifecycles
Instead of one long session accumulating context, workflows execute as discrete stages:
```text
┌─────────────────────────────────────────────────────────┐
│               Task: "Implement feature X"               │
└─────────────────────────────────────────────────────────┘
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
    ┌─────────┐         ┌──────────┐        ┌──────────┐
    │Architect│         │Developer │        │ Reviewer │
    │ (Opus)  │         │ (Haiku)  │        │ (Sonnet) │
    ├─────────┤         ├──────────┤        ├──────────┤
    │Context: │         │Context:  │        │Context:  │
    │   40K   │────────▶│   25K    │───────▶│   35K    │
    │ 5 msgs  │  spec   │ 12 msgs  │  code  │  4 msgs  │
    │  200K   │         │   300K   │        │   140K   │
    │  cache  │         │  cache   │        │  cache   │
    └────┬────┘         └────┬─────┘        └────┬─────┘
         │                   │                   │
         ▼                   ▼                   ▼
    TERMINATES          TERMINATES          TERMINATES
     (context            (context            (context
     discarded)          discarded)          discarded)
Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```
### Key Principles
1. **Context isolation**: Each agent receives only what it needs (spec, relevant files), not full conversation history
2. **Artifact passing, not conversation passing**: Between agents flows the result (spec, code, review), not the dialogue that produced it
3. **Short lifecycles**: Agent completes task → context dies → next agent starts fresh
4. **Persistent memory via Kogral**: Important decisions/patterns stored in knowledge base, not in session context
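A minimal sketch of these principles (the types, names, and signatures are illustrative, not the actual `vapora-workflow-engine` API):
```rust
/// Illustrative only: each stage receives a fresh, minimal context and
/// returns an artifact; the context is dropped when the stage returns.
struct StageContext {
    role: &'static str,
    model: &'static str,
    /// Only the inputs this agent needs: spec, relevant files, Kogral notes.
    inputs: Vec<String>,
}

fn run_stage(ctx: StageContext) -> String {
    // In the real engine this would invoke an LLM-backed agent; here we just
    // show that nothing beyond `ctx.inputs` is visible to the stage.
    format!(
        "[{} on {}] artifact built from {} input(s)",
        ctx.role, ctx.model, ctx.inputs.len()
    )
    // `ctx` is dropped here: the stage's context dies with it.
}

fn main() {
    // Architect -> Developer -> Reviewer, passing artifacts, not conversation.
    let spec = run_stage(StageContext {
        role: "architect",
        model: "opus",
        inputs: vec!["task description".into()],
    });
    let code = run_stage(StageContext {
        role: "developer",
        model: "haiku",
        inputs: vec![spec], // only the spec, not the architect's dialogue
    });
    let review = run_stage(StageContext {
        role: "reviewer",
        model: "sonnet",
        inputs: vec![code], // only the code, not the developer's dialogue
    });
    println!("{review}");
}
```
The point is structural: the developer stage receives the spec, not the architect's message history, so each stage's cache footprint is bounded by its own inputs.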
## Implementation
### Components
1. **vapora-workflow-engine** (new crate):
- `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
- `WorkflowInstance`: State machine tracking individual workflow execution
- `StageState`: Manages stage execution and task assignment
- `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation); see the sketch after this list
2. **Workflow Templates** (`config/workflows.toml`):
- `feature_development` (5 stages): architecture → implementation → testing → review → deployment
- `bugfix` (4 stages): investigation → fix → testing → deployment
- `documentation_update` (3 stages): content → review → publish
- `security_audit` (4 stages): analysis → pentesting → remediation → verification
3. **REST API** (`/api/v1/workflow_orchestrator`):
- `POST /` - Start workflow
- `GET /` - List active workflows
- `GET /:id` - Get workflow status
- `POST /:id/approve` - Approve waiting stage
- `POST /:id/cancel` - Cancel running workflow
- `GET /templates` - List available templates
4. **CLI** (vapora-cli):
- `vapora workflow start --template <name> --context context.json`
- `vapora workflow list`
- `vapora workflow status <id>`
- `vapora workflow approve <id> --approver "Name"`
- `vapora workflow cancel <id> --reason "Reason"`
- `vapora workflow templates`
5. **Kogral Integration**:
- `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
- Filesystem-based knowledge retrieval from `.kogral/` directory
- Configurable via `KOGRAL_PATH` environment variable
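For orientation, a condensed sketch of the data shapes behind components 1 and 2 (field names and the template schema here are assumptions; the real definitions live in `crates/vapora-workflow-engine/` and `config/workflows.toml`):
```rust
/// Artifact variants named in the component list above; payloads are
/// simplified to plain strings for illustration.
enum Artifact {
    Adr(String),
    Code(String),
    TestResults(String),
    Review(String),
    Documentation(String),
}

/// One stage of a template such as `feature_development`.
struct StageDef {
    name: String,
    role: String,            // e.g. "architect", "developer"
    requires_approval: bool, // corresponds to POST /:id/approve
}

/// A running workflow: the template's stages plus the artifacts produced so far.
struct WorkflowInstance {
    id: String,
    stages: Vec<StageDef>,
    current_stage: usize,
    artifacts: Vec<Artifact>,
}

impl WorkflowInstance {
    /// Record the completed stage's artifact and advance; returns false once
    /// the final stage has finished.
    fn advance(&mut self, artifact: Artifact) -> bool {
        self.artifacts.push(artifact);
        self.current_stage += 1;
        self.current_stage < self.stages.len()
    }
}

fn main() {
    let mut wf = WorkflowInstance {
        id: "wf-001".into(),
        stages: vec![
            StageDef { name: "architecture".into(), role: "architect".into(), requires_approval: true },
            StageDef { name: "implementation".into(), role: "developer".into(), requires_approval: false },
        ],
        current_stage: 0,
        artifacts: Vec::new(),
    };
    let pending = wf.advance(Artifact::Adr("ADR draft".into()));
    println!("workflow {}: more stages pending = {}", wf.id, pending);
}
```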
### Integration with Existing Components
| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |
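How a single stage might flow through these components, sketched against illustrative trait signatures (the real `SwarmCoordinator`, NATS, and KG APIs differ; only the method name `submit_task_for_bidding` comes from the table above):
```rust
/// Illustrative abstraction over the swarm integration point; the real
/// signature lives in vapora-swarm.
trait Swarm {
    /// Stands in for SwarmCoordinator::submit_task_for_bidding(); returns
    /// the id of the agent that won the bid.
    fn submit_task_for_bidding(&self, task: &str) -> String;
}

/// Illustrative abstraction over KGPersistence.
trait KnowledgeGraph {
    /// Persist one stage execution for the audit trail.
    fn record_execution(&self, workflow_id: &str, agent_id: &str, artifact: &str);
}

/// One stage: assign via bidding, await the result (a TaskCompleted event
/// with its `artifacts` field, delivered over NATS in the real system),
/// persist it, and hand only the artifact to the next stage.
fn execute_stage(swarm: &dyn Swarm, kg: &dyn KnowledgeGraph, workflow_id: &str, task: &str) -> String {
    let agent_id = swarm.submit_task_for_bidding(task);
    // Placeholder for awaiting the completion event.
    let artifact = format!("artifact from {agent_id} for '{task}'");
    kg.record_execution(workflow_id, &agent_id, &artifact);
    artifact
}

struct DemoSwarm;
impl Swarm for DemoSwarm {
    fn submit_task_for_bidding(&self, task: &str) -> String {
        format!("agent-for-{task}")
    }
}

struct DemoKg;
impl KnowledgeGraph for DemoKg {
    fn record_execution(&self, workflow_id: &str, agent_id: &str, artifact: &str) {
        println!("KG <- {workflow_id} / {agent_id}: {artifact}");
    }
}

fn main() {
    let artifact = execute_stage(&DemoSwarm, &DemoKg, "wf-001", "implement feature X");
    println!("next stage receives: {artifact}");
}
```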
## Rationale
### Why Vapora Already Has the Pieces
Current Vapora implementation includes:
| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |
**What was missing**: the orchestration layer that loads workflow templates, creates workflow instances, listens for task completions, advances stages, and passes artifacts between them.
### Why Not Alternative Solutions
| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline, doesn't fundamentally change pattern |
| Shorter sessions manually | Loses context continuity, user must track state |
| External tools (LiteLLM, CrewAI) | Python-based, doesn't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |
Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.
### Why Kogral Integration
Kogral provides persistent knowledge that would otherwise bloat session context:
| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |
Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).
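A minimal sketch of that stateless lookup, assuming knowledge is stored as markdown files under `.kogral/patterns/` (the actual Kogral layout and its MCP interface may differ):
```rust
use std::{env, fs, path::PathBuf};

/// Resolve the knowledge base root from KOGRAL_PATH, defaulting to `.kogral/`.
fn kogral_root() -> PathBuf {
    PathBuf::from(env::var("KOGRAL_PATH").unwrap_or_else(|_| ".kogral".into()))
}

/// "query pattern:auth" as a stateless file lookup: read one small note and
/// inject only that into the agent's context, instead of re-explaining it.
fn query_pattern(name: &str) -> Option<String> {
    let path = kogral_root().join("patterns").join(format!("{name}.md"));
    fs::read_to_string(path).ok()
}

fn main() {
    match query_pattern("auth") {
        Some(note) => println!("inject {} bytes of pattern notes", note.len()),
        None => println!("pattern not found; the agent proceeds without it"),
    }
}
```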
## Consequences
### Positive
1. **~95% reduction in cache token costs**: $840/month → ~$50-100/month for same workload
2. **Better model allocation**: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)
3. **Leverages existing investment**: Uses SwarmCoordinator, LearningProfiles, KGPersistence already built
4. **Audit trail**: Each agent execution persisted to KG with tokens, cost, duration
5. **Parallelization**: Multiple developer agents can work simultaneously on different parts of the same task
6. **Quality through specialization**: Each agent optimized for its role vs one generalist session
### Negative
1. **Orchestration overhead**: Additional component to maintain
2. **Latency between stages**: Artifact passing adds delay vs continuous conversation
3. **Context loss between agents**: Agent B doesn't know what Agent A "considered but rejected"
4. **Debugging complexity**: Issues span multiple agent executions
### Mitigations
| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in KG |
## Metrics for Success
| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |
## Cost Projection
Based on analyzed usage patterns with optimized workflow:
| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |
**Savings: ~$730/month (87% reduction)**
## Implementation Status
- **Status**: Complete (v1.2.0)
- **Crates**: vapora-workflow-engine, vapora-cli
- **Tests**: 26 unit tests + 1 doc test passing
- **Endpoints**: 6 REST API endpoints
- **Templates**: 4 pre-configured workflows
- **CLI Commands**: 6 workflow management commands
## References
- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`
## Related ADRs
- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing
## Decision Drivers
1. **Data-driven**: 95% of cost is cache tokens from long sessions
2. **Infrastructure exists**: Vapora has all pieces except orchestrator
3. **Kogral synergy**: Persistent knowledge reduces context requirements
4. **Measurable outcome**: Clear before/after metrics for validation
5. **Production-ready**: Complete implementation with tests and documentation