# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines
## Status
**Accepted** - Implemented in v1.2.0
## Context
### The Problem: Excessive LLM Costs from Cache Token Accumulation
Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:
| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |
**The cost is dominated by cache tokens, not generation.**
### Root Cause: Monolithic Session Pattern
Current workflow with Claude Code follows a monolithic session pattern:
```text
Session start
├─ Message 1: context 50K → cache read 50K
├─ Message 2: context 100K → cache read 100K
├─ Message 3: context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
─────────────────
~20M cache reads per session
```
Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, and every subsequent message re-reads all previous context from cache; at an average of roughly 400K tokens of accumulated context per message, 50 messages add up to the ~20M cache reads shown above.
### Why This Matters
At current pricing (2026 rates):
- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens
With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.
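As a back-of-envelope check, the split above implies the following (a rough sketch assuming the 51%/38% figures refer to cache-read tokens; the remaining ~11% is not broken out here, so the cache side is understated):
```rust
fn main() {
    let total_cache_reads = 3.82e9_f64; // cache-read tokens over the 5 weeks

    // Split from the usage analysis above; the remaining ~11% is not broken
    // out here, so this is a lower bound on the cache side.
    let sonnet_reads = total_cache_reads * 0.51;
    let haiku_reads = total_cache_reads * 0.38;

    // Cache-read prices per 1M tokens, as quoted above.
    let cache_cost = sonnet_reads / 1e6 * 0.30 + haiku_reads / 1e6 * 0.03;

    // Direct traffic priced at Sonnet-class rates ($3/1M in, $15/1M out),
    // purely for illustration.
    let direct_cost = 2.4e6 / 1e6 * 3.00 + 366e3 / 1e6 * 15.00;

    println!("cache ≈ ${cache_cost:.0}, direct ≈ ${direct_cost:.0}");
    // Prints roughly: cache ≈ $628, direct ≈ $13
}
```
Even this lower bound puts cache reads at roughly fifty times the cost of the direct input/output traffic.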
## Decision
**Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.**
### Architecture: Agents with Short Lifecycles
Instead of one long session accumulating context, workflows execute as discrete stages:
```text
┌─────────────────────────────────────────────────────────┐
│               Task: "Implement feature X"               │
└─────────────────────────────────────────────────────────┘
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
    ┌─────────┐         ┌──────────┐        ┌──────────┐
    │Architect│         │Developer │        │ Reviewer │
    │ (Opus)  │         │ (Haiku)  │        │ (Sonnet) │
    ├─────────┤         ├──────────┤        ├──────────┤
    │Context: │         │Context:  │        │Context:  │
    │   40K   │────────▶│   25K    │───────▶│   35K    │
    │ 5 msgs  │  spec   │ 12 msgs  │  code  │  4 msgs  │
    │  200K   │         │   300K   │        │   140K   │
    │  cache  │         │  cache   │        │  cache   │
    └────┬────┘         └────┬─────┘        └────┬─────┘
         │                   │                   │
         ▼                   ▼                   ▼
    TERMINATES          TERMINATES          TERMINATES
     (context            (context            (context
     discarded)          discarded)          discarded)
Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```
### Key Principles
1. **Context isolation**: Each agent receives only what it needs (spec, relevant files), not full conversation history
2. **Artifact passing, not conversation passing**: Between agents flows the result (spec, code, review), not the dialogue that produced it
3. **Short lifecycles**: Agent completes task → context dies → next agent starts fresh
4. **Persistent memory via Kogral**: Important decisions/patterns stored in knowledge base, not in session context
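A minimal sketch of these principles (the types, names, and signatures are illustrative, not the actual `vapora-workflow-engine` API):
```rust
/// Illustrative only: each stage receives a fresh, minimal context and
/// returns an artifact; the context is dropped when the stage returns.
struct StageContext {
    role: &'static str,
    model: &'static str,
    /// Only the inputs this agent needs: spec, relevant files, Kogral notes.
    inputs: Vec<String>,
}

fn run_stage(ctx: StageContext) -> String {
    // In the real engine this would invoke an LLM-backed agent; here we just
    // show that nothing beyond `ctx.inputs` is visible to the stage.
    format!(
        "[{} on {}] artifact built from {} input(s)",
        ctx.role, ctx.model, ctx.inputs.len()
    )
    // `ctx` is dropped here: the stage's context dies with it.
}

fn main() {
    // Architect -> Developer -> Reviewer, passing artifacts, not conversation.
    let spec = run_stage(StageContext {
        role: "architect",
        model: "opus",
        inputs: vec!["task description".into()],
    });
    let code = run_stage(StageContext {
        role: "developer",
        model: "haiku",
        inputs: vec![spec], // only the spec, not the architect's dialogue
    });
    let review = run_stage(StageContext {
        role: "reviewer",
        model: "sonnet",
        inputs: vec![code], // only the code, not the developer's dialogue
    });
    println!("{review}");
}
```
The point is structural: the developer stage receives the spec, not the architect's message history, so each stage's cache footprint is bounded by its own inputs.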
## Implementation
### Components
1. **vapora-workflow-engine** (new crate):
- `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
- `WorkflowInstance`: State machine tracking individual workflow execution
- `StageState`: Manages stage execution and task assignment
- `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation); see the sketch after this list
2. **Workflow Templates** (`config/workflows.toml`):
- `feature_development` (5 stages): architecture → implementation → testing → review → deployment
- `bugfix` (4 stages): investigation → fix → testing → deployment
- `documentation_update` (3 stages): content → review → publish
- `security_audit` (4 stages): analysis → pentesting → remediation → verification
3. **REST API** (`/api/v1/workflow_orchestrator`):
- `POST /` - Start workflow
- `GET /` - List active workflows
- `GET /:id` - Get workflow status
- `POST /:id/approve` - Approve waiting stage
- `POST /:id/cancel` - Cancel running workflow
- `GET /templates` - List available templates
4. **CLI** (vapora-cli):
- `vapora workflow start --template <name> --context context.json`
- `vapora workflow list`
- `vapora workflow status <id>`
- `vapora workflow approve <id> --approver "Name"`
- `vapora workflow cancel <id> --reason "Reason"`
- `vapora workflow templates`
5. **Kogral Integration**:
- `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
- Filesystem-based knowledge retrieval from `.kogral/` directory
- Configurable via `KOGRAL_PATH` environment variable
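For orientation, a condensed sketch of the data shapes behind components 1 and 2 (field names and the template schema here are assumptions; the real definitions live in `crates/vapora-workflow-engine/` and `config/workflows.toml`):
```rust
/// Artifact variants named in the component list above; payloads are
/// simplified to plain strings for illustration.
enum Artifact {
    Adr(String),
    Code(String),
    TestResults(String),
    Review(String),
    Documentation(String),
}

/// One stage of a template such as `feature_development`.
struct StageDef {
    name: String,
    role: String,            // e.g. "architect", "developer"
    requires_approval: bool, // corresponds to POST /:id/approve
}

/// A running workflow: the template's stages plus the artifacts produced so far.
struct WorkflowInstance {
    id: String,
    stages: Vec<StageDef>,
    current_stage: usize,
    artifacts: Vec<Artifact>,
}

impl WorkflowInstance {
    /// Record the completed stage's artifact and advance; returns false once
    /// the final stage has finished.
    fn advance(&mut self, artifact: Artifact) -> bool {
        self.artifacts.push(artifact);
        self.current_stage += 1;
        self.current_stage < self.stages.len()
    }
}

fn main() {
    let mut wf = WorkflowInstance {
        id: "wf-001".into(),
        stages: vec![
            StageDef { name: "architecture".into(), role: "architect".into(), requires_approval: true },
            StageDef { name: "implementation".into(), role: "developer".into(), requires_approval: false },
        ],
        current_stage: 0,
        artifacts: Vec::new(),
    };
    let pending = wf.advance(Artifact::Adr("ADR draft".into()));
    println!("workflow {}: more stages pending = {}", wf.id, pending);
}
```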
### Integration with Existing Components
| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |
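How a single stage might flow through these components, sketched against illustrative trait signatures (the real `SwarmCoordinator`, NATS, and KG APIs differ; only the method name `submit_task_for_bidding` comes from the table above):
```rust
/// Illustrative abstraction over the swarm integration point; the real
/// signature lives in vapora-swarm.
trait Swarm {
    /// Stands in for SwarmCoordinator::submit_task_for_bidding(); returns
    /// the id of the agent that won the bid.
    fn submit_task_for_bidding(&self, task: &str) -> String;
}

/// Illustrative abstraction over KGPersistence.
trait KnowledgeGraph {
    /// Persist one stage execution for the audit trail.
    fn record_execution(&self, workflow_id: &str, agent_id: &str, artifact: &str);
}

/// One stage: assign via bidding, await the result (a TaskCompleted event
/// with its `artifacts` field, delivered over NATS in the real system),
/// persist it, and hand only the artifact to the next stage.
fn execute_stage(swarm: &dyn Swarm, kg: &dyn KnowledgeGraph, workflow_id: &str, task: &str) -> String {
    let agent_id = swarm.submit_task_for_bidding(task);
    // Placeholder for awaiting the completion event.
    let artifact = format!("artifact from {agent_id} for '{task}'");
    kg.record_execution(workflow_id, &agent_id, &artifact);
    artifact
}

struct DemoSwarm;
impl Swarm for DemoSwarm {
    fn submit_task_for_bidding(&self, task: &str) -> String {
        format!("agent-for-{task}")
    }
}

struct DemoKg;
impl KnowledgeGraph for DemoKg {
    fn record_execution(&self, workflow_id: &str, agent_id: &str, artifact: &str) {
        println!("KG <- {workflow_id} / {agent_id}: {artifact}");
    }
}

fn main() {
    let artifact = execute_stage(&DemoSwarm, &DemoKg, "wf-001", "implement feature X");
    println!("next stage receives: {artifact}");
}
```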
## Rationale
### Why Vapora Already Has the Pieces
Current Vapora implementation includes:
| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |
**What was missing**: the orchestration layer that loads workflow templates, creates workflow instances, listens for task completions, advances stages, and passes artifacts between them.
### Why Not Alternative Solutions
| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline, doesn't fundamentally change pattern |
| Shorter sessions manually | Loses context continuity, user must track state |
| External tools (LiteLLM, CrewAI) | Python-based, doesn't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |
Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.
### Why Kogral Integration
Kogral provides persistent knowledge that would otherwise bloat session context:
| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |
Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).
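A minimal sketch of that stateless lookup, assuming knowledge is stored as markdown files under `.kogral/patterns/` (the actual Kogral layout and its MCP interface may differ):
```rust
use std::{env, fs, path::PathBuf};

/// Resolve the knowledge base root from KOGRAL_PATH, defaulting to `.kogral/`.
fn kogral_root() -> PathBuf {
    PathBuf::from(env::var("KOGRAL_PATH").unwrap_or_else(|_| ".kogral".into()))
}

/// "query pattern:auth" as a stateless file lookup: read one small note and
/// inject only that into the agent's context, instead of re-explaining it.
fn query_pattern(name: &str) -> Option<String> {
    let path = kogral_root().join("patterns").join(format!("{name}.md"));
    fs::read_to_string(path).ok()
}

fn main() {
    match query_pattern("auth") {
        Some(note) => println!("inject {} bytes of pattern notes", note.len()),
        None => println!("pattern not found; the agent proceeds without it"),
    }
}
```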
## Consequences
### Positive
1. **~95% reduction in cache token costs**: $840/month → ~$50-100/month for same workload
2. **Better model allocation**: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)
3. **Leverages existing investment**: Uses SwarmCoordinator, LearningProfiles, KGPersistence already built
4. **Audit trail**: Each agent execution persisted to KG with tokens, cost, duration
5. **Parallelization**: Multiple developer agents can work simultaneously on different parts of the same task
6. **Quality through specialization**: Each agent optimized for its role vs one generalist session
### Negative
1. **Orchestration overhead**: Additional component to maintain
2. **Latency between stages**: Artifact passing adds delay vs continuous conversation
3. **Context loss between agents**: Agent B doesn't know what Agent A "considered but rejected"
4. **Debugging complexity**: Issues span multiple agent executions
### Mitigations
| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in KG |
## Metrics for Success
| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |
## Cost Projection
Based on analyzed usage patterns with optimized workflow:
| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |
**Savings: ~$730/month (87% reduction)**
## Implementation Status
- **Status**: Complete (v1.2.0)
- **Crates**: vapora-workflow-engine, vapora-cli
- **Tests**: 26 unit tests + 1 doc test passing
- **Endpoints**: 6 REST API endpoints
- **Templates**: 4 pre-configured workflows
- **CLI Commands**: 6 workflow management commands
## References
- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`
## Related ADRs
- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing
## Decision Drivers
1. **Data-driven**: 95% of cost is cache tokens from long sessions
2. **Infrastructure exists**: Vapora has all pieces except orchestrator
3. **Kogral synergy**: Persistent knowledge reduces context requirements
4. **Measurable outcome**: Clear before/after metrics for validation
5. **Production-ready**: Complete implementation with tests and documentation