# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines

## Status

**Accepted** - Implemented in v1.2.0

## Context

### The Problem: Excessive LLM Costs from Cache Token Accumulation

Analysis of real Claude Code usage data (5 weeks, individual developer) reveals a critical cost pattern:

| Metric | Value |
|--------|-------|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |

**The cost is dominated by cache tokens, not generation.**

### Root Cause: Monolithic Session Pattern

Current workflow with Claude Code follows a monolithic session pattern:

```text
Session start
├─ Message 1: context 50K  → cache read 50K
├─ Message 2: context 100K → cache read 100K
├─ Message 3: context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
─────────────────
~20M cache reads per session
```

Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, with each subsequent message re-transmitting all previous context.

### Why This Matters

At current pricing (2026 rates):

- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens

With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.

## Decision

**Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.**

### Architecture: Agents with Short Lifecycles

Instead of one long session accumulating context, workflows execute as discrete stages:

```text
┌─────────────────────────────────────────────────────────┐
│ Task: "Implement feature X"                             │
└─────────────────────────────────────────────────────────┘
                        │
     ┌──────────────────┼───────────────────┐
     ▼                  ▼                   ▼
┌─────────┐        ┌──────────┐        ┌──────────┐
│Architect│        │Developer │        │ Reviewer │
│ (Opus)  │        │ (Haiku)  │        │ (Sonnet) │
├─────────┤        ├──────────┤        ├──────────┤
│Context: │        │Context:  │        │Context:  │
│  40K    │───────▶│  25K     │───────▶│  35K     │
│ 5 msgs  │  spec  │ 12 msgs  │  code  │ 4 msgs   │
│  200K   │        │  300K    │        │  140K    │
│ cache   │        │ cache    │        │ cache    │
└────┬────┘        └────┬─────┘        └────┬─────┘
     │                  │                   │
     ▼                  ▼                   ▼
TERMINATES         TERMINATES          TERMINATES
 (context           (context            (context
  discarded)         discarded)          discarded)

Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```

### Key Principles

1. **Context isolation**: Each agent receives only what it needs (spec, relevant files), not the full conversation history

2. **Artifact passing, not conversation passing**: What flows between agents is the result (spec, code, review), not the dialogue that produced it

3. **Short lifecycles**: An agent completes its task → its context dies → the next agent starts fresh

4. **Persistent memory via Kogral**: Important decisions/patterns are stored in the knowledge base, not in session context

## Implementation

### Components

1. **vapora-workflow-engine** (new crate):
   - `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
   - `WorkflowInstance`: State machine tracking individual workflow execution
   - `StageState`: Manages stage execution and task assignment
   - `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation)

2. **Workflow Templates** (`config/workflows.toml`):
   - `feature_development` (5 stages): architecture → implementation → testing → review → deployment
   - `bugfix` (4 stages): investigation → fix → testing → deployment
   - `documentation_update` (3 stages): content → review → publish
   - `security_audit` (4 stages): analysis → pentesting → remediation → verification

3. **REST API** (`/api/v1/workflow_orchestrator`):
   - `POST /` - Start workflow
   - `GET /` - List active workflows
   - `GET /:id` - Get workflow status
   - `POST /:id/approve` - Approve waiting stage
   - `POST /:id/cancel` - Cancel running workflow
   - `GET /templates` - List available templates

4. **CLI** (vapora-cli):
   - `vapora workflow start --template <name> --context context.json`
   - `vapora workflow list`
   - `vapora workflow status <id>`
   - `vapora workflow approve <id> --approver "Name"`
   - `vapora workflow cancel <id> --reason "Reason"`
   - `vapora workflow templates`

5. **Kogral Integration**:
   - `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
   - Filesystem-based knowledge retrieval from `.kogral/` directory
   - Configurable via `KOGRAL_PATH` environment variable

### Integration with Existing Components

| Component | Usage |
|-----------|-------|
| SwarmCoordinator | Task assignment via `submit_task_for_bidding()` |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |

## Rationale

### Why Vapora Already Has the Pieces

Current Vapora implementation includes:

| Component | Status | Functionality |
|-----------|--------|---------------|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | `workflows.toml` with stage definitions |
| Artifact Types | Complete | `TaskCompleted.artifacts` field |

**What was missing**: The orchestration layer that executes workflow templates by loading templates, creating instances, listening for task completions, advancing stages, and passing artifacts.

### Why Not Alternative Solutions

| Alternative | Why Not |
|-------------|---------|
| Manual `/compact` in Claude Code | Requires user discipline; doesn't fundamentally change the pattern |
| Shorter sessions manually | Loses context continuity; user must track state |
| External tools (LiteLLM, CrewAI) | Python-based; don't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |

Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.

### Why Kogral Integration

Kogral provides persistent knowledge that would otherwise bloat session context:

| Without Kogral | With Kogral |
|----------------|-------------|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |

Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).

## Consequences

### Positive

1. **~95% reduction in cache token costs**: $840/month → ~$50-100/month for the same workload

2. **Better model allocation**: Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)

3. **Leverages existing investment**: Uses SwarmCoordinator, LearningProfiles, KGPersistence already built

4. **Audit trail**: Each agent execution persisted to KG with tokens, cost, duration

5. **Parallelization**: Multiple developers can work simultaneously on different parts

6. **Quality through specialization**: Each agent optimized for its role vs one generalist session

### Negative

1. **Orchestration overhead**: Additional component to maintain

2. **Latency between stages**: Artifact passing adds delay vs continuous conversation

3. **Context loss between agents**: Agent B doesn't know what Agent A "considered but rejected"

4. **Debugging complexity**: Issues span multiple agent executions

### Mitigations

| Negative | Mitigation |
|----------|------------|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in KG |

## Metrics for Success

| Metric | Before | After (Target) |
|--------|--------|----------------|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |

## Cost Projection

Based on analyzed usage patterns with optimized workflow:

| Role | Model | % of Work | Monthly Cost |
|------|-------|-----------|--------------|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | **~$110** |

**Savings: ~$730/month (87% reduction)**

## Implementation Status

- **Status**: Complete (v1.2.0)
- **Crates**: vapora-workflow-engine, vapora-cli
- **Tests**: 26 unit tests + 1 doc test passing
- **Endpoints**: 6 REST API endpoints
- **Templates**: 4 pre-configured workflows
- **CLI Commands**: 6 workflow management commands

## References

- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`

## Related ADRs

- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing

## Decision Drivers

1. **Data-driven**: 95% of cost is cache tokens from long sessions
2. **Infrastructure exists**: Vapora has all pieces except the orchestrator
3. **Kogral synergy**: Persistent knowledge reduces context requirements
4. **Measurable outcome**: Clear before/after metrics for validation
5. **Production-ready**: Complete implementation with tests and documentation