# ADR-0028: Workflow Orchestrator for Cost-Efficient Multi-Agent Pipelines
## Status
Accepted - Implemented in v1.2.0
## Context

### The Problem: Excessive LLM Costs from Cache Token Accumulation
Analysis of real Claude Code usage data (5 weeks, single developer) reveals a critical cost pattern:
| Metric | Value |
|---|---|
| Total cost | $1,050.68 |
| Weekly average | ~$210 |
| Monthly projection | ~$840 |
| Cache read tokens | 3.82B (95.7% of total) |
| Cache creation tokens | 170M (4.3%) |
| Direct input tokens | 2.4M (0.06%) |
| Direct output tokens | 366K (0.009%) |
The cost is dominated by cache tokens, not generation.

### Root Cause: Monolithic Session Pattern
The current workflow with Claude Code follows a monolithic session pattern:

```
Session start
├─ Message 1:  context 50K  → cache read 50K
├─ Message 2:  context 100K → cache read 100K
├─ Message 3:  context 150K → cache read 150K
├─ ...
└─ Message 50: context 800K → cache read 800K
─────────────────
~20M cache reads per session
```
Each message in a long session re-sends the entire conversation history. Over a typical development session (50+ messages), context accumulates to 500K-1M tokens, with each subsequent message re-transmitting all previous context.
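Because every message re-reads all accumulated context, total cache reads grow quadratically with message count. A minimal sketch (illustrative numbers, assuming a constant per-message context growth) shows how a 50-message session reaches the ~20M figure above:

```rust
/// Total cache-read tokens for an `n`-message session in which context
/// grows by `step` tokens per message and every message re-reads all of it:
/// step * (1 + 2 + ... + n) = step * n * (n + 1) / 2.
fn monolithic_cache_reads(n: u64, step: u64) -> u64 {
    step * n * (n + 1) / 2
}

fn main() {
    // ~50 messages, growing ~16K tokens per message (illustrative assumption)
    let total = monolithic_cache_reads(50, 16_000);
    println!("total cache reads: {:.1}M tokens", total as f64 / 1e6); // ~20.4M
}
```

Doubling the session length roughly quadruples the cache reads, which is why splitting work into short-lived contexts pays off so heavily.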
### Why This Matters
At current pricing (2026 rates):
- Cache read (Haiku): $0.03/1M tokens
- Cache read (Sonnet): $0.30/1M tokens
- Cache read (Opus): $1.50/1M tokens
With 3.82B cache read tokens distributed across Sonnet (51%) and Haiku (38%), the cache cost alone exceeds what direct input/output would cost.
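A back-of-envelope calculation (covering only the stated Sonnet and Haiku shares; the remaining ~11% is not broken out in the source data) shows the cache-read bill dwarfing what 2.4M direct input tokens could ever cost:

```rust
fn main() {
    let cache_reads: f64 = 3.82e9; // total cache-read tokens over 5 weeks

    // Per-model share of cache reads (from the usage data) times rate per 1M tokens
    let sonnet_cost = cache_reads * 0.51 * 0.30 / 1e6; // ~$584
    let haiku_cost = cache_reads * 0.38 * 0.03 / 1e6;  // ~$44

    println!("Sonnet cache reads: ${:.0}", sonnet_cost);
    println!("Haiku cache reads:  ${:.0}", haiku_cost);
}
```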
## Decision
Implement a Workflow Orchestrator (vapora-workflow-engine) that executes multi-stage pipelines with short-lived agent contexts.
### Architecture: Agents with Short Lifecycles
Instead of one long session accumulating context, workflows execute as discrete stages:
```
┌─────────────────────────────────────────────────────────┐
│ Task: "Implement feature X"                             │
└─────────────────────────────────────────────────────────┘
                     │
┌────────────────────┼────────────────────┐
▼                    ▼                    ▼
┌─────────┐       ┌──────────┐       ┌──────────┐
│Architect│       │Developer │       │ Reviewer │
│ (Opus)  │       │ (Haiku)  │       │ (Sonnet) │
├─────────┤       ├──────────┤       ├──────────┤
│Context: │       │Context:  │       │Context:  │
│ 40K     │──────▶│ 25K      │──────▶│ 35K      │
│ 5 msgs  │ spec  │ 12 msgs  │ code  │ 4 msgs   │
│ 200K    │       │ 300K     │       │ 140K     │
│ cache   │       │ cache    │       │ cache    │
└────┬────┘       └────┬─────┘       └────┬─────┘
     │                 │                  │
     ▼                 ▼                  ▼
TERMINATES        TERMINATES         TERMINATES
(context          (context           (context
 discarded)        discarded)         discarded)

Total cache: ~640K
Monolithic equivalent: ~20-40M
Reduction: 95-97%
```
### Key Principles
- **Context isolation:** Each agent receives only what it needs (spec, relevant files), not the full conversation history
- **Artifact passing, not conversation passing:** Between agents flows the result (spec, code, review), not the dialogue that produced it
- **Short lifecycles:** Agent completes its task → context dies → next agent starts fresh
- **Persistent memory via Kogral:** Important decisions/patterns are stored in the knowledge base, not in session context
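The first three principles can be sketched in a few lines. This is a toy model, not the crate's actual API: `Artifact` mirrors the artifact types named later in this ADR, while `run_stage` is a hypothetical stand-in for invoking an agent with a fresh, short-lived context.

```rust
// Hypothetical sketch of artifact passing between short-lived agents.
#[derive(Debug, Clone)]
enum Artifact {
    Spec(String),
    Code(String),
    Review(String),
}

/// A stage runs with a fresh, minimal context and returns one artifact.
/// The agent's working context lives only for the duration of this call;
/// only the returned Artifact crosses the stage boundary.
fn run_stage(role: &str, input: Option<&Artifact>) -> Artifact {
    match role {
        "architect" => Artifact::Spec("design for feature X".into()),
        "developer" => Artifact::Code(format!("impl derived from {:?}", input)),
        _ => Artifact::Review("approved".into()),
    }
}

fn main() {
    let spec = run_stage("architect", None);          // Opus, ~40K context
    let code = run_stage("developer", Some(&spec));   // Haiku, sees spec only
    let review = run_stage("reviewer", Some(&code));  // Sonnet, sees code only
    println!("{:?}", review);
}
```

Note that the developer never sees the architect's 5-message dialogue, only the spec it produced.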
## Implementation

### Components
- **vapora-workflow-engine** (new crate):
  - `WorkflowOrchestrator`: Main coordinator managing workflow lifecycle
  - `WorkflowInstance`: State machine tracking individual workflow execution
  - `StageState`: Manages stage execution and task assignment
  - `Artifact`: Data passed between stages (ADR, Code, TestResults, Review, Documentation)
- **Workflow Templates** (`config/workflows.toml`):
  - `feature_development` (5 stages): architecture → implementation → testing → review → deployment
  - `bugfix` (4 stages): investigation → fix → testing → deployment
  - `documentation_update` (3 stages): content → review → publish
  - `security_audit` (4 stages): analysis → pentesting → remediation → verification
- **REST API** (`/api/v1/workflow_orchestrator`):
  - `POST /` - Start workflow
  - `GET /` - List active workflows
  - `GET /:id` - Get workflow status
  - `POST /:id/approve` - Approve waiting stage
  - `POST /:id/cancel` - Cancel running workflow
  - `GET /templates` - List available templates
- **CLI** (`vapora-cli`):
  - `vapora workflow start --template <name> --context context.json`
  - `vapora workflow list`
  - `vapora workflow status <id>`
  - `vapora workflow approve <id> --approver "Name"`
  - `vapora workflow cancel <id> --reason "Reason"`
  - `vapora workflow templates`
- **Kogral Integration:**
  - `enrich_context_from_kogral()` - Loads guidelines, patterns, ADRs
  - Filesystem-based knowledge retrieval from the `.kogral/` directory
  - Configurable via the `KOGRAL_PATH` environment variable
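A template in `config/workflows.toml` might look like the fragment below. The field names and schema here are illustrative assumptions, not the crate's actual format:

```toml
# Hypothetical sketch of one workflow template; real schema may differ.
[workflows.feature_development]
description = "Feature pipeline: architecture through deployment"

[[workflows.feature_development.stages]]
name = "architecture"
role = "architect"
model = "opus"
produces = "ADR"

[[workflows.feature_development.stages]]
name = "implementation"
role = "developer"
model = "haiku"
produces = "Code"
```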
### Integration with Existing Components
| Component | Usage |
|---|---|
| SwarmCoordinator | Task assignment via submit_task_for_bidding() |
| AgentRegistry | 12 roles with lifecycle management |
| LearningProfiles | Expertise-based agent selection |
| KGPersistence | Workflow execution history |
| NATS JetStream | Inter-stage event coordination |
## Rationale

### Why Vapora Already Has the Pieces
The current Vapora implementation already includes:
| Component | Status | Functionality |
|---|---|---|
| SwarmCoordinator | Complete | Task assignment, load balancing |
| AgentRegistry | Complete | 12 roles, lifecycle management |
| Learning Profiles | Complete | Expertise scoring with recency bias |
| KG Persistence | Complete | SurrealDB, execution history |
| NATS Messaging | Complete | Inter-agent communication |
| Workflow Templates | Complete | workflows.toml with stage definitions |
| Artifact Types | Complete | TaskCompleted.artifacts field |
What was missing: The orchestration layer that executes workflow templates by loading templates, creating instances, listening for task completions, advancing stages, and passing artifacts.
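The missing stage-advancement logic reduces to a small state machine. The types below are a simplified sketch (the real `WorkflowInstance` in vapora-workflow-engine is surely richer), showing only how a task-completion event advances the current stage through an ordered template:

```rust
// Simplified sketch of the orchestration loop; hypothetical structure.
struct WorkflowInstance {
    template: Vec<&'static str>, // ordered stage names from workflows.toml
    current: usize,              // index of the stage currently executing
}

impl WorkflowInstance {
    /// Called when a task-completion event arrives for the current stage:
    /// advance the index and return the next stage to dispatch, if any.
    fn on_stage_complete(&mut self) -> Option<&'static str> {
        self.current += 1;
        self.template.get(self.current).copied()
    }
}

fn main() {
    let mut wf = WorkflowInstance {
        template: vec!["architecture", "implementation", "testing", "review", "deployment"],
        current: 0,
    };
    // Simulate completion events driving the workflow forward.
    while let Some(next) = wf.on_stage_complete() {
        println!("dispatching stage: {next}");
    }
    println!("workflow complete");
}
```

In the real system, artifact hand-off, approval gates, and NATS event subscriptions wrap this core loop.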
### Why Not Alternative Solutions

| Alternative | Why Not |
|---|---|
| Manual `/compact` in Claude Code | Requires user discipline, doesn't fundamentally change the pattern |
| Shorter sessions manually | Loses context continuity, user must track state |
| External tools (LiteLLM, CrewAI) | Python-based, doesn't leverage existing Vapora infrastructure |
| Just use Haiku everywhere | Quality degradation for complex tasks |
Vapora already has budget-aware routing, learning profiles, and swarm coordination. The workflow orchestrator completes the picture.
### Why Kogral Integration
Kogral provides persistent knowledge that would otherwise bloat session context:
| Without Kogral | With Kogral |
|---|---|
| Guidelines re-explained each session | Query once via MCP, inject 5K tokens |
| ADRs repeated in conversation | Reference by ID, inject summary |
| Patterns described verbally | Structured retrieval, minimal tokens |
Kogral transforms "remember our auth pattern" (requires context) into "query pattern:auth" (stateless lookup).
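That stateless lookup can be sketched as a plain filesystem read, in the spirit of `enrich_context_from_kogral()`. The directory layout and the `lookup` helper here are assumptions for illustration:

```rust
// Hypothetical stateless knowledge lookup against a .kogral/ directory.
use std::fs;
use std::path::PathBuf;

/// Resolve a query like "pattern:auth" to a small, injectable snippet
/// (e.g. .kogral/patterns/auth.md) instead of carrying the explanation
/// in session context.
fn lookup(kogral_path: &str, kind: &str, name: &str) -> Option<String> {
    let path = PathBuf::from(kogral_path)
        .join(kind)
        .join(format!("{name}.md"));
    fs::read_to_string(path).ok()
}

fn main() {
    // KOGRAL_PATH overrides the default location, per the ADR.
    let base = std::env::var("KOGRAL_PATH").unwrap_or_else(|_| ".kogral".into());
    match lookup(&base, "patterns", "auth") {
        Some(doc) => println!("injecting {} bytes of pattern context", doc.len()),
        None => println!("pattern not found; agent proceeds without it"),
    }
}
```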
## Consequences

### Positive
- **~95% reduction in cache token costs:** $840/month → ~$50-100/month for the same workload
- **Better model allocation:** Opus for architecture (high quality, few tokens), Haiku for implementation (lower quality acceptable, many tokens)
- **Leverages existing investment:** Uses the SwarmCoordinator, LearningProfiles, and KGPersistence already built
- **Audit trail:** Each agent execution persisted to the KG with tokens, cost, and duration
- **Parallelization:** Multiple developers can work simultaneously on different parts
- **Quality through specialization:** Each agent is optimized for its role vs one generalist session
### Negative
- **Orchestration overhead:** Additional component to maintain
- **Latency between stages:** Artifact passing adds delay vs a continuous conversation
- **Context loss between agents:** Agent B doesn't know what Agent A "considered but rejected"
- **Debugging complexity:** Issues span multiple agent executions
### Mitigations
| Negative | Mitigation |
|---|---|
| Orchestration overhead | Minimal code (~1500 lines), clear separation of concerns |
| Latency | Parallel stages where possible, async execution |
| Context loss | Kogral captures decisions, not just outcomes |
| Debugging | Workflow ID traces all related executions in KG |
## Metrics for Success
| Metric | Before | After (Target) |
|---|---|---|
| Monthly LLM cost | ~$840 | <$150 |
| Cache tokens per task | ~20M | <1M |
| Average context size | 500K+ | <50K per agent |
| Workflow completion rate | N/A | >95% |
## Cost Projection
Based on analyzed usage patterns with optimized workflow:
| Role | Model | % of Work | Monthly Cost |
|---|---|---|---|
| Architect | Opus | 10% | ~$25 |
| Developer | Haiku | 50% | ~$30 |
| Reviewer | Sonnet | 25% | ~$40 |
| Tester | Haiku | 15% | ~$15 |
| **Total** | | | ~$110 |
Savings: ~$730/month (87% reduction)
## Implementation Status
- Status: Complete (v1.2.0)
- Crates: vapora-workflow-engine, vapora-cli
- Tests: 26 unit tests + 1 doc test passing
- Endpoints: 6 REST API endpoints
- Templates: 4 pre-configured workflows
- CLI Commands: 6 workflow management commands
## References

- Usage data: Claude Code usage analysis (5 weeks, 3.82B cache tokens)
- Vapora SwarmCoordinator: `crates/vapora-swarm/src/coordinator.rs`
- Vapora Workflows Config: `config/workflows.toml`
- Kogral MCP: `kogral-mcp` (external project)
- Implementation: `crates/vapora-workflow-engine/`
- CLI: `crates/vapora-cli/`
## Related ADRs
- ADR-0014: Learning-Based Agent Selection
- ADR-0015: Budget Enforcement & Cost Optimization
- ADR-0013: Knowledge Graph for Temporal Execution History
- ADR-0018: Swarm Load Balancing
## Decision Drivers
- Data-driven: 95% of cost is cache tokens from long sessions
- Infrastructure exists: Vapora has all pieces except orchestrator
- Kogral synergy: Persistent knowledge reduces context requirements
- Measurable outcome: Clear before/after metrics for validation
- Production-ready: Complete implementation with tests and documentation