Workflow Orchestrator
Multi-stage workflow execution with cost-efficient agent coordination and artifact passing.
Overview
The Workflow Orchestrator (vapora-workflow-engine) enables cost-efficient multi-agent pipelines by executing workflows as discrete stages with short-lived agent contexts. Instead of accumulating context in long sessions, agents receive only what they need, produce artifacts, and terminate.
Key Benefit: ~95% fewer LLM cache-read tokens than monolithic session patterns; the worked example under Cost Optimization shows about 87% lower monthly cost.
Architecture
Core Components
┌─────────────────────────────────────────────────────────┐
│                  WorkflowOrchestrator                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ WorkflowInstance                                │    │
│  │ ├─ workflow_id: UUID                            │    │
│  │ ├─ template: WorkflowConfig                     │    │
│  │ ├─ current_stage: usize                         │    │
│  │ ├─ stage_states: Vec<StageState>                │    │
│  │ └─ artifacts: HashMap<String, Artifact>         │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
    ┌───────────┐      ┌───────────┐      ┌───────────┐
    │   NATS    │      │   Swarm   │      │    KG     │
    │ Listener  │      │Coordinator│      │Persistence│
    └───────────┘      └───────────┘      └───────────┘
Workflow Lifecycle
- Template Loading: Read workflow definition from config/workflows.toml
- Instance Creation: Create WorkflowInstance with initial context
- Stage Execution: Orchestrator assigns tasks to agents via SwarmCoordinator
- Event Listening: NATS subscribers wait for TaskCompleted/TaskFailed events
- Stage Advancement: When all tasks complete, advance to next stage
- Artifact Passing: Accumulated artifacts passed to subsequent stages
- Completion: Workflow marked complete, metrics recorded
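The lifecycle above can be read as a small state machine. The sketch below is illustrative only: `Status`, `Instance`, `on_task_completed`, and `on_task_failed` are hypothetical names, not the vapora-workflow-engine API. It models the Stage Advancement rule, moving on only once every task in the current stage has reported completion.

```rust
/// Hypothetical workflow status; the real crate may model this differently.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Running,
    Completed,
    Failed(String),
}

/// Hypothetical, simplified stand-in for a workflow instance.
pub struct Instance {
    pub current_stage: usize,
    /// Number of tasks still outstanding in each stage.
    pub pending: Vec<usize>,
    pub status: Status,
}

impl Instance {
    pub fn new(tasks_per_stage: &[usize]) -> Self {
        // A workflow without stages has nothing to run.
        let status = if tasks_per_stage.is_empty() {
            Status::Completed
        } else {
            Status::Running
        };
        Instance { current_stage: 0, pending: tasks_per_stage.to_vec(), status }
    }

    /// Handle a TaskCompleted event for the current stage.
    pub fn on_task_completed(&mut self) {
        if self.status != Status::Running {
            return; // ignore events that arrive after the workflow ended
        }
        let idx = self.current_stage;
        self.pending[idx] = self.pending[idx].saturating_sub(1);
        if self.pending[idx] == 0 {
            if idx + 1 == self.pending.len() {
                self.status = Status::Completed;
            } else {
                self.current_stage += 1;
            }
        }
    }

    /// Handle a TaskFailed event: this sketch fails the whole workflow.
    pub fn on_task_failed(&mut self, reason: &str) {
        if self.status == Status::Running {
            self.status = Status::Failed(reason.to_string());
        }
    }
}
```

Failing the whole workflow on the first TaskFailed event is a simplification chosen for the sketch; retries or per-stage failure policies would slot into `on_task_failed`.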
Workflow Templates
Pre-configured workflows in config/workflows.toml:
feature_development (5 stages)
[[workflows]]
name = "feature_development"
trigger = "manual"
[[workflows.stages]]
name = "architecture_design"
agents = ["architect"]
parallel = false
approval_required = false
[[workflows.stages]]
name = "implementation"
agents = ["developer", "developer"]
parallel = true
max_parallel = 2
approval_required = false
[[workflows.stages]]
name = "testing"
agents = ["tester"]
parallel = false
approval_required = false
[[workflows.stages]]
name = "code_review"
agents = ["reviewer"]
parallel = false
approval_required = true
[[workflows.stages]]
name = "deployment"
agents = ["devops"]
parallel = false
approval_required = true
Stages: architecture → implementation (parallel) → testing → review (approval) → deployment (approval)
bugfix (4 stages)
Stages: investigation → fix → testing → deployment
documentation_update (3 stages)
Stages: content creation → review (approval) → publish
security_audit (4 stages)
Stages: code analysis → penetration testing → remediation → verification (approval)
Stage Types
Sequential Stages
Single agent executes task, advances when complete.
[[workflows.stages]]
name = "architecture_design"
agents = ["architect"]
parallel = false
Parallel Stages
Multiple agents execute tasks simultaneously.
[[workflows.stages]]
name = "implementation"
agents = ["developer", "developer"]
parallel = true
max_parallel = 2
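One way to picture how parallel and max_parallel interact is as a batching rule. The function below is a sketch under that assumption (the engine's actual scheduling is not described here, and `plan_batches` is a hypothetical name): a sequential stage runs one agent at a time, a parallel stage runs up to max_parallel agents per batch.

```rust
/// Split a stage's agent list into batches that run one after another.
/// `parallel` and `max_parallel` mirror the TOML keys; the batching rule
/// itself is an assumption made for illustration.
pub fn plan_batches(agents: &[&str], parallel: bool, max_parallel: usize) -> Vec<Vec<String>> {
    // Sequential stages get a batch width of 1; guard against a zero width.
    let width = if parallel { max_parallel.max(1) } else { 1 };
    agents
        .chunks(width)
        .map(|chunk| chunk.iter().map(|a| a.to_string()).collect())
        .collect()
}
```

With the implementation stage above (two developer agents, max_parallel = 2), this yields a single batch containing both agents.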
Approval Gates
Stage requires manual approval before advancing.
[[workflows.stages]]
name = "deployment"
agents = ["devops"]
approval_required = true
When approval_required = true:
- Workflow pauses with status waiting_approval:<stage_idx>
- NATS event published to vapora.workflow.approval_required
- Admin approves via API or CLI
- Workflow resumes execution
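The paused status carries the stage index after a colon, so a client can recover which stage is waiting. A minimal sketch of formatting and parsing that string follows; the helper names are hypothetical and not part of the crate.

```rust
/// Format the paused status described above: `waiting_approval:<stage_idx>`.
pub fn waiting_status(stage_idx: usize) -> String {
    format!("waiting_approval:{}", stage_idx)
}

/// Return the stage index if `status` is a waiting-approval status,
/// or `None` for any other status or a malformed index.
pub fn parse_waiting_status(status: &str) -> Option<usize> {
    status.strip_prefix("waiting_approval:")?.parse().ok()
}
```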
Artifacts
Data passed between stages:
Artifact Types
pub enum ArtifactType {
    Adr,            // Architecture Decision Record
    Code,           // Source code files
    TestResults,    // Test execution output
    Review,         // Code review feedback
    Documentation,  // Generated docs
    Custom(String), // User-defined type
}
Artifact Flow
Stage 1: Architecture
  └─ Produces: Artifact(Adr, "design-spec", ...)
        │
        ▼
Stage 2: Implementation
  ├─ Consumes: design-spec
  └─ Produces: Artifact(Code, "feature-impl", ...)
        │
        ▼
Stage 3: Testing
  ├─ Consumes: feature-impl
  └─ Produces: Artifact(TestResults, "test-report", ...)
Artifacts stored in WorkflowInstance.accumulated_artifacts and passed to subsequent stages via context.
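To make the flow concrete, here is a sketch of an artifact store keyed by name. Only `ArtifactType` and the idea of a name-keyed map come from this page; the `Artifact` fields (`content`, `produced_by`) and the `ArtifactStore` type are assumptions for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ArtifactType {
    Adr,
    Code,
    TestResults,
    Review,
    Documentation,
    Custom(String),
}

/// Hypothetical artifact shape: the page only shows a type and a name.
#[derive(Debug, Clone, PartialEq)]
pub struct Artifact {
    pub kind: ArtifactType,
    pub name: String,
    pub content: String,
    pub produced_by: String,
}

/// Accumulates artifacts across stages, keyed by artifact name.
#[derive(Default)]
pub struct ArtifactStore {
    by_name: HashMap<String, Artifact>,
}

impl ArtifactStore {
    /// Record an artifact; a later artifact with the same name replaces it.
    pub fn record(&mut self, artifact: Artifact) {
        self.by_name.insert(artifact.name.clone(), artifact);
    }

    /// Pick the artifacts a stage consumes; unknown names are skipped.
    pub fn select(&self, names: &[&str]) -> Vec<&Artifact> {
        names.iter().filter_map(|n| self.by_name.get(*n)).collect()
    }
}
```

Selecting by name keeps each agent's context small: the testing stage asks for feature-impl and never sees the rest.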
Kogral Integration
Enrich workflow context with persistent knowledge from Kogral:
orchestrator.enrich_context_from_kogral(&mut context, "feature_development").await?;
Loads:
- Guidelines: .kogral/guidelines/{workflow_name}.md
- Patterns: .kogral/patterns/*.md (matching workflow name)
- ADRs: .kogral/adrs/*.md (5 most recent, containing workflow name)
Result injected into context:
{
  "task": "Add authentication",
  "kogral_guidelines": {
    "source": ".kogral/guidelines/feature_development.md",
    "content": "..."
  },
  "kogral_patterns": [
    { "file": "auth-pattern.md", "content": "..." }
  ],
  "kogral_decisions": [
    { "file": "0005-oauth2-implementation.md", "content": "..." }
  ]
}
Configuration:
export KOGRAL_PATH="/path/to/kogral/.kogral"
Default: ../kogral/.kogral (sibling directory)
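The path rules above can be sketched with plain `std::path`. The helpers below are hypothetical: they assume an empty KOGRAL_PATH is treated like an unset one, and that "matching workflow name" means the file name contains it. Neither detail is confirmed by this page.

```rust
use std::path::{Path, PathBuf};

/// Resolve the Kogral root from an optional `KOGRAL_PATH` value,
/// falling back to the documented default `../kogral/.kogral`.
pub fn kogral_root(env_value: Option<&str>) -> PathBuf {
    match env_value {
        // Assumption: a blank value counts as "not set".
        Some(v) if !v.trim().is_empty() => PathBuf::from(v),
        _ => PathBuf::from("../kogral/.kogral"),
    }
}

/// Path of the guidelines file for a workflow: `<root>/guidelines/<name>.md`.
pub fn guidelines_file(root: &Path, workflow: &str) -> PathBuf {
    root.join("guidelines").join(format!("{}.md", workflow))
}

/// Keep markdown files whose name contains the workflow name
/// (assumed matching rule).
pub fn matching_patterns<'a>(files: &[&'a str], workflow: &str) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|f| f.ends_with(".md") && f.contains(workflow))
        .collect()
}
```

In a real process the first argument would come from `std::env::var("KOGRAL_PATH").ok()`, passed along as `as_deref()`.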
REST API
All endpoints under /api/v1/workflow_orchestrator:
Start Workflow
POST /api/v1/workflow_orchestrator
Content-Type: application/json
{
  "template": "feature_development",
  "context": {
    "task": "Implement authentication",
    "requirements": ["OAuth2", "JWT"]
  }
}
Response:
{
  "workflow_id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d"
}
List Active Workflows
GET /api/v1/workflow_orchestrator
Response:
{
  "workflows": [
    {
      "id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d",
      "template_name": "feature_development",
      "status": "running",
      "current_stage": 2,
      "total_stages": 5,
      "created_at": "2026-01-24T01:23:45.123Z",
      "updated_at": "2026-01-24T01:45:12.456Z"
    }
  ]
}
Get Workflow Status
GET /api/v1/workflow_orchestrator/:id
Response: A single workflow object with the same fields as each entry in the list response.
Approve Stage
POST /api/v1/workflow_orchestrator/:id/approve
Content-Type: application/json
{
  "approver": "Jane Doe"
}
Response:
{
  "success": true,
  "message": "Workflow 3f9a2b1c stage approved"
}
Cancel Workflow
POST /api/v1/workflow_orchestrator/:id/cancel
Content-Type: application/json
{
  "reason": "Requirements changed"
}
Response:
{
  "success": true,
  "message": "Workflow 3f9a2b1c cancelled"
}
List Templates
GET /api/v1/workflow_orchestrator/templates
Response:
{
  "templates": [
    "feature_development",
    "bugfix",
    "documentation_update",
    "security_audit"
  ]
}
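For client code, the six endpoints above reduce to a method and a path. The enum below is an illustrative sketch (`Endpoint` is a hypothetical name, and no HTTP library is involved); it only encodes the routes exactly as listed in this section.

```rust
/// Base path shared by every endpoint in this section.
const BASE: &str = "/api/v1/workflow_orchestrator";

/// The endpoints listed above; the enum itself is illustrative.
pub enum Endpoint<'a> {
    Start,
    List,
    Status(&'a str),
    Approve(&'a str),
    Cancel(&'a str),
    Templates,
}

impl<'a> Endpoint<'a> {
    /// HTTP method for the endpoint.
    pub fn method(&self) -> &'static str {
        match self {
            Endpoint::Start | Endpoint::Approve(_) | Endpoint::Cancel(_) => "POST",
            Endpoint::List | Endpoint::Status(_) | Endpoint::Templates => "GET",
        }
    }

    /// Request path, with the workflow id substituted for `:id`.
    pub fn path(&self) -> String {
        match self {
            Endpoint::Start | Endpoint::List => BASE.to_string(),
            Endpoint::Status(id) => format!("{}/{}", BASE, id),
            Endpoint::Approve(id) => format!("{}/{}/approve", BASE, id),
            Endpoint::Cancel(id) => format!("{}/{}/cancel", BASE, id),
            Endpoint::Templates => format!("{}/templates", BASE),
        }
    }
}
```

Note that `/templates` and `/:id` share a path shape, so a router has to match the literal `templates` segment before the id parameter.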
NATS Events
The Workflow Orchestrator publishes and subscribes to events on NATS JetStream:
Subscriptions
- vapora.tasks.completed - Agent task completion events
- vapora.tasks.failed - Agent task failure events
Publications
- vapora.workflow.approval_required - Stage waiting for approval
- vapora.workflow.completed - Workflow finished successfully
Event Format:
{
  "type": "approval_required",
  "workflow_id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d",
  "stage": "code_review",
  "timestamp": "2026-01-24T01:45:12.456Z"
}
Metrics
Prometheus metrics exposed at /metrics:
- vapora_workflows_started_total - Total workflows initiated
- vapora_workflows_completed_total - Successfully finished workflows
- vapora_workflows_failed_total - Failed workflows
- vapora_stages_completed_total - Individual stage completions
- vapora_active_workflows - Currently running workflows (gauge)
- vapora_stage_duration_seconds - Histogram of stage execution times
- vapora_workflow_duration_seconds - Histogram of total workflow times
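As a rough picture of what a scrape of /metrics returns, the sketch below tracks two of the metrics above with std atomics and renders them in the Prometheus text exposition format. `WorkflowMetrics` is a hypothetical type; the engine most likely uses a metrics library rather than hand-rolled counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Two of the metrics listed above, backed by std atomics (illustrative only).
#[derive(Default)]
pub struct WorkflowMetrics {
    started_total: AtomicU64,
    active: AtomicU64,
}

impl WorkflowMetrics {
    /// A workflow started: bump the counter and the gauge.
    pub fn workflow_started(&self) {
        self.started_total.fetch_add(1, Ordering::Relaxed);
        self.active.fetch_add(1, Ordering::Relaxed);
    }

    /// A workflow ended (completed, failed, or cancelled): lower the gauge,
    /// never letting it drop below zero.
    pub fn workflow_finished(&self) {
        let _ = self
            .active
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| Some(v.saturating_sub(1)));
    }

    /// Render the values in Prometheus text exposition format.
    pub fn render(&self) -> String {
        format!(
            "# TYPE vapora_workflows_started_total counter\n\
             vapora_workflows_started_total {}\n\
             # TYPE vapora_active_workflows gauge\n\
             vapora_active_workflows {}\n",
            self.started_total.load(Ordering::Relaxed),
            self.active.load(Ordering::Relaxed),
        )
    }
}
```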
Cost Optimization
Before: Monolithic Session
Session with 50 messages:
├─ Message 1:   50K context →  50K cache reads
├─ Message 2:  100K context → 100K cache reads
├─ Message 3:  150K context → 150K cache reads
└─ Message 50: 800K context → 800K cache reads
                              ──────────────────
                              ~20M cache reads
Cost: ~$840/month for typical usage
After: Multi-Stage Workflow
Workflow with 3 stages:
├─ Architect: 40K context,  5 msgs → 200K cache reads
├─ Developer: 25K context, 12 msgs → 300K cache reads
└─ Reviewer:  35K context,  4 msgs → 140K cache reads
                                     ──────────────────
                                     ~640K cache reads
Cost: ~$110/month for equivalent work
Savings: ~$730/month (87% reduction)
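The figures above can be checked with a few lines of arithmetic. The sketch assumes each message in a stage re-reads that stage's full context, which is how the per-stage totals are derived (for example 40K × 5 = 200K); the function names are hypothetical.

```rust
/// Cache reads for one stage, assuming every message re-reads the
/// stage's full context.
pub fn stage_cache_reads(context_tokens: u64, messages: u64) -> u64 {
    context_tokens * messages
}

/// Total cache reads for a workflow given (context_tokens, messages) per stage.
pub fn workflow_cache_reads(stages: &[(u64, u64)]) -> u64 {
    stages.iter().map(|&(c, m)| stage_cache_reads(c, m)).sum()
}

/// Percentage reduction when going from `before` to `after`.
pub fn percent_reduction(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}
```

With the three stages above the total is 640K cache reads, about 96.8% fewer than the ~20M of the monolithic session, while the monthly cost falls from ~$840 to ~$110, about 87%. The two percentages differ because the dollar figures are not derived from cache reads alone.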
Usage Examples
See CLI Commands Guide for command-line usage.
Programmatic Usage
use vapora_workflow_engine::WorkflowOrchestrator;
use std::sync::Arc;
// Initialize orchestrator.
// `swarm`, `kg`, and `nats` are handles created elsewhere (not shown here).
let orchestrator = Arc::new(
    WorkflowOrchestrator::new(
        "config/workflows.toml",
        swarm,
        kg,
        nats,
    ).await?
);
// Start event listener
orchestrator.clone().start_event_listener().await?;
// Start workflow
let workflow_id = orchestrator.start_workflow(
    "feature_development",
    serde_json::json!({
        "task": "Add authentication",
        "requirements": ["OAuth2", "JWT"]
    })
).await?;
// Get status
let workflow = orchestrator.get_workflow(&workflow_id)?;
println!("Status: {:?}", workflow.status);
// Approve stage (if waiting)
orchestrator.approve_stage(&workflow_id, "Jane Doe").await?;
Configuration
Workflow Templates
File: config/workflows.toml
[engine]
max_parallel_tasks = 10
workflow_timeout = 3600
approval_gates_enabled = true
[[workflows]]
name = "custom_workflow"
trigger = "manual"
[[workflows.stages]]
name = "stage_name"
agents = ["agent_role"]
parallel = false
max_parallel = 1
approval_required = false
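When writing custom templates it can help to sanity-check a stage before loading it. The struct below mirrors the stage keys shown above; the validation rules are assumptions made for illustration (the engine's real checks are not documented on this page), and `validate_stage` is a hypothetical helper.

```rust
/// Mirrors the keys of one `[[workflows.stages]]` table.
#[derive(Debug, Clone)]
pub struct StageConfig {
    pub name: String,
    pub agents: Vec<String>,
    pub parallel: bool,
    pub max_parallel: usize,
    pub approval_required: bool,
}

/// Hypothetical validation: non-empty name and agents, a positive
/// `max_parallel`, and no parallel stage wider than the engine-wide
/// `max_parallel_tasks` limit.
pub fn validate_stage(stage: &StageConfig, max_parallel_tasks: usize) -> Result<(), String> {
    if stage.name.trim().is_empty() {
        return Err("stage name must not be empty".to_string());
    }
    if stage.agents.is_empty() {
        return Err(format!("stage '{}' lists no agents", stage.name));
    }
    if stage.max_parallel == 0 {
        return Err(format!("stage '{}': max_parallel must be at least 1", stage.name));
    }
    if stage.parallel && stage.max_parallel > max_parallel_tasks {
        return Err(format!(
            "stage '{}': max_parallel {} exceeds engine limit {}",
            stage.name, stage.max_parallel, max_parallel_tasks
        ));
    }
    Ok(())
}
```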
Environment Variables
# Kogral knowledge base path
export KOGRAL_PATH="/path/to/kogral/.kogral"
# NATS connection
export NATS_URL="nats://localhost:4222"
# Backend API (for CLI)
export VAPORA_API_URL="http://localhost:8001"
Troubleshooting
Workflow Stuck in "waiting_approval"
Solution: Use CLI or API to approve:
vapora workflow approve <workflow_id> --approver "Your Name"
Stage Fails Repeatedly
Check:
- Agent availability: vapora workflow list (via backend)
- NATS connection: Verify NATS URL and cluster status
- Task requirements: Check if stage agents have required capabilities
High Latency Between Stages
Causes:
- NATS messaging delay (check network)
- SwarmCoordinator queue depth (check agent load)
- Artifact serialization overhead (reduce artifact size)
Mitigation:
- Use parallel stages where possible
- Increase max_parallel in stage config
- Optimize artifact content (references instead of full content)
Workflow Not Advancing
Debug:
# Check workflow status
vapora workflow status <workflow_id>
# Check backend logs
docker logs vapora-backend
# Check NATS messages
nats sub "vapora.tasks.>"
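The subject `vapora.tasks.>` in the last command uses the NATS `>` wildcard, which matches one or more trailing tokens, so it captures both vapora.tasks.completed and vapora.tasks.failed. The function below is a small re-implementation of that matching rule for illustration (`subject_matches` is not a NATS client API); `*` matches exactly one token.

```rust
/// Check whether a NATS subject matches a subscription pattern.
/// Tokens are separated by '.', `*` matches exactly one token, and `>`
/// matches one or more remaining tokens and must be the last token.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` needs at least one token left and must end the pattern.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

This also shows why the debug subscription does not show approval events: vapora.workflow.approval_required falls outside `vapora.tasks.>`.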
Related Documentation
- CLI Commands Guide - Command-line usage
- Multi-Agent Workflows - Architecture overview
- Agent Registry & Coordination - Agent management
- ADR-0028: Workflow Orchestrator - Decision rationale
- ADR-0014: Learning-Based Agent Selection - Agent selection
- ADR-0015: Budget Enforcement - Cost control