Vapora/docs/features/workflow-orchestrator.md
Jesús Pérez cc55b97678
chore: update README and CHANGELOG with workflow orchestrator features
2026-01-24 02:07:45 +00:00


# Workflow Orchestrator
Multi-stage workflow execution with cost-efficient agent coordination and artifact passing.
## Overview
The Workflow Orchestrator (`vapora-workflow-engine`) enables cost-efficient multi-agent pipelines by executing workflows as discrete stages with short-lived agent contexts. Instead of accumulating context in long sessions, agents receive only what they need, produce artifacts, and terminate.
**Key Benefit**: far fewer LLM cache-read tokens than a monolithic session. In the worked example under Cost Optimization, cache reads drop by about 97% and monthly cost by about 87%.
## Architecture
### Core Components
```text
┌─────────────────────────────────────────────────────────┐
│                  WorkflowOrchestrator                   │
│  ┌─────────────────────────────────────────────────┐    │
│  │ WorkflowInstance                                │    │
│  │  ├─ workflow_id: UUID                           │    │
│  │  ├─ template: WorkflowConfig                    │    │
│  │  ├─ current_stage: usize                        │    │
│  │  ├─ stage_states: Vec<StageState>               │    │
│  │  └─ artifacts: HashMap<String, Artifact>        │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
         │                   │                   │
         ▼                   ▼                   ▼
  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │    NATS     │     │    Swarm    │     │     KG      │
  │  Listener   │     │ Coordinator │     │ Persistence │
  └─────────────┘     └─────────────┘     └─────────────┘
```
### Workflow Lifecycle
1. **Template Loading**: Read workflow definition from `config/workflows.toml`
2. **Instance Creation**: Create `WorkflowInstance` with initial context
3. **Stage Execution**: Orchestrator assigns tasks to agents via SwarmCoordinator
4. **Event Listening**: NATS subscribers wait for `TaskCompleted`/`TaskFailed` events
5. **Stage Advancement**: When all tasks complete, advance to next stage
6. **Artifact Passing**: Accumulated artifacts passed to subsequent stages
7. **Completion**: Workflow marked complete, metrics recorded
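Steps 4 to 7 amount to a small state machine. The following is a minimal sketch of that idea using only the standard library; `StageState::pending_tasks`, `on_task_completed`, and `current_stage_name` are hypothetical names chosen for illustration and are not taken from `vapora-workflow-engine`.

```rust
use std::collections::HashSet;

/// Hypothetical stage bookkeeping: the tasks still running in one stage.
#[derive(Debug)]
struct StageState {
    name: String,
    pending_tasks: HashSet<String>,
}

/// Hypothetical, trimmed-down workflow instance.
#[derive(Debug)]
struct WorkflowInstance {
    current_stage: usize,
    stage_states: Vec<StageState>,
    completed: bool,
}

impl WorkflowInstance {
    /// Build an instance from (stage name, task ids) pairs.
    fn new(stages: Vec<(&str, Vec<&str>)>) -> Self {
        let stage_states: Vec<StageState> = stages
            .into_iter()
            .map(|(name, tasks)| StageState {
                name: name.to_string(),
                pending_tasks: tasks.into_iter().map(String::from).collect(),
            })
            .collect();
        let completed = stage_states.is_empty();
        WorkflowInstance { current_stage: 0, stage_states, completed }
    }

    /// Name of the stage currently executing, or None once the workflow is done.
    fn current_stage_name(&self) -> Option<&str> {
        if self.completed {
            return None;
        }
        self.stage_states.get(self.current_stage).map(|s| s.name.as_str())
    }

    /// Handle a `TaskCompleted` event. Returns true when the event finished
    /// the current stage (advancing or completing the workflow).
    fn on_task_completed(&mut self, task_id: &str) -> bool {
        if self.completed {
            return false;
        }
        let stage = &mut self.stage_states[self.current_stage];
        if !stage.pending_tasks.remove(task_id) {
            return false; // unknown task, or one that belongs to another stage
        }
        if !stage.pending_tasks.is_empty() {
            return false; // other tasks in this stage are still running
        }
        if self.current_stage + 1 < self.stage_states.len() {
            self.current_stage += 1;
        } else {
            self.completed = true;
        }
        true
    }
}
```

The sketch leaves out failure handling, approval gates, and artifact passing; it only shows why a parallel stage advances on its last completion event rather than its first.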
## Workflow Templates
Pre-configured workflows in `config/workflows.toml`:
### feature_development (5 stages)
```toml
[[workflows]]
name = "feature_development"
trigger = "manual"
[[workflows.stages]]
name = "architecture_design"
agents = ["architect"]
parallel = false
approval_required = false
[[workflows.stages]]
name = "implementation"
agents = ["developer", "developer"]
parallel = true
max_parallel = 2
approval_required = false
[[workflows.stages]]
name = "testing"
agents = ["tester"]
parallel = false
approval_required = false
[[workflows.stages]]
name = "code_review"
agents = ["reviewer"]
parallel = false
approval_required = true
[[workflows.stages]]
name = "deployment"
agents = ["devops"]
parallel = false
approval_required = true
```
**Stages**: architecture → implementation (parallel) → testing → review (approval) → deployment (approval)
### bugfix (4 stages)
**Stages**: investigation → fix → testing → deployment
### documentation_update (3 stages)
**Stages**: content creation → review (approval) → publish
### security_audit (4 stages)
**Stages**: code analysis → penetration testing → remediation → verification (approval)
## Stage Types
### Sequential Stages
A single agent executes the task, and the workflow advances when it completes.
```toml
[[workflows.stages]]
name = "architecture_design"
agents = ["architect"]
parallel = false
```
### Parallel Stages
Multiple agents execute tasks simultaneously.
```toml
[[workflows.stages]]
name = "implementation"
agents = ["developer", "developer"]
parallel = true
max_parallel = 2
```
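One plausible reading of `max_parallel` is a cap on how many of a stage's agents run at once, with the remainder waiting for the next batch. The sketch below illustrates that reading only; `StageConfig` and `execution_batches` are hypothetical, and the engine's actual scheduling may differ.

```rust
/// Hypothetical mirror of one `[[workflows.stages]]` entry.
struct StageConfig {
    agents: Vec<String>,
    parallel: bool,
    max_parallel: usize,
}

/// Split a stage's agents into batches that may run at the same time.
/// Sequential stages get batches of one; a zero cap is treated as one.
fn execution_batches(stage: &StageConfig) -> Vec<Vec<String>> {
    let width = if stage.parallel { stage.max_parallel.max(1) } else { 1 };
    stage.agents.chunks(width).map(|batch| batch.to_vec()).collect()
}
```

Under this assumption, three `developer` agents with `max_parallel = 2` would run as one batch of two followed by one batch of one.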
### Approval Gates
The stage requires manual approval before the workflow advances.
```toml
[[workflows.stages]]
name = "deployment"
agents = ["devops"]
approval_required = true
```
When `approval_required = true`:
1. Workflow pauses with status `waiting_approval:<stage_idx>`
2. NATS event published to `vapora.workflow.approval_required`
3. Admin approves via API or CLI
4. Workflow resumes execution
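The pause-and-resume steps above can be modelled as a status enum. This is a minimal sketch, not the engine's type: the `WorkflowStatus` enum, its variants, and the `approve` function are assumptions, and only the `waiting_approval:<stage_idx>` and `running` strings come from this document.

```rust
use std::fmt;

/// Hypothetical workflow status; only the string forms of
/// `Running` and `WaitingApproval` appear in this document.
#[derive(Debug, Clone, PartialEq)]
enum WorkflowStatus {
    Running,
    WaitingApproval(usize),
    Completed,
}

impl fmt::Display for WorkflowStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WorkflowStatus::Running => write!(f, "running"),
            WorkflowStatus::WaitingApproval(idx) => write!(f, "waiting_approval:{idx}"),
            WorkflowStatus::Completed => write!(f, "completed"),
        }
    }
}

/// Resume a paused workflow. Approvals are rejected when the workflow
/// is not waiting, or when no approver name is given.
fn approve(status: &WorkflowStatus, approver: &str) -> Result<WorkflowStatus, String> {
    match status {
        WorkflowStatus::WaitingApproval(_) if approver.trim().is_empty() => {
            Err("approver must not be empty".to_string())
        }
        WorkflowStatus::WaitingApproval(_) => Ok(WorkflowStatus::Running),
        other => Err(format!("workflow is {other}, not waiting for approval")),
    }
}
```

Carrying the stage index inside the variant keeps the status string and the paused stage in sync, which is one way to produce the `waiting_approval:<stage_idx>` form.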
## Artifacts
Data passed between stages:
### Artifact Types
```rust
pub enum ArtifactType {
    Adr,            // Architecture Decision Record
    Code,           // Source code files
    TestResults,    // Test execution output
    Review,         // Code review feedback
    Documentation,  // Generated docs
    Custom(String), // User-defined type
}
```
### Artifact Flow
```text
Stage 1: Architecture
  └─ Produces: Artifact(Adr, "design-spec", ...)
Stage 2: Implementation
  ├─ Consumes: design-spec
  └─ Produces: Artifact(Code, "feature-impl", ...)
Stage 3: Testing
  ├─ Consumes: feature-impl
  └─ Produces: Artifact(TestResults, "test-report", ...)
```
Artifacts are stored in `WorkflowInstance.accumulated_artifacts` and passed to subsequent stages via the workflow context.
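The produce/consume flow can be sketched with a plain `HashMap` keyed by artifact name. The `Artifact` struct fields and the `record` and `inputs_for` helpers below are hypothetical and exist only to illustrate the hand-off; the real `Artifact` type may carry different fields.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum ArtifactType {
    Adr,
    Code,
    TestResults,
}

/// Hypothetical artifact shape: a kind, a name, and an opaque payload.
#[derive(Debug, Clone)]
struct Artifact {
    kind: ArtifactType,
    name: String,
    content: String,
}

/// Record what a stage produced, keyed by artifact name.
fn record(store: &mut HashMap<String, Artifact>, artifact: Artifact) {
    store.insert(artifact.name.clone(), artifact);
}

/// Collect the artifacts a stage consumes. Names that no earlier stage
/// has produced are returned as the error value.
fn inputs_for<'a>(
    store: &'a HashMap<String, Artifact>,
    wanted: &[&str],
) -> Result<Vec<&'a Artifact>, Vec<String>> {
    let mut found = Vec::new();
    let mut missing = Vec::new();
    for name in wanted {
        match store.get(*name) {
            Some(artifact) => found.push(artifact),
            None => missing.push(name.to_string()),
        }
    }
    if missing.is_empty() { Ok(found) } else { Err(missing) }
}
```

With this shape, the Implementation stage would call `inputs_for(&store, &["design-spec"])` before starting and `record` its `feature-impl` artifact when it finishes.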
## Kogral Integration
Enrich workflow context with persistent knowledge from Kogral:
```rust
orchestrator.enrich_context_from_kogral(&mut context, "feature_development").await?;
```
Loads:
- **Guidelines**: `.kogral/guidelines/{workflow_name}.md`
- **Patterns**: `.kogral/patterns/*.md` (matching workflow name)
- **ADRs**: `.kogral/adrs/*.md` (5 most recent, containing workflow name)
Result injected into context:
```json
{
  "task": "Add authentication",
  "kogral_guidelines": {
    "source": ".kogral/guidelines/feature_development.md",
    "content": "..."
  },
  "kogral_patterns": [
    { "file": "auth-pattern.md", "content": "..." }
  ],
  "kogral_decisions": [
    { "file": "0005-oauth2-implementation.md", "content": "..." }
  ]
}
```
**Configuration**:
```bash
export KOGRAL_PATH="/path/to/kogral/.kogral"
```
Default: `../kogral/.kogral` (sibling directory)
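The ADR rule ("5 most recent, containing workflow name") can be sketched as a pure function over already-loaded files. Two assumptions are made here and are not confirmed by this document: "most recent" is taken to mean the highest numeric filename prefix (as in `0005-oauth2-implementation.md`), and "containing" is taken to mean a substring match on the file content. `AdrFile`, `adr_number`, and `select_adrs` are hypothetical names.

```rust
/// Hypothetical in-memory view of one file under `.kogral/adrs/`.
#[derive(Debug, Clone, PartialEq)]
struct AdrFile {
    file: String,
    content: String,
}

/// Leading ADR number parsed from a name like `0005-oauth2-implementation.md`.
fn adr_number(file: &str) -> Option<u32> {
    file.split('-').next()?.parse().ok()
}

/// Keep ADRs that mention the workflow, newest (highest number) first,
/// and return at most `limit` of them.
fn select_adrs(mut adrs: Vec<AdrFile>, workflow_name: &str, limit: usize) -> Vec<AdrFile> {
    adrs.retain(|adr| adr.content.contains(workflow_name));
    // Descending by number; files without a numeric prefix sort last.
    adrs.sort_by(|a, b| adr_number(&b.file).cmp(&adr_number(&a.file)));
    adrs.truncate(limit);
    adrs
}
```

Sorting by file modification time would be an equally reasonable interpretation of "most recent"; the numeric prefix is used here only because it keeps the example deterministic.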
## REST API
All endpoints live under `/api/v1/workflow_orchestrator`:
### Start Workflow
```http
POST /api/v1/workflow_orchestrator
Content-Type: application/json

{
  "template": "feature_development",
  "context": {
    "task": "Implement authentication",
    "requirements": ["OAuth2", "JWT"]
  }
}
```
**Response**:
```json
{
  "workflow_id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d"
}
```
### List Active Workflows
```http
GET /api/v1/workflow_orchestrator
```
**Response**:
```json
{
  "workflows": [
    {
      "id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d",
      "template_name": "feature_development",
      "status": "running",
      "current_stage": 2,
      "total_stages": 5,
      "created_at": "2026-01-24T01:23:45.123Z",
      "updated_at": "2026-01-24T01:45:12.456Z"
    }
  ]
}
```
### Get Workflow Status
```http
GET /api/v1/workflow_orchestrator/:id
```
**Response**: Same as workflow object in list response
### Approve Stage
```http
POST /api/v1/workflow_orchestrator/:id/approve
Content-Type: application/json

{
  "approver": "Jane Doe"
}
```
**Response**:
```json
{
  "success": true,
  "message": "Workflow 3f9a2b1c stage approved"
}
```
### Cancel Workflow
```http
POST /api/v1/workflow_orchestrator/:id/cancel
Content-Type: application/json

{
  "reason": "Requirements changed"
}
```
**Response**:
```json
{
  "success": true,
  "message": "Workflow 3f9a2b1c cancelled"
}
```
### List Templates
```http
GET /api/v1/workflow_orchestrator/templates
```
**Response**:
```json
{
  "templates": [
    "feature_development",
    "bugfix",
    "documentation_update",
    "security_audit"
  ]
}
```
## NATS Events
The workflow orchestrator subscribes to and publishes on the following NATS JetStream subjects:
### Subscriptions
- `vapora.tasks.completed` - Agent task completion events
- `vapora.tasks.failed` - Agent task failure events
### Publications
- `vapora.workflow.approval_required` - Stage waiting for approval
- `vapora.workflow.completed` - Workflow finished successfully
**Event Format**:
```json
{
  "type": "approval_required",
  "workflow_id": "3f9a2b1c-5e7f-4a9b-8c2d-1e3f5a7b9c1d",
  "stage": "code_review",
  "timestamp": "2026-01-24T01:45:12.456Z"
}
```
## Metrics
Prometheus metrics exposed at `/metrics`:
- `vapora_workflows_started_total` - Total workflows initiated
- `vapora_workflows_completed_total` - Successfully finished workflows
- `vapora_workflows_failed_total` - Failed workflows
- `vapora_stages_completed_total` - Individual stage completions
- `vapora_active_workflows` - Currently running workflows (gauge)
- `vapora_stage_duration_seconds` - Histogram of stage execution times
- `vapora_workflow_duration_seconds` - Histogram of total workflow times
## Cost Optimization
### Before: Monolithic Session
```text
Session with 50 messages:
├─ Message 1:   50K context →  50K cache reads
├─ Message 2:  100K context → 100K cache reads
├─ Message 3:  150K context → 150K cache reads
└─ Message 50: 800K context → 800K cache reads
                              ────────────────
                              ~20M cache reads
```
**Cost**: ~$840/month for typical usage
### After: Multi-Stage Workflow
```text
Workflow with 3 stages:
├─ Architect: 40K context,  5 msgs → 200K cache reads
├─ Developer: 25K context, 12 msgs → 300K cache reads
└─ Reviewer:  35K context,  4 msgs → 140K cache reads
                                     ─────────────────
                                     ~640K cache reads
```
**Cost**: ~$110/month for equivalent work
**Savings**: ~$730/month (87% reduction)
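The figures above can be checked with a few lines of arithmetic. This sketch assumes each message in a stage re-reads that stage's full context, which is how the per-stage totals in the table are derived; the function names are illustrative only.

```rust
/// Cache reads for one stage, assuming every message re-reads the
/// full stage context.
fn stage_cache_reads(context_tokens: u64, messages: u64) -> u64 {
    context_tokens * messages
}

/// Total cache reads for the three-stage workflow in the table above.
fn workflow_cache_reads() -> u64 {
    // (context tokens, messages) for Architect, Developer, Reviewer.
    [(40_000, 5), (25_000, 12), (35_000, 4)]
        .iter()
        .map(|&(context, messages)| stage_cache_reads(context, messages))
        .sum()
}

/// Percentage saved when moving from `before` to `after`.
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}
```

`workflow_cache_reads()` gives 640,000 tokens. `reduction_percent(840.0, 110.0)` is about 86.9, matching the 87% cost figure, while `reduction_percent(20_000_000.0, 640_000.0)` is about 96.8 for cache reads. Cost falls less than token volume in this example, so the two percentages should not be used interchangeably.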
## Usage Examples
See [CLI Commands Guide](../setup/cli-commands.md) for command-line usage.
### Programmatic Usage
```rust
use std::sync::Arc;
use vapora_workflow_engine::WorkflowOrchestrator;

// This snippet runs inside an async function that returns a Result.
// `swarm`, `kg`, and `nats` are assumed to be already-initialized handles
// for the SwarmCoordinator, KG persistence, and NATS components.

// Initialize orchestrator
let orchestrator = Arc::new(
    WorkflowOrchestrator::new(
        "config/workflows.toml",
        swarm,
        kg,
        nats,
    ).await?
);

// Start event listener
orchestrator.clone().start_event_listener().await?;

// Start workflow
let workflow_id = orchestrator.start_workflow(
    "feature_development",
    serde_json::json!({
        "task": "Add authentication",
        "requirements": ["OAuth2", "JWT"]
    })
).await?;

// Get status
let workflow = orchestrator.get_workflow(&workflow_id)?;
println!("Status: {:?}", workflow.status);

// Approve stage (if waiting)
orchestrator.approve_stage(&workflow_id, "Jane Doe").await?;
```
## Configuration
### Workflow Templates
File: `config/workflows.toml`
```toml
[engine]
max_parallel_tasks = 10
workflow_timeout = 3600
approval_gates_enabled = true
[[workflows]]
name = "custom_workflow"
trigger = "manual"
[[workflows.stages]]
name = "stage_name"
agents = ["agent_role"]
parallel = false
max_parallel = 1
approval_required = false
```
### Environment Variables
```bash
# Kogral knowledge base path
export KOGRAL_PATH="/path/to/kogral/.kogral"
# NATS connection
export NATS_URL="nats://localhost:4222"
# Backend API (for CLI)
export VAPORA_API_URL="http://localhost:8001"
```
## Troubleshooting
### Workflow Stuck in "waiting_approval"
**Solution**: Use CLI or API to approve:
```bash
vapora workflow approve <workflow_id> --approver "Your Name"
```
### Stage Fails Repeatedly
**Check**:
1. Agent availability: `vapora workflow list` (via backend)
2. NATS connection: Verify NATS URL and cluster status
3. Task requirements: Check if stage agents have required capabilities
### High Latency Between Stages
**Causes**:
- NATS messaging delay (check network)
- SwarmCoordinator queue depth (check agent load)
- Artifact serialization overhead (reduce artifact size)
**Mitigation**:
- Use parallel stages where possible
- Increase `max_parallel` in stage config
- Optimize artifact content (references instead of full content)
### Workflow Not Advancing
**Debug**:
```bash
# Check workflow status
vapora workflow status <workflow_id>
# Check backend logs
docker logs vapora-backend
# Check NATS messages
nats sub "vapora.tasks.>"
```
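The `vapora.tasks.>` subscription above relies on NATS subject wildcards: `*` matches exactly one dot-separated token, and `>` matches one or more trailing tokens. The function below is a simplified re-implementation for illustration, not the NATS client's code, and it does not validate malformed subjects.

```rust
/// Returns true if `subject` matches the NATS subscription `pattern`.
/// `*` matches exactly one token; `>` matches one or more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pattern_tokens = pattern.split('.');
    let mut subject_tokens = subject.split('.');
    loop {
        match (pattern_tokens.next(), subject_tokens.next()) {
            // `>` consumes the rest of the subject, but only as the last pattern token.
            (Some(">"), Some(_)) => return pattern_tokens.next().is_none(),
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

So `vapora.tasks.>` sees both `vapora.tasks.completed` and `vapora.tasks.failed`, but not `vapora.workflow.approval_required`. To watch approval and completion events as well, a second subscription on `vapora.workflow.>` would be needed.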
## Related Documentation
- [CLI Commands Guide](../setup/cli-commands.md) - Command-line usage
- [Multi-Agent Workflows](../architecture/multi-agent-workflows.md) - Architecture overview
- [Agent Registry & Coordination](../architecture/agent-registry-coordination.md) - Agent management
- [ADR-0028: Workflow Orchestrator](../adrs/0028-workflow-orchestrator.md) - Decision rationale
- [ADR-0014: Learning-Based Agent Selection](../adrs/0014-learning-profiles.md) - Agent selection
- [ADR-0015: Budget Enforcement](../adrs/0015-budget-enforcement.md) - Cost control