Vapora/docs/architecture/multi-agent-workflows.md
Jesús Pérez 75e5ebd9a2
Some checks failed
Documentation Lint & Validation / Markdown Linting (push) Has been cancelled
Documentation Lint & Validation / Validate mdBook Configuration (push) Has been cancelled
Documentation Lint & Validation / Content & Structure Validation (push) Has been cancelled
mdBook Build & Deploy / Build mdBook (push) Has been cancelled
Nickel Type Check / Nickel Type Checking (push) Has been cancelled
Rust CI / Security Audit (push) Has been cancelled
Rust CI / Check + Test + Lint (nightly) (push) Has been cancelled
Rust CI / Check + Test + Lint (stable) (push) Has been cancelled
Documentation Lint & Validation / Lint & Validation Summary (push) Has been cancelled
mdBook Build & Deploy / Documentation Quality Check (push) Has been cancelled
mdBook Build & Deploy / Deploy to GitHub Pages (push) Has been cancelled
mdBook Build & Deploy / Notification (push) Has been cancelled
chore: ontology sync + 4 NCL ADRs + landing page update
on+re:
  - core.ncl: 5 new Practice nodes (notification-channels,
    vapora-capabilities, agent-hot-reload-stable-identity,
    merkle-audit-trail, notification-channels) + 5 new edges;
    knowledge-graph-execution-history updated with HNSW+BM25+RRF
  - state.ncl: production-readiness blocker/catalyst updated (hot-reload
    complete, BudgetManager/LLMRouter still require restart);
    ontoref-integration catalyst updated (vapora-ontology/reflection
    crates, api-catalog.json, nickel contracts)

  ADRs (NCL):
  - adr-013: KG hybrid search — HNSW+BM25+RRF, rejected in-process scan
  - adr-014: capability packages — AgentDefinition→vapora-shared,
    DashMap shard-before-await constraint
  - adr-015: Merkle audit trail — SHA-256 hash chain, rejected HMAC
  - adr-016: agent hot-reload — stable_id=role, learning_profiles survive
    drain, BudgetManager excluded from reload scope

  landing page:
  - 2 new feature boxes: VCS-Agnostic Worktree (jj/git), Ontology Protocol
  - KG box: 20→28 tests, HNSW+BM25+RRF description
  - Agents box: 71→82 tests, hot-reload + stable_id
  - tech stack: Rust 21→23 crates, added jj, Radicle, ontoref badges
  - status badge: 620→691 tests
2026-04-07 21:06:48 +01:00

570 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!-- REVIEW: Status claims "Specification (VAPORA v1.0)" — verify against vapora-workflow-engine/src/ which is production-ready at v1.2.0+; workflow stage models may not match current WorkflowStage types -->
# 🔄 Multi-Agent Workflows
## End-to-End Parallel Task Orchestration
**Version**: 0.1.0
**Status**: Specification (VAPORA v1.0 - Workflows)
**Purpose**: Workflows where 10+ agents work in parallel, coordinated automatically
---
## 🎯 Objetivo
Orquestar workflows donde múltiples agentes trabajan **en paralelo** en diferentes aspectos de una tarea, sin intervención manual:
```
Feature Request
ProjectManager crea task
↓ (paralelo)
Architect diseña ────────┐
Developer implementa ────├─→ Reviewer revisa ──┐
Tester escribe tests ────┤ ├─→ DecisionMaker aprueba
Documenter prepara docs ─┤ ├─→ DevOps deploya
Security audita ────────┘ │
Marketer promociona
```
---
## 📋 Workflow: Feature Compleja End-to-End
### Fase 1: Planificación (Serial - Requiere aprobación)
**Agentes**: Architect, ProjectManager, DecisionMaker
**Timeline**: 1-2 horas
```yaml
Workflow: feature-auth-mfa
Status: planning
Created: 2025-11-09T10:00:00Z
Steps:
1_architect_designs:
agent: architect
input: feature_request, project_context
task_type: ArchitectureDesign
quality: Critical
estimated_duration: 45min
output:
- design_doc.md
- adr-001-mfa-strategy.md
- architecture_diagram.svg
2_pm_validates:
dependencies: [1_architect_designs]
agent: project-manager
task_type: GeneralQuery
input: design_doc, project_timeline
action: validate_feasibility
3_decision_maker_approves:
dependencies: [2_pm_validates]
agent: decision-maker
task_type: GeneralQuery
input: design, feasibility_report
approval_required: true
escalation_if: ["too risky", "breaks roadmap"]
```
**Output**: ADR aprobado, design doc, go/no-go decision
---
### Fase 2: Implementación (Paralelo - Máxima concurrencia)
**Agentes**: Developer (×3), Tester, Security, Documenter (async)
**Timeline**: 3-5 días
```yaml
4_frontend_dev:
dependencies: [3_decision_maker_approves]
agent: developer-frontend
skill_match: frontend
input: design_doc, api_spec
tasks:
- implement_mfa_ui
- add_totp_input
- add_webauthn_button
parallel_with: [4_backend_dev, 5_security_setup, 6_docs_start]
max_duration: 4days
4_backend_dev:
dependencies: [3_decision_maker_approves]
agent: developer-backend
skill_match: backend, security
input: design_doc, database_schema
tasks:
- implement_mfa_service
- add_totp_verification
- add_webauthn_endpoint
parallel_with: [4_frontend_dev, 5_security_setup, 6_docs_start]
max_duration: 4days
5_security_audit:
dependencies: [3_decision_maker_approves]
agent: security
input: design_doc, threat_model
tasks:
- threat_modeling
- security_review
- vulnerability_scan_plan
parallel_with: [4_frontend_dev, 4_backend_dev, 6_docs_start]
can_block_deployment: true
6_docs_start:
dependencies: [3_decision_maker_approves]
agent: documenter
input: design_doc
tasks:
- create_adr_doc
- start_implementation_guide
parallel_with: [4_frontend_dev, 4_backend_dev, 5_security_audit]
low_priority: true
Status: in_progress
Parallel_agents: 5
Progress: 60%
Blockers: none
```
**Output**:
- Frontend implementation + PRs
- Backend implementation + PRs
- Security audit report
- Initial documentation
---
### Fase 3: Código Review (Paralelo pero gated)
**Agentes**: CodeReviewer (×2), Security, Tester
**Timeline**: 1-2 días
```yaml
7a_frontend_review:
dependencies: [4_frontend_dev]
agent: code-reviewer-frontend
input: frontend_pr
actions: [comment, request_changes, approve]
must_pass: 1 # At least 1 reviewer
can_block_merge: true
7b_backend_review:
dependencies: [4_backend_dev]
agent: code-reviewer-backend
input: backend_pr
actions: [comment, request_changes, approve]
must_pass: 1
security_required: true # Security must also approve
7c_security_review:
dependencies: [4_backend_dev, 5_security_audit]
agent: security
input: backend_pr, security_audit
actions: [scan, approve_or_block]
critical_vulns_block_merge: true
high_vulns_require_mitigation: true
7d_test_coverage:
dependencies: [4_frontend_dev, 4_backend_dev]
agent: tester
input: frontend_pr, backend_pr
actions: [run_tests, check_coverage, benchmark]
must_pass: tests_passing && coverage > 85%
Status: in_progress
Parallel_reviewers: 4
Approved: frontend_review
Pending: backend_review (awaiting security_review)
Blockers: security_review
```
**Output**:
- Approved PRs (if all pass)
- Comments & requested changes
- Test coverage report
- Security clearance
---
### Fase 4: Merge & Deploy (Serial - Ordered)
**Agentes**: CodeReviewer, DevOps, Monitor
**Timeline**: 1-2 horas
```yaml
8_merge_to_dev:
dependencies: [7a_frontend_review, 7b_backend_review, 7c_security_review, 7d_test_coverage]
agent: code-reviewer
action: merge_to_dev
requires: all_approved
9_deploy_staging:
dependencies: [8_merge_to_dev]
agent: devops
environment: staging
actions: [trigger_ci, deploy_manifests, smoke_test]
automatic_after_merge: true
timeout: 30min
10_smoke_test:
dependencies: [9_deploy_staging]
agent: tester
test_type: smoke
environments: [staging]
must_pass: all
11_monitor_staging:
dependencies: [9_deploy_staging]
agent: monitor
duration: 1hour
metrics: [error_rate, latency, cpu, memory]
alert_if: error_rate > 1% or p99_latency > 500ms
Status: in_progress
Completed: 8_merge_to_dev
In_progress: 9_deploy_staging (20min elapsed)
Pending: 10_smoke_test, 11_monitor_staging
```
**Output**:
- Code merged to dev
- Deployed to staging
- Smoke tests pass
- Monitoring active
---
### Fase 5: Final Validation & Release
**Agentes**: DecisionMaker, DevOps, Marketer, Monitor
**Timeline**: 1-3 horas
```yaml
12_final_approval:
dependencies: [10_smoke_test, 11_monitor_staging]
agent: decision-maker
input: test_results, monitoring_report, security_clearance
action: approve_for_production
if_blocked: defer_to_next_week
13_deploy_production:
dependencies: [12_final_approval]
agent: devops
environment: production
deployment_strategy: blue_green # 0 downtime
actions: [deploy, health_check, traffic_switch]
rollback_on: any_error
14_monitor_production:
dependencies: [13_deploy_production]
agent: monitor
duration: 24hours
alert_thresholds: [error_rate > 0.5%, p99 > 300ms, cpu > 80%]
auto_rollback_if: critical_error
15_announce_release:
dependencies: [13_deploy_production] # Can start once deployed
agent: marketer
async: true
actions: [draft_blog_post, announce_on_twitter, create_demo_video]
16_update_docs:
dependencies: [13_deploy_production]
agent: documenter
async: true
actions: [update_changelog, publish_guide, update_roadmap]
Status: completed
Deployed: 2025-11-10T14:00:00Z
Monitoring: Active
Release_notes: docs/releases/v1.2.0.md
```
**Output**:
- Deployed to production
- 24h monitoring active
- Blog post + social media
- Docs updated
- Release notes published
---
## 🔄 Workflow State Machine
```
Created
Planning (serial, approval-gated)
├─ Architect designs
├─ PM validates
└─ DecisionMaker approves → GO / NO-GO
Implementation (parallel)
├─ Frontend dev
├─ Backend dev
├─ Security audit
├─ Tester setup
└─ Documenter start
Review (parallel but gated)
├─ Code review
├─ Security review
├─ Test execution
└─ Coverage check
Merge & Deploy (serial, ordered)
├─ Merge to dev
├─ Deploy staging
├─ Smoke test
└─ Monitor staging
Release (parallel async)
├─ Final approval
├─ Deploy production
├─ Monitor 24h
├─ Marketing announce
└─ Docs update
Completed / Rolled back
Transitions:
- Blocked → can escalate to DecisionMaker
- Failed → auto-rollback if production
- Waiting → timeout after N hours
```
---
## 🎯 Workflow DSL (YAML/TOML)
### Minimal Example
```yaml
workflow:
id: feature-auth
title: Implement MFA
agents:
architect:
role: Architect
parallel_with: [pm]
pm:
role: ProjectManager
depends_on: [architect]
developer:
role: Developer
depends_on: [pm]
parallelizable: true
approval_required_at: [architecture, deploy_production]
allow_concurrent_agents: 10
timeline_hours: 48
```
### Complex Example (Feature-complete)
```yaml
workflow:
id: feature-user-preferences
title: User Preferences System
created_at: 2025-11-09T10:00:00Z
phases:
phase_1_design:
duration_hours: 2
serial: true
steps:
- name: architect_designs
agent: architect
input: feature_spec
output: design_doc
- name: architect_creates_adr
agent: architect
depends_on: architect_designs
output: adr-017.md
- name: pm_reviews
agent: project-manager
depends_on: architect_creates_adr
approval_required: true
phase_2_implementation:
duration_hours: 48
parallel: true
max_concurrent_agents: 6
steps:
- name: frontend_dev
agent: developer
skill_match: frontend
depends_on: [architect_designs]
- name: backend_dev
agent: developer
skill_match: backend
depends_on: [architect_designs]
- name: db_migration
agent: devops
depends_on: [architect_designs]
- name: security_review
agent: security
depends_on: [architect_designs]
- name: docs_start
agent: documenter
depends_on: [architect_creates_adr]
priority: low
phase_3_review:
duration_hours: 16
gate: all_tests_pass && all_reviews_approved
steps:
- name: frontend_review
agent: code-reviewer
depends_on: frontend_dev
- name: backend_review
agent: code-reviewer
depends_on: backend_dev
- name: tests
agent: tester
depends_on: [frontend_dev, backend_dev]
- name: deploy_staging
agent: devops
depends_on: [frontend_review, backend_review, tests]
phase_4_release:
duration_hours: 4
steps:
- name: final_approval
agent: decision-maker
depends_on: phase_3_review
- name: deploy_production
agent: devops
depends_on: final_approval
strategy: blue_green
- name: announce
agent: marketer
depends_on: deploy_production
async: true
```
---
## 🔧 Runtime: Monitoring & Adjustment
### Dashboard (Real-Time)
```
Workflow: feature-auth-mfa
Status: in_progress (Phase 2/5)
Progress: 45%
Timeline: 2/4 days remaining
Active Agents (5/12):
├─ architect-001 🟢 Designing (80% done)
├─ developer-frontend-001 🟢 Implementing (60% done)
├─ developer-backend-001 🟢 Implementing (50% done)
├─ security-001 🟢 Auditing (70% done)
└─ documenter-001 🟡 Waiting for PR links
Pending Agents (4):
├─ code-reviewer-001 ⏳ Waiting for frontend_dev
├─ code-reviewer-002 ⏳ Waiting for backend_dev
├─ tester-001 ⏳ Waiting for dev completion
└─ devops-001 ⏳ Waiting for reviews
Blockers: none
Issues: none
Risks: none
Timeline Projection:
- Design: ✅ 2h (completed)
- Implementation: 3d (50% done, on track)
- Review: 1d (scheduled)
- Deploy: 4h (scheduled)
Total ETA: 4d (vs 5d planned, 1d early!)
```
### Workflow Adjustments
```rust
pub enum WorkflowAdjustment {
// Add more agents if progress slow
AddAgent { agent_role: AgentRole, count: u32 },
// Parallelize steps that were serial
Parallelize { step_ids: Vec<String> },
// Skip optional steps to save time
SkipOptionalSteps { step_ids: Vec<String> },
// Escalate blocker to DecisionMaker
EscalateBlocker { step_id: String },
// Pause workflow for manual review
Pause { reason: String },
// Cancel workflow if infeasible
Cancel { reason: String },
}
// Example: If timeline too tight, add agents
if projected_timeline > planned_timeline {
workflow.adjust(WorkflowAdjustment::AddAgent {
agent_role: AgentRole::Developer,
count: 2,
}).await?;
}
```
---
## 🎯 Implementation Checklist
- [ ] Workflow YAML/TOML parser
- [ ] State machine executor (Created→Completed)
- [ ] Parallel task scheduler
- [ ] Dependency resolution (topological sort)
- [ ] Gate evaluation (all_passed, any_approved, etc.)
- [ ] Blocking & escalation logic
- [ ] Rollback on failure
- [ ] Real-time dashboard
- [ ] Audit trail (who did what, when, why)
- [ ] CLI: `vapora workflow run feature-auth.yaml`
- [ ] CLI: `vapora workflow status --id feature-auth`
- [ ] Monitoring & alerting
---
## 📊 Success Metrics
✅ 10+ agents coordinated without errors
✅ Parallel execution actual (not serial)
✅ Dependencies respected
✅ Approval gates enforce correctly
✅ Rollback works on failure
✅ Dashboard updates real-time
✅ Workflow completes in <5% over estimated time
---
**Version**: 0.1.0
**Status**: Specification Complete (VAPORA v1.0)
**Purpose**: Multi-agent parallel workflow orchestration