Vapora/docs/architecture/multi-agent-workflows.md

<!-- REVIEW: Status claims "Specification (VAPORA v1.0)" — verify against vapora-workflow-engine/src/ which is production-ready at v1.2.0+; workflow stage models may not match current WorkflowStage types -->
# 🔄 Multi-Agent Workflows
## End-to-End Parallel Task Orchestration

**Version**: 0.1.0
**Status**: Specification (VAPORA v1.0 - Workflows)
**Purpose**: Workflows where 10+ agents work in parallel, coordinated automatically

---

## 🎯 Objetivo

Orquestar workflows donde múltiples agentes trabajan **en paralelo** en diferentes aspectos de una tarea, sin intervención manual:

```
Feature Request
    ↓
ProjectManager crea task
    ↓ (paralelo)
Architect diseña ────────┐
Developer implementa ────├─→ Reviewer revisa ──┐
Tester escribe tests ────┤                     ├─→ DecisionMaker aprueba
Documenter prepara docs ─┤                     ├─→ DevOps deploya
Security audita ────────┘                      │
                                               ↓
                                         Marketer promociona
```

---

## 📋 Workflow: Feature Compleja End-to-End

### Fase 1: Planificación (Serial - Requiere aprobación)

**Agentes**: Architect, ProjectManager, DecisionMaker

**Timeline**: 1-2 horas

```yaml
Workflow: feature-auth-mfa
Status: planning
Created: 2025-11-09T10:00:00Z

Steps:
  1_architect_designs:
    agent: architect
    input: feature_request, project_context
    task_type: ArchitectureDesign
    quality: Critical
    estimated_duration: 45min
    output:
      - design_doc.md
      - adr-001-mfa-strategy.md
      - architecture_diagram.svg

  2_pm_validates:
    dependencies: [1_architect_designs]
    agent: project-manager
    task_type: GeneralQuery
    input: design_doc, project_timeline
    action: validate_feasibility

  3_decision_maker_approves:
    dependencies: [2_pm_validates]
    agent: decision-maker
    task_type: GeneralQuery
    input: design, feasibility_report
    approval_required: true
    escalation_if: ["too risky", "breaks roadmap"]
```

**Output**: ADR aprobado, design doc, go/no-go decision

---

### Fase 2: Implementación (Paralelo - Máxima concurrencia)

**Agentes**: Developer (×3), Tester, Security, Documenter (async)

**Timeline**: 3-5 días

```yaml
  4_frontend_dev:
    dependencies: [3_decision_maker_approves]
    agent: developer-frontend
    skill_match: frontend
    input: design_doc, api_spec
    tasks:
      - implement_mfa_ui
      - add_totp_input
      - add_webauthn_button
    parallel_with: [4_backend_dev, 5_security_setup, 6_docs_start]
    max_duration: 4days

  4_backend_dev:
    dependencies: [3_decision_maker_approves]
    agent: developer-backend
    skill_match: backend, security
    input: design_doc, database_schema
    tasks:
      - implement_mfa_service
      - add_totp_verification
      - add_webauthn_endpoint
    parallel_with: [4_frontend_dev, 5_security_setup, 6_docs_start]
    max_duration: 4days

  5_security_audit:
    dependencies: [3_decision_maker_approves]
    agent: security
    input: design_doc, threat_model
    tasks:
      - threat_modeling
      - security_review
      - vulnerability_scan_plan
    parallel_with: [4_frontend_dev, 4_backend_dev, 6_docs_start]
    can_block_deployment: true

  6_docs_start:
    dependencies: [3_decision_maker_approves]
    agent: documenter
    input: design_doc
    tasks:
      - create_adr_doc
      - start_implementation_guide
    parallel_with: [4_frontend_dev, 4_backend_dev, 5_security_audit]
    low_priority: true

Status: in_progress
Parallel_agents: 5
Progress: 60%
Blockers: none
```

**Output**:
- Frontend implementation + PRs
- Backend implementation + PRs
- Security audit report
- Initial documentation

---

### Fase 3: Código Review (Paralelo pero gated)

**Agentes**: CodeReviewer (×2), Security, Tester

**Timeline**: 1-2 días

```yaml
  7a_frontend_review:
    dependencies: [4_frontend_dev]
    agent: code-reviewer-frontend
    input: frontend_pr
    actions: [comment, request_changes, approve]
    must_pass: 1  # At least 1 reviewer
    can_block_merge: true

  7b_backend_review:
    dependencies: [4_backend_dev]
    agent: code-reviewer-backend
    input: backend_pr
    actions: [comment, request_changes, approve]
    must_pass: 1
    security_required: true  # Security must also approve

  7c_security_review:
    dependencies: [4_backend_dev, 5_security_audit]
    agent: security
    input: backend_pr, security_audit
    actions: [scan, approve_or_block]
    critical_vulns_block_merge: true
    high_vulns_require_mitigation: true

  7d_test_coverage:
    dependencies: [4_frontend_dev, 4_backend_dev]
    agent: tester
    input: frontend_pr, backend_pr
    actions: [run_tests, check_coverage, benchmark]
    must_pass: tests_passing && coverage > 85%

Status: in_progress
Parallel_reviewers: 4
Approved: frontend_review
Pending: backend_review (awaiting security_review)
Blockers: security_review
```

**Output**:
- Approved PRs (if all pass)
- Comments & requested changes
- Test coverage report
- Security clearance

---

### Fase 4: Merge & Deploy (Serial - Ordered)

**Agentes**: CodeReviewer, DevOps, Monitor

**Timeline**: 1-2 horas

```yaml
  8_merge_to_dev:
    dependencies: [7a_frontend_review, 7b_backend_review, 7c_security_review, 7d_test_coverage]
    agent: code-reviewer
    action: merge_to_dev
    requires: all_approved

  9_deploy_staging:
    dependencies: [8_merge_to_dev]
    agent: devops
    environment: staging
    actions: [trigger_ci, deploy_manifests, smoke_test]
    automatic_after_merge: true
    timeout: 30min

  10_smoke_test:
    dependencies: [9_deploy_staging]
    agent: tester
    test_type: smoke
    environments: [staging]
    must_pass: all

  11_monitor_staging:
    dependencies: [9_deploy_staging]
    agent: monitor
    duration: 1hour
    metrics: [error_rate, latency, cpu, memory]
    alert_if: error_rate > 1% or p99_latency > 500ms

Status: in_progress
Completed: 8_merge_to_dev
In_progress: 9_deploy_staging (20min elapsed)
Pending: 10_smoke_test, 11_monitor_staging
```

**Output**:
- Code merged to dev
- Deployed to staging
- Smoke tests pass
- Monitoring active

---

### Fase 5: Final Validation & Release

**Agentes**: DecisionMaker, DevOps, Marketer, Monitor

**Timeline**: 1-3 horas

```yaml
  12_final_approval:
    dependencies: [10_smoke_test, 11_monitor_staging]
    agent: decision-maker
    input: test_results, monitoring_report, security_clearance
    action: approve_for_production
    if_blocked: defer_to_next_week

  13_deploy_production:
    dependencies: [12_final_approval]
    agent: devops
    environment: production
    deployment_strategy: blue_green  # 0 downtime
    actions: [deploy, health_check, traffic_switch]
    rollback_on: any_error

  14_monitor_production:
    dependencies: [13_deploy_production]
    agent: monitor
    duration: 24hours
    alert_thresholds: [error_rate > 0.5%, p99 > 300ms, cpu > 80%]
    auto_rollback_if: critical_error

  15_announce_release:
    dependencies: [13_deploy_production]  # Can start once deployed
    agent: marketer
    async: true
    actions: [draft_blog_post, announce_on_twitter, create_demo_video]

  16_update_docs:
    dependencies: [13_deploy_production]
    agent: documenter
    async: true
    actions: [update_changelog, publish_guide, update_roadmap]

Status: completed
Deployed: 2025-11-10T14:00:00Z
Monitoring: Active
Release_notes: docs/releases/v1.2.0.md
```

**Output**:
- Deployed to production
- 24h monitoring active
- Blog post + social media
- Docs updated
- Release notes published

---

## 🔄 Workflow State Machine

```
Created
  ↓
Planning (serial, approval-gated)
  ├─ Architect designs
  ├─ PM validates
  └─ DecisionMaker approves → GO / NO-GO
  ↓
Implementation (parallel)
  ├─ Frontend dev
  ├─ Backend dev
  ├─ Security audit
  ├─ Tester setup
  └─ Documenter start
  ↓
Review (parallel but gated)
  ├─ Code review
  ├─ Security review
  ├─ Test execution
  └─ Coverage check
  ↓
Merge & Deploy (serial, ordered)
  ├─ Merge to dev
  ├─ Deploy staging
  ├─ Smoke test
  └─ Monitor staging
  ↓
Release (parallel async)
  ├─ Final approval
  ├─ Deploy production
  ├─ Monitor 24h
  ├─ Marketing announce
  └─ Docs update
  ↓
Completed / Rolled back

Transitions:
- Blocked → can escalate to DecisionMaker
- Failed → auto-rollback if production
- Waiting → timeout after N hours
```

---

## 🎯 Workflow DSL (YAML/TOML)

### Minimal Example

```yaml
workflow:
  id: feature-auth
  title: Implement MFA
  agents:
    architect:
      role: Architect
      parallel_with: [pm]
    pm:
      role: ProjectManager
      depends_on: [architect]
    developer:
      role: Developer
      depends_on: [pm]
      parallelizable: true

  approval_required_at: [architecture, deploy_production]
  allow_concurrent_agents: 10
  timeline_hours: 48
```

### Complex Example (Feature-complete)

```yaml
workflow:
  id: feature-user-preferences
  title: User Preferences System
  created_at: 2025-11-09T10:00:00Z

  phases:
    phase_1_design:
      duration_hours: 2
      serial: true
      steps:
        - name: architect_designs
          agent: architect
          input: feature_spec
          output: design_doc

        - name: architect_creates_adr
          agent: architect
          depends_on: architect_designs
          output: adr-017.md

        - name: pm_reviews
          agent: project-manager
          depends_on: architect_creates_adr
          approval_required: true

    phase_2_implementation:
      duration_hours: 48
      parallel: true
      max_concurrent_agents: 6

      steps:
        - name: frontend_dev
          agent: developer
          skill_match: frontend
          depends_on: [architect_designs]

        - name: backend_dev
          agent: developer
          skill_match: backend
          depends_on: [architect_designs]

        - name: db_migration
          agent: devops
          depends_on: [architect_designs]

        - name: security_review
          agent: security
          depends_on: [architect_designs]

        - name: docs_start
          agent: documenter
          depends_on: [architect_creates_adr]
          priority: low

    phase_3_review:
      duration_hours: 16
      gate: all_tests_pass && all_reviews_approved

      steps:
        - name: frontend_review
          agent: code-reviewer
          depends_on: frontend_dev

        - name: backend_review
          agent: code-reviewer
          depends_on: backend_dev

        - name: tests
          agent: tester
          depends_on: [frontend_dev, backend_dev]

        - name: deploy_staging
          agent: devops
          depends_on: [frontend_review, backend_review, tests]

    phase_4_release:
      duration_hours: 4

      steps:
        - name: final_approval
          agent: decision-maker
          depends_on: phase_3_review

        - name: deploy_production
          agent: devops
          depends_on: final_approval
          strategy: blue_green

        - name: announce
          agent: marketer
          depends_on: deploy_production
          async: true
```

---

## 🔧 Runtime: Monitoring & Adjustment

### Dashboard (Real-Time)

```
Workflow: feature-auth-mfa
Status: in_progress (Phase 2/5)
Progress: 45%
Timeline: 2/4 days remaining

Active Agents (5/12):
├─ architect-001         🟢 Designing (80% done)
├─ developer-frontend-001 🟢 Implementing (60% done)
├─ developer-backend-001  🟢 Implementing (50% done)
├─ security-001          🟢 Auditing (70% done)
└─ documenter-001        🟡 Waiting for PR links

Pending Agents (4):
├─ code-reviewer-001     ⏳ Waiting for frontend_dev
├─ code-reviewer-002     ⏳ Waiting for backend_dev
├─ tester-001            ⏳ Waiting for dev completion
└─ devops-001            ⏳ Waiting for reviews

Blockers: none
Issues: none
Risks: none

Timeline Projection:
- Design:      ✅ 2h (completed)
- Implementation: 3d (50% done, on track)
- Review:     1d (scheduled)
- Deploy:     4h (scheduled)
Total ETA:    4d (vs 5d planned, 1d early!)
```

### Workflow Adjustments

```rust
pub enum WorkflowAdjustment {
    // Add more agents if progress slow
    AddAgent { agent_role: AgentRole, count: u32 },

    // Parallelize steps that were serial
    Parallelize { step_ids: Vec<String> },

    // Skip optional steps to save time
    SkipOptionalSteps { step_ids: Vec<String> },

    // Escalate blocker to DecisionMaker
    EscalateBlocker { step_id: String },

    // Pause workflow for manual review
    Pause { reason: String },

    // Cancel workflow if infeasible
    Cancel { reason: String },
}

// Example: If timeline too tight, add agents
if projected_timeline > planned_timeline {
    workflow.adjust(WorkflowAdjustment::AddAgent {
        agent_role: AgentRole::Developer,
        count: 2,
    }).await?;
}
```

---

## 🎯 Implementation Checklist

- [ ] Workflow YAML/TOML parser
- [ ] State machine executor (Created→Completed)
- [ ] Parallel task scheduler
- [ ] Dependency resolution (topological sort)
- [ ] Gate evaluation (all_passed, any_approved, etc.)
- [ ] Blocking & escalation logic
- [ ] Rollback on failure
- [ ] Real-time dashboard
- [ ] Audit trail (who did what, when, why)
- [ ] CLI: `vapora workflow run feature-auth.yaml`
- [ ] CLI: `vapora workflow status --id feature-auth`
- [ ] Monitoring & alerting

---

## 📊 Success Metrics

✅ 10+ agents coordinated without errors
✅ Parallel execution actual (not serial)
✅ Dependencies respected
✅ Approval gates enforce correctly
✅ Rollback works on failure
✅ Dashboard updates real-time
✅ Workflow completes in <5% over estimated time

---

**Version**: 0.1.0
**Status**: ✅ Specification Complete (VAPORA v1.0)
**Purpose**: Multi-agent parallel workflow orchestration