# 🤖 Agent Registry & Coordination ## Multi-Agent Orchestration System **Version**: 0.1.0 **Status**: Specification (VAPORA v1.0 - Multi-Agent) **Purpose**: Sistema de registro, descubrimiento y coordinación de agentes --- ## 🎯 Objetivo Crear un **marketplace de agentes** donde: - ✅ 12 roles especializados trabajan en paralelo - ✅ Cada agente tiene capacidades, dependencias, versiones claras - ✅ Discovery & instalación automática - ✅ Health monitoring + auto-restart - ✅ Inter-agent communication via NATS JetStream - ✅ Shared context via MCP/RAG --- ## 📋 Los 12 Roles de Agentes ### Tier 1: Technical Core (Código) **Architect** (Role ID: `architect`) - Responsabilidad: Diseño de sistemas, decisiones arquitectónicas - Entrada: Task de feature compleja, contexto de proyecto - Salida: ADRs, design documents, architecture diagrams - LLM óptimo: Claude Opus (complejidad alta) - Trabajo: Individual o iniciador de workflows - Canales: Publica decisiones, consulta Decision-Maker **Developer** (Role ID: `developer`) - Responsabilidad: Implementación de código - Entrada: Especificación, ADR, task asignada - Salida: Código, artifacts, PR - LLM óptimo: Claude Sonnet (velocidad + calidad) - Trabajo: Paralelo (múltiples developers por tarea) - Canales: Escucha de Architect, reporta a Reviewer **Reviewer** (Role ID: `code-reviewer`) - Responsabilidad: Revisión de calidad, standards - Entrada: Pull requests, código propuesto - Salida: Comments, aprobación/rechazo, sugerencias - LLM óptimo: Claude Sonnet o Gemini (análisis rápido) - Trabajo: Paralelo (múltiples reviewers) - Canales: Escucha PRs de Developer, reporta a Decision-Maker si crítico **Tester** (Role ID: `tester`) - Responsabilidad: Testing, benchmarks, QA - Entrada: Código implementado - Salida: Test code, benchmark reports, coverage metrics - LLM óptimo: Claude Sonnet (genera tests) - Trabajo: Paralelo - Canales: Escucha de Reviewer, reporta a DevOps ### Tier 2: Documentation & Communication **Documenter** (Role ID: `documenter`) - Responsabilidad: Documentación técnica, root files, ADRs - Entrada: Código, decisions, análisis - Salida: Docs en `docs/`, actualizaciones README/CHANGELOG - Usa: Root Files Keeper + doc-lifecycle-manager - LLM óptimo: GPT-4 (mejor formato) - Trabajo: Async, actualiza continuamente - Canales: Escucha cambios en repo, publica docs **Marketer** (Role ID: `marketer`) - Responsabilidad: Marketing content, messaging - Entrada: Nuevas features, releases - Salida: Blog posts, social content, press releases - LLM óptimo: Claude Sonnet (creatividad) - Trabajo: Async - Canales: Escucha releases, publica content **Presenter** (Role ID: `presenter`) - Responsabilidad: Presentaciones, slides, demos - Entrada: Features, arquitectura, roadmaps - Salida: Slidev presentations, demo scripts - LLM óptimo: Claude Sonnet (format + creativity) - Trabajo: On-demand, por eventos - Canales: Consulta Architect/Developer ### Tier 3: Operations & Infrastructure **DevOps** (Role ID: `devops`) - Responsabilidad: CI/CD, deploys, infrastructure - Entrada: Code approved, deployment requests - Salida: Manifests K8s, deployment logs, rollback - LLM óptimo: Claude Sonnet (IaC) - Trabajo: Paralelo deploys - Canales: Escucha de Reviewer (approved), publica deploy logs **Monitor** (Role ID: `monitor`) - Responsabilidad: Health checks, alerting, observability - Entrada: Deployment events, metrics - Salida: Alerts, dashboards, incident reports - LLM óptimo: Gemini Flash (análisis rápido) - Trabajo: Real-time, continuous - Canales: Publica alerts, escucha todo **Security** (Role ID: `security`) - Responsabilidad: Security analysis, compliance, audits - Entrada: Code changes, PRs, config - Salida: Security reports, CVE checks, audit logs - LLM óptimo: Claude Opus (análisis profundo) - Trabajo: Async, on PRs críticos - Canales: Escucha de Reviewer, puede bloquear PRs ### Tier 4: Management & Coordination **ProjectManager** (Role ID: `project-manager`) - Responsabilidad: Roadmaps, task tracking, coordination - Entrada: Completed tasks, metrics, blockers - Salida: Roadmap updates, task assignments, status reports - LLM óptimo: Claude Sonnet (análisis datos) - Trabajo: Async, agregador - Canales: Publica status, escucha completions **DecisionMaker** (Role ID: `decision-maker`) - Responsabilidad: Decisiones en conflictos, aprobaciones críticas - Entrada: Reportes de agentes, decisiones pendientes - Salida: Aprobaciones, resolución de conflictos - LLM óptimo: Claude Opus (análisis nuanced) - Trabajo: On-demand, decisiones críticas - Canales: Escucha escalaciones, publica decisiones **Orchestrator** (Role ID: `orchestrator`) - Responsabilidad: Coordinación de agentes, assignment de tareas - Entrada: Tasks a hacer, equipo disponible, constraints - Salida: Task assignments, workflow coordination - LLM óptimo: Claude Opus (planejamiento) - Trabajo: Continuous, meta-agent - Canales: Coordina todo, publica assignments --- ## 🏗️ Agent Registry Structure ### Agent Metadata (SurrealDB) ```rust pub struct AgentMetadata { pub id: String, // "architect", "developer-001" pub role: AgentRole, // Architect, Developer, etc pub name: String, // "Senior Architect Agent" pub version: String, // "0.1.0" pub status: AgentStatus, // Active, Inactive, Updating, Error pub capabilities: Vec, // [Design, ADR, Decisions] pub skills: Vec, // ["rust", "kubernetes", "distributed-systems"] pub llm_provider: LLMProvider, // Claude, OpenAI, Gemini, Ollama pub llm_model: String, // "opus-4" pub dependencies: Vec, // Agents this one depends on pub dependents: Vec, // Agents that depend on this one pub health_check: HealthCheckConfig, pub max_concurrent_tasks: u32, pub current_tasks: u32, pub queue_depth: u32, pub created_at: DateTime, pub last_health_check: DateTime, pub uptime_percentage: f64, } pub enum AgentRole { Architect, Developer, CodeReviewer, Tester, Documenter, Marketer, Presenter, DevOps, Monitor, Security, ProjectManager, DecisionMaker, Orchestrator, } pub enum AgentStatus { Active, Inactive, Updating, Error(String), Scaling, } pub struct Capability { pub id: String, // "design-adr" pub name: String, // "Architecture Decision Records" pub description: String, pub complexity: Complexity, // Low, Medium, High, Critical } pub struct HealthCheckConfig { pub interval_secs: u32, pub timeout_secs: u32, pub consecutive_failures_threshold: u32, pub auto_restart_enabled: bool, } ``` ### Agent Instance (Runtime) ```rust pub struct AgentInstance { pub metadata: AgentMetadata, pub pod_id: String, // K8s pod ID pub ip: String, pub port: u16, pub start_time: DateTime, pub last_heartbeat: DateTime, pub tasks_completed: u32, pub avg_task_duration_ms: u32, pub error_count: u32, pub tokens_used: u64, pub cost_incurred: f64, } ``` --- ## 📡 Inter-Agent Communication (NATS) ### Message Protocol ```rust pub enum AgentMessage { // Task assignment TaskAssigned { task_id: String, agent_id: String, context: TaskContext, deadline: DateTime, }, TaskStarted { task_id: String, agent_id: String, timestamp: DateTime, }, TaskProgress { task_id: String, agent_id: String, progress_percent: u32, current_step: String, }, TaskCompleted { task_id: String, agent_id: String, result: TaskResult, tokens_used: u64, duration_ms: u32, }, TaskFailed { task_id: String, agent_id: String, error: String, retry_count: u32, }, // Communication RequestHelp { from_agent: String, to_roles: Vec, context: String, deadline: DateTime, }, HelpOffered { from_agent: String, to_agent: String, capability: Capability, }, ShareContext { from_agent: String, to_roles: Vec, context_type: String, // "decision", "analysis", "code" data: Value, ttl_minutes: u32, }, // Coordination RequestDecision { from_agent: String, decision_type: String, context: String, options: Vec, }, DecisionMade { decision_id: String, decision: String, reasoning: String, made_by: String, }, // Health Heartbeat { agent_id: String, status: AgentStatus, load: f64, // 0.0-1.0 }, } // NATS Subjects (pub/sub pattern) pub mod subjects { pub const TASK_ASSIGNED: &str = "vapora.tasks.assigned"; // Broadcast pub const TASK_PROGRESS: &str = "vapora.tasks.progress"; // Broadcast pub const TASK_COMPLETED: &str = "vapora.tasks.completed"; // Broadcast pub const AGENT_HELP: &str = "vapora.agent.help"; // Request/Reply pub const AGENT_DECISION: &str = "vapora.agent.decision"; // Request/Reply pub const AGENT_HEARTBEAT: &str = "vapora.agent.heartbeat"; // Broadcast } ``` ### Pub/Sub Patterns ```rust // 1. Broadcast: Task assigned to all interested agents nats.publish("vapora.tasks.assigned", task_message).await?; // 2. Request/Reply: Developer asks Help from Architect let help_request = AgentMessage::RequestHelp { ... }; let response = nats.request("vapora.agent.help", help_request, Duration::from_secs(30)).await?; // 3. Stream: Persist task completion for replay nats.publish_to_stream("vapora_tasks", "vapora.tasks.completed", completion_message).await?; // 4. Subscribe: Monitor listens all heartbeats let mut subscription = nats.subscribe("vapora.agent.heartbeat").await?; ``` --- ## 🏪 Agent Discovery & Installation ### Marketplace API ```rust pub struct AgentRegistry { pub agents: HashMap, pub available_agents: HashMap, // Registry pub running_agents: HashMap, // Runtime } pub struct AgentManifest { pub id: String, pub name: String, pub version: String, pub role: AgentRole, pub docker_image: String, // "vapora/agents:developer-0.1.0" pub resources: ResourceRequirements, pub dependencies: Vec, pub health_check_endpoint: String, pub capabilities: Vec, pub documentation: String, } pub struct AgentDependency { pub agent_id: String, pub role: AgentRole, pub min_version: String, pub optional: bool, } impl AgentRegistry { // Discover available agents pub async fn list_available(&self) -> Vec { self.available_agents.values().cloned().collect() } // Install agent pub async fn install( &mut self, manifest: AgentManifest, count: u32, ) -> anyhow::Result> { // Check dependencies for dep in &manifest.dependencies { if !self.is_available(&dep.agent_id) && !dep.optional { return Err(anyhow::anyhow!("Dependency {} required", dep.agent_id)); } } // Deploy to K8s (via Provisioning) let instances = self.deploy_to_k8s(&manifest, count).await?; // Register for instance in &instances { self.running_agents.insert(instance.metadata.id.clone(), instance.clone()); } Ok(instances) } // Health monitoring pub async fn monitor_health(&mut self) -> anyhow::Result<()> { for (id, instance) in &mut self.running_agents { let health = self.check_agent_health(instance).await?; if !health.healthy { if health.consecutive_failures >= instance.metadata.health_check.consecutive_failures_threshold { if instance.metadata.health_check.auto_restart_enabled { self.restart_agent(id).await?; } } } } Ok(()) } } ``` --- ## 🔄 Shared State & Context ### Context Management ```rust pub struct SharedContext { pub project_id: String, pub active_tasks: HashMap, pub agent_states: HashMap, pub decisions: HashMap, pub shared_knowledge: HashMap, // RAG indexed } pub struct AgentState { pub agent_id: String, pub current_task: Option, pub last_action: DateTime, pub available_until: DateTime, pub context_from_previous_tasks: Vec, } // Access via MCP impl SharedContext { pub async fn get_context(&self, agent_id: &str) -> anyhow::Result { self.agent_states.get(agent_id) .cloned() .ok_or(anyhow::anyhow!("Agent {} not found", agent_id)) } pub async fn share_decision(&mut self, decision: Decision) -> anyhow::Result<()> { self.decisions.insert(decision.id.clone(), decision); // Notify interested agents via NATS Ok(()) } pub async fn share_knowledge(&mut self, key: String, value: Value) -> anyhow::Result<()> { self.shared_knowledge.insert(key, value); // Index in RAG Ok(()) } } ``` --- ## 🎯 Implementation Checklist - [ ] Define AgentMetadata + AgentInstance structs - [ ] NATS JetStream integration - [ ] Agent Registry CRUD operations - [ ] Health monitoring + auto-restart logic - [ ] Agent marketplace UI (Leptos) - [ ] Installation flow (manifest parsing, K8s deployment) - [ ] Pub/Sub message handlers - [ ] Request/Reply pattern implementation - [ ] Shared context via MCP - [ ] CLI: `vapora agent list`, `vapora agent install`, `vapora agent scale` - [ ] Logging + monitoring (Prometheus metrics) - [ ] Tests (mocking, integration) --- ## 📊 Success Metrics ✅ Agents register and appear in registry ✅ Health checks run every N seconds ✅ Unhealthy agents restart automatically ✅ NATS messages route correctly ✅ Shared context accessible to all agents ✅ Agent scaling works (1 → N replicas) ✅ Task assignment < 100ms latency --- **Version**: 0.1.0 **Status**: ✅ Specification Complete (VAPORA v1.0) **Purpose**: Multi-agent registry and coordination system