Jesús Pérez d14150da75 feat: Phase 5.3 - Multi-Agent Learning Infrastructure

Implement intelligent agent learning from Knowledge Graph execution history
with per-task-type expertise tracking, recency bias, and learning curves.

## Phase 5.3 Implementation

### Learning Infrastructure (✅ Complete)
- LearningProfileService with per-task-type expertise metrics
- TaskTypeExpertise model tracking success_rate, confidence, learning curves
- Recency bias weighting: recent 7 days weighted 3x higher (exponential decay)
- Confidence scoring prevents overfitting: min(1.0, executions / 20)
- Learning curves computed from daily execution windows
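
The two formulas above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the code in `learning.rs`: the `Execution` struct and both function names are hypothetical, and because the message does not give the exact shape of the exponential decay, the sketch uses a simple step weight (3x inside the 7-day window, 1x outside).

```rust
/// One past execution (hypothetical shape, for illustration only).
pub struct Execution {
    /// How many days ago the execution finished.
    pub age_days: f64,
    /// Whether the execution succeeded.
    pub success: bool,
}

/// Confidence grows linearly with the sample size and saturates at 20
/// executions: min(1.0, executions / 20).
pub fn confidence(total_executions: u32) -> f64 {
    (total_executions as f64 / 20.0).min(1.0)
}

/// Success rate in which executions from the last 7 days count 3x.
/// Assumption: a step weight stands in for the unspecified decay curve.
pub fn recent_success_rate(executions: &[Execution]) -> f64 {
    let mut weighted_successes = 0.0;
    let mut total_weight = 0.0;
    for e in executions {
        let weight = if e.age_days <= 7.0 { 3.0 } else { 1.0 };
        total_weight += weight;
        if e.success {
            weighted_successes += weight;
        }
    }
    if total_weight == 0.0 {
        0.0 // no history yet
    } else {
        weighted_successes / total_weight
    }
}
```

With one recent success and one old failure, the weighted rate is 3 / (3 + 1) = 0.75, whereas the unweighted rate would be 0.5.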

### Agent Scoring Service (✅ Complete)
- Unified AgentScore combining SwarmCoordinator + learning profiles
- Scoring formula: 0.3*base + 0.5*expertise + 0.2*confidence
- Rank agents by combined score for intelligent assignment
- Support for recency-biased scoring (recent_success_rate)
- Methods: rank_agents, select_best, rank_agents_with_recency
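
As a rough sketch of the scoring formula above: the type and method names echo the ones listed here, but the fields and signatures are assumptions, and the real `scoring.rs` may differ.

```rust
/// Per-agent inputs to the combined score (field layout is assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct AgentScore {
    pub agent_id: String,
    /// Base score from the swarm coordinator, in 0.0..=1.0.
    pub base: f64,
    /// Task-type expertise (success rate), in 0.0..=1.0.
    pub expertise: f64,
    /// Confidence in the expertise estimate, in 0.0..=1.0.
    pub confidence: f64,
}

impl AgentScore {
    /// 0.3*base + 0.5*expertise + 0.2*confidence
    pub fn combined(&self) -> f64 {
        0.3 * self.base + 0.5 * self.expertise + 0.2 * self.confidence
    }
}

/// Sorts candidates from highest to lowest combined score.
pub fn rank_agents(mut candidates: Vec<AgentScore>) -> Vec<AgentScore> {
    candidates.sort_by(|a, b| b.combined().total_cmp(&a.combined()));
    candidates
}

/// Returns the top-ranked candidate, or None if there are no candidates.
pub fn select_best(candidates: Vec<AgentScore>) -> Option<AgentScore> {
    rank_agents(candidates).into_iter().next()
}
```

Because expertise carries half the weight, an agent with a modest base score but a strong track record on the task type outranks an idle agent with little relevant history.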

### KG Integration (✅ Complete)
- KGPersistence::get_executions_for_task_type() - query by agent + task type
- KGPersistence::get_agent_executions() - all executions for agent
- Coordinator::load_learning_profile_from_kg() - core KG→Learning integration
- Coordinator::load_all_learning_profiles() - batch load for multiple agents
- Convert PersistedExecution → ExecutionData for learning calculations

### Agent Assignment Integration (✅ Complete)
- AgentCoordinator uses learning profiles for task assignment
- extract_task_type() infers task type from title/description
- assign_task() scores candidates using AgentScoringService
- Fallback to load-based selection if no learning data available
- Learning profiles stored in coordinator.learning_profiles RwLock
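
The fallback rule can be illustrated as follows. This is a sketch only: `Candidate` and `pick_agent` are hypothetical names, and it assumes the fallback applies when none of the candidates has learning data.

```rust
/// A candidate agent for a task (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub agent_id: String,
    /// Current load: 0.0 means idle, 1.0 means saturated.
    pub current_load: f64,
    /// Learning-based score; None when the agent has no history.
    pub learned_score: Option<f64>,
}

/// Prefers the highest learned score; if no candidate has one,
/// falls back to the least-loaded agent.
pub fn pick_agent(candidates: &[Candidate]) -> Option<&Candidate> {
    let best_learned = candidates
        .iter()
        .filter_map(|c| c.learned_score.map(|score| (c, score)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(c, _)| c);

    best_learned.or_else(|| {
        candidates
            .iter()
            .min_by(|a, b| a.current_load.total_cmp(&b.current_load))
    })
}
```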

### Profile Adapter Enhancements (✅ Complete)
- create_learning_profile() - initialize empty profiles
- add_task_type_expertise() - set task-type expertise
- update_profile_with_learning() - update swarm profiles from learning

## Files Modified

### vapora-knowledge-graph/src/persistence.rs (+30 lines)
- get_executions_for_task_type(agent_id, task_type, limit)
- get_agent_executions(agent_id, limit)

### vapora-agents/src/coordinator.rs (+100 lines)
- load_learning_profile_from_kg() - core KG integration method
- load_all_learning_profiles() - batch loading for agents
- assign_task() already uses learning-based scoring via AgentScoringService

### Existing Complete Implementation
- vapora-knowledge-graph/src/learning.rs - calculation functions
- vapora-agents/src/learning_profile.rs - data structures and expertise
- vapora-agents/src/scoring.rs - unified scoring service
- vapora-agents/src/profile_adapter.rs - adapter methods

## Tests Passing
- learning_profile: 7 tests ✅
- scoring: 5 tests ✅
- profile_adapter: 6 tests ✅
- coordinator: learning-specific tests ✅

## Data Flow
1. Task arrives → AgentCoordinator::assign_task()
2. Extract task_type from description
3. Query KG for task-type executions (load_learning_profile_from_kg)
4. Calculate expertise with recency bias
5. Score candidates (SwarmCoordinator + learning)
6. Assign to top-scored agent
7. Execution result → KG → Update learning profiles

## Key Design Decisions
✅ Recency bias: 7-day half-life with 3x weight for recent performance
✅ Confidence scoring: min(1.0, total_executions / 20) prevents overfitting
✅ Hierarchical scoring: 30% base load, 50% expertise, 20% confidence
✅ KG query limit: 100 recent executions per task-type for performance
✅ Async loading: load_learning_profile_from_kg supports concurrent loads

## Next: Phase 5.4 - Cost Optimization
Ready to implement budget enforcement and cost-aware provider selection.

2026-01-11 13:03:53 +00:00


Doc-Lifecycle-Manager Integration Guide

Overview

doc-lifecycle-manager (external project) provides complete documentation lifecycle management for VAPORA, including classification, consolidation, semantic search, real-time updates, and enterprise security features.

Project Location: External project (doc-lifecycle-manager)
Status: ✅ Enterprise-Ready
Tests: 155/155 passing | Zero unsafe code

What is doc-lifecycle-manager?

A comprehensive Rust-based system that handles documentation throughout its entire lifecycle:

Core Capabilities (Phases 1-3)

Automatic Classification: Categorizes docs (vision, design, specs, ADRs, guides, testing, archive)
Duplicate Detection: Finds similar documents with TF-IDF analysis
Semantic RAG Indexing: Vector embeddings for semantic search
mdBook Generation: Auto-generates documentation websites
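
To make the duplicate-detection idea concrete, here is a minimal, standard-library-only sketch of TF-IDF vectors compared by cosine similarity. It is not the consolidator's actual code: the function names, the tokenizer, and the smoothed IDF variant are all assumptions chosen for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercases and splits on non-alphanumeric characters (assumed tokenizer).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Builds one TF-IDF vector (term -> weight) per document.
pub fn tfidf_vectors(docs: &[&str]) -> Vec<HashMap<String, f64>> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();

    // Document frequency: in how many documents each term appears.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let n = docs.len() as f64;
    tokenized
        .iter()
        .map(|tokens| {
            // Raw term counts for this document.
            let mut vector: HashMap<String, f64> = HashMap::new();
            for t in tokens {
                *vector.entry(t.clone()).or_insert(0.0) += 1.0;
            }
            let len = tokens.len().max(1) as f64;
            for (term, weight) in vector.iter_mut() {
                // Smoothed IDF, so terms present in every document keep weight.
                let idf = ((1.0 + n) / (1.0 + df[term.as_str()] as f64)).ln() + 1.0;
                *weight = (*weight / len) * idf;
            }
            vector
        })
        .collect()
}

/// Cosine similarity between two sparse vectors; 0.0 if either is empty.
pub fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, w)| b.get(t).map(|v| w * v)).sum();
    let norm_a = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let norm_b = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Document pairs whose similarity exceeds a chosen threshold would then be flagged as likely duplicates; the threshold itself is a tuning decision that the guide does not specify.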

Enterprise Features (Phases 4-7)

GraphQL API: Semantic document queries with pagination
Real-Time Events: WebSocket streaming of doc updates
Distributed Tracing: OpenTelemetry with W3C Trace Context
Security: mTLS with automatic certificate rotation
Performance: Comprehensive benchmarking with percentiles
Persistence: SurrealDB backend (feature-gated)

Integration Architecture

Data Flow in VAPORA

Frontend/Agents
    ↓
┌─────────────────────────────────┐
│   VAPORA API Layer (Axum)       │
│   ├─ REST endpoints             │
│   └─ WebSocket gateway          │
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  doc-lifecycle-manager Services │
│                                 │
│  ├─ GraphQL Resolver            │
│  ├─ WebSocket Manager           │
│  ├─ Document Classifier         │
│  ├─ RAG Indexer                 │
│  └─ mTLS Auth Manager           │
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│   Data Layer                    │
│   ├─ SurrealDB (vectors)        │
│   ├─ NATS JetStream (events)    │
│   └─ Redis (cache)              │
└─────────────────────────────────┘

Component Integration Points

1. Documenter Agent ↔ doc-lifecycle-manager

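use vapora_doc_lifecycle::prelude::*;

// On task completion: run the doc-lifecycle pipeline for the finished task.
// The function returns Result so that `?` can propagate errors
// (assumes the prelude exports a `Result` alias, as the later examples do).
async fn on_task_completed(task_id: &str) -> Result<()> {
    let config = PluginConfig::default();
    let mut docs = DocumenterIntegration::new(config)?;
    docs.on_task_completed(task_id).await?;
    Ok(())
}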

2. Frontend ↔ GraphQL API

{
  documentSearch(query: {
    text_query: "authentication"
    limit: 10
  }) {
    results { id title relevance_score }
  }
}

3. Frontend ↔ WebSocket Events

const ws = new WebSocket("ws://vapora/doc-events");
ws.onmessage = (event) => {
  const { event_type, payload } = JSON.parse(event.data);
  // Update UI on document_indexed, document_updated, etc.
};

4. Agent-to-Agent ↔ NATS JetStream

Task Completed Event
  → Documenter Agent (NATS)
    → Classify + Index
      → Broadcast DocumentIndexed Event
        → All Agents notified

Feature Set by Phase

Phase 1: Foundation & Core Library ✅

Error handling and configuration
Core abstractions and types

Phase 2: Extended Implementation ✅

Document Classifier (7 types)
Consolidator (TF-IDF)
RAG Indexer (markdown-aware)
MDBook Generator

Phase 3: CLI & Automation ✅

4 command handlers
62+ Just recipes
5 NuShell scripts

Phase 4: VAPORA Deep Integration ✅

NATS JetStream events
Vector store trait
Plugin system
Agent coordination

Phase 5: Production Hardening ✅

Real NATS integration
DocServer RBAC (4 roles, 3 visibility levels)
Root Files Keeper (auto-update README, CHANGELOG)
Kubernetes manifests (7 YAML files)

Phase 6: Multi-Agent VAPORA ✅

Agent registry with health checking
CI/CD pipeline (GitHub Actions)
Prometheus monitoring rules
Comprehensive documentation

Phase 7: Advanced Features ✅

SurrealDB Backend: Persistent vector store
OpenTelemetry: W3C Trace Context support
GraphQL API: Query builder with semantic search
WebSocket Events: Real-time subscriptions
mTLS Auth: Certificate rotation
Benchmarking: P95/P99 metrics
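
For readers unfamiliar with P95/P99 figures, the following sketch shows one common way to compute them, the nearest-rank method. The guide does not say which method the benchmark suite uses, so treat the function name and the method as assumptions.

```rust
/// Nearest-rank percentile over latency samples (e.g. in microseconds).
/// Returns the smallest sample such that at least `p` percent of all
/// samples are less than or equal to it. `p` must be in 1..=100.
pub fn percentile(samples: &[u64], p: usize) -> Option<u64> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Integer form of ceil(p / 100 * n); always within 1..=n here.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

With 100 samples, P99 is the 99th smallest value, so a single slow outlier does not move it, while P100 (the maximum) would.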

How to Use in VAPORA

1. Basic Integration (Documenter Agent)

// In vapora-backend/documenter_agent.rs

use vapora_doc_lifecycle::prelude::*;

impl DocumenterAgent {
    async fn process_task(&self, task: Task) -> Result<()> {
        let config = PluginConfig::default();
        let mut integration = DocumenterIntegration::new(config)?;

        // Automatically classifies, indexes, and generates docs
        integration.on_task_completed(&task.id).await?;

        Ok(())
    }
}

2. GraphQL Queries (Frontend/Agents)

# Search for documentation
query SearchDocs($query: String!) {
  documentSearch(query: {
    text_query: $query
    limit: 10
    visibility: "Public"
  }) {
    results {
      id
      title
      path
      relevance_score
      preview
    }
    total_count
    has_more
  }
}

# Get specific document
query GetDoc($id: ID!) {
  document(id: $id) {
    id
    title
    content
    metadata {
      created_at
      updated_at
      owner_id
    }
  }
}

3. Real-Time Updates (Frontend)

// Connect to doc-lifecycle WebSocket
const docWs = new WebSocket('ws://vapora-api/doc-lifecycle/events');

// Subscribe to document changes
docWs.onopen = () => {
  docWs.send(JSON.stringify({
    type: 'subscribe',
    event_types: ['document_indexed', 'document_updated', 'search_index_rebuilt'],
    min_priority: 5
  }));
};

// Handle updates
docWs.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.event_type === 'document_indexed') {
    console.log('New doc indexed:', message.payload);
    // Refresh documentation view
  }
};

4. Distributed Tracing

All operations are automatically traced:

GET /api/documents?search=auth
  trace_id: 0af7651916cd43dd8448eb211c80319c
  span_id: b7ad6b7169203331

  ├─ graphql_resolver [15ms]
  │  ├─ rbac_check [2ms]
  │  └─ semantic_search [12ms]
  └─ response [1ms]
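
The trace and span IDs above travel between services in the W3C `traceparent` header, which has the form `00-<32 hex trace id>-<16 hex span id>-<2 hex flags>`. The sketch below parses that header with the standard library only; `TraceParent` and `parse_traceparent` are illustrative names, not part of doc-lifecycle-manager, and a real service would normally rely on its OpenTelemetry SDK instead.

```rust
/// Parsed fields of a version-00 W3C `traceparent` header.
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,
    pub span_id: String,
    /// True when the "sampled" flag (lowest bit of the flags byte) is set.
    pub sampled: bool,
}

/// Parses `00-<trace id>-<span id>-<flags>`; returns None if malformed.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    // The format requires lowercase hexadecimal of fixed length.
    let is_lower_hex = |s: &str, len: usize| {
        s.len() == len && s.chars().all(|c| c.is_ascii_hexdigit() && !c.is_ascii_uppercase())
    };
    if !is_lower_hex(parts[1], 32) || !is_lower_hex(parts[2], 16) || !is_lower_hex(parts[3], 2) {
        return None;
    }
    // All-zero trace or span IDs are invalid.
    if parts[1].chars().all(|c| c == '0') || parts[2].chars().all(|c| c == '0') {
        return None;
    }
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    Some(TraceParent {
        trace_id: parts[1].to_string(),
        span_id: parts[2].to_string(),
        sampled: flags & 0x01 == 0x01,
    })
}
```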

5. mTLS Security

Service-to-service communication is secured:

# Kubernetes secret for certs
apiVersion: v1
kind: Secret
metadata:
  name: doc-lifecycle-certs
data:
  server.crt: <base64>
  server.key: <base64>
  ca.crt: <base64>

Deployment in VAPORA

Kubernetes Manifests Provided

kubernetes/
├── namespace.yaml                    # Create doc-lifecycle namespace
├── configmap.yaml                    # Configuration
├── deployment.yaml                   # Main service (2 replicas)
├── statefulset-nats.yaml            # NATS JetStream (3 replicas)
├── statefulset-surreal.yaml         # SurrealDB (1 replica)
├── service.yaml                      # Internal services
├── rbac.yaml                         # RBAC configuration
└── prometheus-rules.yaml             # Monitoring rules

Quick Deploy

# Deploy to VAPORA cluster
kubectl apply -f /Tools/doc-lifecycle-manager/kubernetes/

# Verify
kubectl get pods -n doc-lifecycle
kubectl get svc -n doc-lifecycle

Configuration via ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: doc-lifecycle-config
  namespace: doc-lifecycle
data:
  config.json: |
    {
      "mode": "full",
      "classification": {
        "auto_classify": true,
        "confidence_threshold": 0.8
      },
      "rag": {
        "enable_embeddings": true,
        "max_chunk_size": 512
      },
      "nats": {
        "server": "nats://nats:4222",
        "jetstream_enabled": true
      },
      "otel": {
        "enabled": true,
        "jaeger_endpoint": "http://jaeger:14268"
      },
      "mtls": {
        "enabled": true,
        "rotation_days": 30
      }
    }

VAPORA Agent Integration

Documenter Agent

// Processes documentation tasks
pub struct DocumenterAgent {
    integration: DocumenterIntegration,
    nats: NatsEventHandler,
}

impl DocumenterAgent {
    pub async fn handle_task(&self, task: Task) -> Result<()> {
        // 1. Classify, index, and generate docs for the completed task
        self.integration.on_task_completed(&task.id).await?;

        // 2. Broadcast via NATS
        let event = DocsUpdatedEvent {
            task_id: task.id,
            doc_count: 5, // placeholder; use the real number of updated docs
        };
        self.nats.publish_docs_updated(event).await?;

        Ok(())
    }
}

Developer Agent (Uses Search)

// Searches for relevant documentation
pub struct DeveloperAgent;

impl DeveloperAgent {
    pub async fn find_relevant_docs(&self, task: Task) -> Result<Vec<DocumentResult>> {
        // GraphQL query for semantic search
        let query = DocumentQuery {
            text_query: Some(task.description),
            limit: Some(5),
            visibility: Some("Internal".to_string()),
            ..Default::default()
        };

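        // Execute search. `resolver` (the GraphQL resolver) and `user` (the
        // caller's identity for RBAC) are not defined in this snippet; they
        // are assumed to be in scope, for example as fields on the agent.
        resolver.resolve_document_search(query, user).await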
    }
}

CodeReviewer Agent (Uses Context)

// Uses documentation as context for reviews
pub struct CodeReviewerAgent;

impl CodeReviewerAgent {
    pub async fn review_with_context(&self, code: &str) -> Result<Review> {
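        // Search for related documentation. `semantic_search`, `code_summary`
        // (a short description derived from `code`), and `llm_client` are not
        // defined in this snippet and are assumed to be in scope.
        let docs = semantic_search(code_summary).await?;

        // Use docs as context in review
        let review = llm_client
            .review_code(code, &docs.to_context_string())
            .await?;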

        Ok(review)
    }
}

Performance & Scaling

Expected Performance

Operation	Latency	Throughput
Classify doc	<10ms	1000 docs/sec
GraphQL query	<200ms	50 queries/sec
WebSocket broadcast	<20ms	1000 events/sec
Semantic search	<100ms	50 searches/sec
mTLS validation	<5ms	N/A

Resource Requirements

Deployment Resources:

CPU: 2-4 cores (main service)
Memory: 512MB-2GB
Storage: 50GB (SurrealDB + vectors)

NATS Requirements:

CPU: 1-2 cores
Memory: 256MB-1GB
Persistent volume: 20GB

Monitoring & Observability

Prometheus Metrics

# Error rate
rate(doc_lifecycle_errors_total[5m])

# Latency (p99), computed from the histogram's bucket series
histogram_quantile(0.99, sum by (le) (rate(doc_lifecycle_request_duration_seconds_bucket[5m])))

# Service availability
up{job="doc-lifecycle"}

Distributed Tracing

Traces carry W3C Trace Context IDs and are exported to Jaeger:

Trace: 0af7651916cd43dd8448eb211c80319c
├─ Span: graphql_resolver
│  ├─ Span: rbac_check
│  └─ Span: semantic_search
└─ Span: response

Health Checks

# Liveness probe
curl http://doc-lifecycle:8080/health/live

# Readiness probe
curl http://doc-lifecycle:8080/health/ready

Configuration Reference

Environment Variables

# Core
DOC_LIFECYCLE_MODE=full                          # minimal|standard|full
DOC_LIFECYCLE_ENABLED=true

# Classification
CLASSIFIER_AUTO_CLASSIFY=true
CLASSIFIER_CONFIDENCE_THRESHOLD=0.8

# RAG/Search
RAG_ENABLE_EMBEDDINGS=true
RAG_MAX_CHUNK_SIZE=512
RAG_CHUNK_OVERLAP=50

# NATS
NATS_SERVER_URL=nats://nats:4222
NATS_JETSTREAM_ENABLED=true

# SurrealDB (optional)
SURREAL_DB_URL=ws://surrealdb:8000
SURREAL_NAMESPACE=vapora
SURREAL_DATABASE=documents

# OpenTelemetry
OTEL_ENABLED=true
OTEL_JAEGER_ENDPOINT=http://jaeger:14268
OTEL_SERVICE_NAME=vapora-doc-lifecycle

# mTLS
MTLS_ENABLED=true
MTLS_SERVER_CERT=/etc/vapora/certs/server.crt
MTLS_SERVER_KEY=/etc/vapora/certs/server.key
MTLS_CA_CERT=/etc/vapora/certs/ca.crt
MTLS_ROTATION_DAYS=30
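
As an illustration of how a service might consume a few of these variables, the sketch below reads three of them with typed parsing. The variable names come from the list above; the `Settings` and `Mode` types, the function names, and the choice to fall back to the listed values when a variable is unset are assumptions, not the project's actual configuration loader.

```rust
use std::env;

/// Operating mode, matching the documented minimal|standard|full values.
#[derive(Debug, PartialEq)]
pub enum Mode {
    Minimal,
    Standard,
    Full,
}

/// A small subset of the settings (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct Settings {
    pub mode: Mode,
    pub confidence_threshold: f64,
    pub max_chunk_size: usize,
}

/// Builds settings from any key lookup, which keeps the parsing testable.
/// Assumption: unset variables fall back to the values shown in the list.
pub fn load_settings(get: impl Fn(&str) -> Option<String>) -> Result<Settings, String> {
    let mode = match get("DOC_LIFECYCLE_MODE").as_deref() {
        None | Some("full") => Mode::Full,
        Some("standard") => Mode::Standard,
        Some("minimal") => Mode::Minimal,
        Some(other) => return Err(format!("unknown DOC_LIFECYCLE_MODE: {other}")),
    };
    let confidence_threshold = match get("CLASSIFIER_CONFIDENCE_THRESHOLD") {
        Some(raw) => raw
            .parse::<f64>()
            .map_err(|e| format!("CLASSIFIER_CONFIDENCE_THRESHOLD: {e}"))?,
        None => 0.8,
    };
    if !(0.0..=1.0).contains(&confidence_threshold) {
        return Err("CLASSIFIER_CONFIDENCE_THRESHOLD must be between 0 and 1".to_string());
    }
    let max_chunk_size = match get("RAG_MAX_CHUNK_SIZE") {
        Some(raw) => raw
            .parse::<usize>()
            .map_err(|e| format!("RAG_MAX_CHUNK_SIZE: {e}"))?,
        None => 512,
    };
    Ok(Settings { mode, confidence_threshold, max_chunk_size })
}

/// Reads the same settings from the process environment.
pub fn load_settings_from_env() -> Result<Settings, String> {
    load_settings(|key| env::var(key).ok())
}
```

Passing the lookup as a closure means invalid values can be rejected at startup with a clear message, and the parsing can be exercised without mutating the real environment.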

Integration Checklist

Immediate (Ready Now)

Core features (Phases 1-3)
VAPORA integration (Phase 4)
Production hardening (Phase 5)
Multi-agent support (Phase 6)
Enterprise features (Phase 7)
Kubernetes deployment
GraphQL API
WebSocket events
Distributed tracing
mTLS security

Planned (Phase 8)

Jaeger exporter
SurrealDB live testing
Load testing
Performance tuning
Production deployment guide

Troubleshooting

Common Issues

1. NATS Connection Failed

# Check NATS service
kubectl get svc -n doc-lifecycle
kubectl logs -n doc-lifecycle deployment/nats

2. GraphQL Query Timeout

# Check semantic search performance
# Query execution should be < 200ms
# Check RAG index size

3. WebSocket Disconnection

# Verify WebSocket port is open
# Check subscription history size
# Monitor event broadcast latency

References

Documentation Files:

/Tools/doc-lifecycle-manager/PHASE_7_COMPLETION.md - Phase 7 details
/Tools/doc-lifecycle-manager/PHASES_COMPLETION.md - All phases overview
/Tools/doc-lifecycle-manager/INTEGRATION_WITH_VAPORA.md - Integration guide
/Tools/doc-lifecycle-manager/kubernetes/README.md - K8s deployment

Source Code:

crates/vapora-doc-lifecycle/src/lib.rs - Main library
crates/vapora-doc-lifecycle/src/graphql_api.rs - GraphQL resolver
crates/vapora-doc-lifecycle/src/websocket_events.rs - WebSocket manager
crates/vapora-doc-lifecycle/src/mtls_auth.rs - Security

Support

For questions or issues:

Check documentation in /Tools/doc-lifecycle-manager/
Review test cases for usage examples
Check Kubernetes logs: kubectl logs -n doc-lifecycle <pod>
Monitor with Prometheus/Grafana

Status: ✅ Ready for Production Deployment
Last Updated: 2025-11-10
Maintainer: VAPORA Team
