feat: Phase 5.3 - Multi-Agent Learning Infrastructure

Implement intelligent agent learning from Knowledge Graph execution history with per-task-type expertise tracking, recency bias, and learning curves.

## Phase 5.3 Implementation

### Learning Infrastructure (✅ Complete)
- LearningProfileService with per-task-type expertise metrics
- TaskTypeExpertise model tracking success_rate, confidence, learning curves
- Recency bias weighting: recent 7 days weighted 3x higher (exponential decay)
- Confidence scoring prevents overfitting: min(1.0, executions / 20)
- Learning curves computed from daily execution windows

### Agent Scoring Service (✅ Complete)
- Unified AgentScore combining SwarmCoordinator + learning profiles
- Scoring formula: 0.3*base + 0.5*expertise + 0.2*confidence (sketched in the example below)
- Rank agents by combined score for intelligent assignment
- Support for recency-biased scoring (recent_success_rate)
- Methods: rank_agents, select_best, rank_agents_with_recency

### KG Integration (✅ Complete)
- KGPersistence::get_executions_for_task_type() - query by agent + task type
- KGPersistence::get_agent_executions() - all executions for agent
- Coordinator::load_learning_profile_from_kg() - core KG→Learning integration
- Coordinator::load_all_learning_profiles() - batch load for multiple agents
- Convert PersistedExecution → ExecutionData for learning calculations

### Agent Assignment Integration (✅ Complete)
- AgentCoordinator uses learning profiles for task assignment
- extract_task_type() infers task type from title/description
- assign_task() scores candidates using AgentScoringService
- Fallback to load-based selection if no learning data available
- Learning profiles stored in coordinator.learning_profiles RwLock

### Profile Adapter Enhancements (✅ Complete)
- create_learning_profile() - initialize empty profiles
- add_task_type_expertise() - set task-type expertise
- update_profile_with_learning() - update swarm profiles from learning

## Files Modified

### vapora-knowledge-graph/src/persistence.rs (+30 lines)
- get_executions_for_task_type(agent_id, task_type, limit)
- get_agent_executions(agent_id, limit)

### vapora-agents/src/coordinator.rs (+100 lines)
- load_learning_profile_from_kg() - core KG integration method
- load_all_learning_profiles() - batch loading for agents
- assign_task() already uses learning-based scoring via AgentScoringService

### Existing Complete Implementation
- vapora-knowledge-graph/src/learning.rs - calculation functions
- vapora-agents/src/learning_profile.rs - data structures and expertise
- vapora-agents/src/scoring.rs - unified scoring service
- vapora-agents/src/profile_adapter.rs - adapter methods

## Tests Passing
- learning_profile: 7 tests ✅
- scoring: 5 tests ✅
- profile_adapter: 6 tests ✅
- coordinator: learning-specific tests ✅

## Data Flow
1. Task arrives → AgentCoordinator::assign_task()
2. Extract task_type from description
3. Query KG for task-type executions (load_learning_profile_from_kg)
4. Calculate expertise with recency bias
5. Score candidates (SwarmCoordinator + learning)
6. Assign to top-scored agent
7. Execution result → KG → Update learning profiles

## Key Design Decisions
- ✅ Recency bias: 7-day half-life with 3x weight for recent performance
- ✅ Confidence scoring: min(1.0, total_executions / 20) prevents overfitting
- ✅ Hierarchical scoring: 30% base load, 50% expertise, 20% confidence
- ✅ KG query limit: 100 recent executions per task-type for performance
- ✅ Async loading: load_learning_profile_from_kg supports concurrent loads

## Next: Phase 5.4 - Cost Optimization
Ready to implement budget enforcement and cost-aware provider selection.
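The formulas named in the commit message can be illustrated with a minimal sketch. This is not the code in `vapora-agents/src/scoring.rs` or `vapora-knowledge-graph/src/learning.rs`: the type `ExecutionSample`, its fields, and the exact decay curve are illustrative assumptions. Only the 3x weight for the last 7 days, the `min(1.0, executions / 20)` confidence cap, and the 0.3/0.5/0.2 blend come from the description above.

```rust
/// Illustrative stand-in for one persisted execution; field names are assumptions.
struct ExecutionSample {
    succeeded: bool,
    age_days: f64, // days since the execution finished
}

/// Recency-weighted success rate: executions from the last 7 days carry the full
/// 3x weight, older ones decay exponentially with a 7-day half-life.
fn recency_weighted_success_rate(executions: &[ExecutionSample]) -> f64 {
    let (mut weighted_successes, mut total_weight) = (0.0_f64, 0.0_f64);
    for e in executions {
        let excess = (e.age_days - 7.0).max(0.0);
        let weight = 3.0 * 0.5_f64.powf(excess / 7.0);
        total_weight += weight;
        if e.succeeded {
            weighted_successes += weight;
        }
    }
    if total_weight == 0.0 {
        0.0
    } else {
        weighted_successes / total_weight
    }
}

/// Confidence grows with sample size and saturates at 20 executions,
/// so a single lucky run cannot dominate the expertise score.
fn confidence(total_executions: usize) -> f64 {
    (total_executions as f64 / 20.0).min(1.0)
}

/// Combined agent score: 30% load-based base score, 50% task-type expertise,
/// 20% confidence in that expertise.
fn combined_score(base: f64, expertise: f64, confidence: f64) -> f64 {
    0.3 * base + 0.5 * expertise + 0.2 * confidence
}

fn main() {
    let history = vec![
        ExecutionSample { succeeded: true, age_days: 1.0 },
        ExecutionSample { succeeded: true, age_days: 3.0 },
        ExecutionSample { succeeded: false, age_days: 30.0 },
    ];
    let expertise = recency_weighted_success_rate(&history);
    let conf = confidence(history.len());
    // The 30-day-old failure is heavily discounted; with a base (load) score
    // of 0.8 the combined score stays close to the recent success rate.
    println!("combined score = {:.3}", combined_score(0.8, expertise, conf));
}
```

With this weighting, an agent whose recent executions for a task type succeed will outrank one with a strong but stale history, while the confidence term keeps agents with only a handful of samples from being over-promoted.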
2026-01-11 13:03:53 +00:00
use crate::error::{Result, TelemetryError};
use opentelemetry::global;
use opentelemetry_jaeger::new_agent_pipeline;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;
use tracing_subscriber::{EnvFilter, Registry};
/// Configuration for telemetry initialization
#[derive(Debug, Clone)]
pub struct TelemetryConfig {
/// Service name for tracing
pub service_name: String,
/// Jaeger agent host
pub jaeger_host: String,
/// Jaeger agent port (default 6831)
pub jaeger_port: u16,
/// Log level filter
pub log_level: String,
/// Enable console output
pub console_output: bool,
/// Enable JSON output
pub json_output: bool,
}
impl Default for TelemetryConfig {
fn default() -> Self {
Self {
service_name: "vapora".to_string(),
jaeger_host: "localhost".to_string(),
jaeger_port: 6831,
log_level: "info".to_string(),
console_output: true,
json_output: false,
}
}
}
/// Telemetry initializer - sets up OpenTelemetry tracing with a Jaeger exporter
pub struct TelemetryInitializer;
impl TelemetryInitializer {
/// Initialize tracing with OpenTelemetry and Jaeger exporter
pub fn init(config: TelemetryConfig) -> Result<()> {
// Create Jaeger exporter
let tracer = new_agent_pipeline()
.with_service_name(&config.service_name)
.with_endpoint(format!("{}:{}", config.jaeger_host, config.jaeger_port))
.install_simple()
.map_err(|e| TelemetryError::JaegerError(e.to_string()))?;
// Create OpenTelemetry layer for tracing
let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);
// Create environment filter from config
let env_filter = EnvFilter::try_from_default_env()
.or_else(|_| EnvFilter::try_new(&config.log_level))
.map_err(|e| TelemetryError::TracerInitFailed(e.to_string()))?;
// Build subscriber with OpenTelemetry layer
let registry = Registry::default()
.with(env_filter)
.with(otel_layer);
if config.console_output {
if config.json_output {
registry
.with(tracing_subscriber::fmt::layer().json())
.init();
} else {
registry
.with(tracing_subscriber::fmt::layer())
.init();
}
} else {
registry.init();
}
tracing::info!(
service = %config.service_name,
jaeger_endpoint = %format!("{}:{}", config.jaeger_host, config.jaeger_port),
"Telemetry initialized successfully"
);
Ok(())
}
/// Initialize minimal tracing for testing (no Jaeger)
pub fn init_noop() -> Result<()> {
let env_filter = EnvFilter::try_from_default_env()
.or_else(|_| EnvFilter::try_new("info"))
.map_err(|e| TelemetryError::TracerInitFailed(e.to_string()))?;
Registry::default()
.with(env_filter)
.with(tracing_subscriber::fmt::layer())
.init();
Ok(())
}
/// Shut down the global tracer provider (cleanup, flushes pending spans)
pub fn shutdown() -> Result<()> {
global::shutdown_tracer_provider();
Ok(())
}
}
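// Usage sketch (not part of the original module): how a binary might wire this up at
// startup. The crate path `vapora_telemetry::error::Result`, the service name, and the
// shutdown-on-exit pattern are assumptions for illustration, not project conventions.
//
// fn main() -> vapora_telemetry::error::Result<()> {
//     let config = TelemetryConfig {
//         service_name: "vapora-backend".to_string(),
//         jaeger_host: "jaeger".to_string(),
//         ..TelemetryConfig::default()
//     };
//     TelemetryInitializer::init(config)?;
//     tracing::info!("application started");
//     // ... run the application ...
//     TelemetryInitializer::shutdown()?;
//     Ok(())
// }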
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_config_default() {
let config = TelemetryConfig::default();
assert_eq!(config.service_name, "vapora");
assert_eq!(config.jaeger_host, "localhost");
assert_eq!(config.jaeger_port, 6831);
}
#[test]
fn test_init_noop() {
let result = TelemetryInitializer::init_noop();
assert!(result.is_ok());
}
#[test]
fn test_config_custom() {
let config = TelemetryConfig {
service_name: "test-service".to_string(),
jaeger_host: "jaeger.example.com".to_string(),
jaeger_port: 6832,
log_level: "debug".to_string(),
console_output: true,
json_output: true,
};
assert_eq!(config.service_name, "test-service");
assert_eq!(config.jaeger_host, "jaeger.example.com");
assert_eq!(config.jaeger_port, 6832);
}
}