Vapora/CHANGELOG.md
Jesús Pérez bb55c80d2b
feat(workflow-engine): autonomous scheduling with timezone and distributed lock
Add cron-based autonomous workflow firing with two hardening layers:

  - Timezone-aware scheduling via chrono-tz: ScheduledWorkflow.timezone
    (IANA identifier), compute_next_fire_at/after_tz, validate_timezone;
    DST-safe, UTC fallback when absent; validated at config load and REST API

  - Distributed fire-lock via SurrealDB conditional UPDATE (locked_by/locked_at
    fields, 120 s TTL); WorkflowScheduler gains instance_id (UUID) as lock owner;
    prevents double-fires across multi-instance deployments without extra infra

  - ScheduleStore: try_acquire_fire_lock, release_fire_lock (own-instance guard),
    full CRUD (load_one/all, full_upsert, patch, delete, load_runs)

  - REST: 7 endpoints (GET/PUT/PATCH/DELETE schedules, runs history, manual fire)
    with timezone field in all request/response types

  - Migrations 010 (schedule tables) + 011 (timezone + lock columns)
  - Tests: 48 passing (was 26); ADR-0034; changelog; feature docs updated
2026-02-26 11:34:44 +00:00

43 KiB
Raw Blame History

Changelog

All notable changes to VAPORA will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added - Autonomous Scheduling: Timezone Support and Distributed Fire-Lock

vapora-workflow-engine — scheduling hardening

  • Timezone-aware cron evaluation (chrono-tz = "0.10"):
    • ScheduledWorkflow.timezone: Option<String> — IANA identifier stored per-schedule
    • compute_next_fire_at_tz(expr, tz) / compute_next_fire_after_tz(expr, after, tz) — generic over chrono_tz::Tz; UTC fallback when tz = None
    • validate_timezone(tz) — compile-time exhaustive IANA enum, rejects unknown identifiers
    • compute_fire_times_tz in scheduler.rs — catch-up and normal firing both timezone-aware
    • Config-load validation: [workflows.schedule] timezone = "..." validated at startup (fail-fast)
  • Distributed fire-lock (SurrealDB document-level atomic CAS):
    • scheduled_workflows gains locked_by: option<string> and locked_at: option<datetime> (migration 011)
    • ScheduleStore::try_acquire_fire_lock(id, instance_id, now) — conditional UPDATE ... WHERE locked_by IS NONE OR locked_at < $expiry; returns true only if update succeeded (non-empty result = lock acquired)
    • ScheduleStore::release_fire_lock(id, instance_id)WHERE locked_by = $instance_id guard prevents stale release after TTL expiry
    • WorkflowScheduler.instance_id: String — UUID generated at startup, identifies lock owner
    • 120-second TTL: crashed instance's lock auto-expires within two scheduler ticks
    • Lock acquired before fire_with_lock, released in finally-style block after (warn on release failure, TTL fallback)
  • New tests: test_validate_timezone_valid, test_validate_timezone_invalid, test_compute_next_fire_at_tz_utc, test_compute_next_fire_at_tz_named, test_compute_next_fire_at_tz_invalid_tz_fallback, test_compute_fires_with_catchup_named_tz, test_instance_id_is_unique
  • Test count: 48 (was 41)

vapora-backend — schedule REST API surface

  • ScheduleResponse, PutScheduleRequest, PatchScheduleRequest gain timezone: Option<String>
  • validate_tz() helper validates at API boundary → 400 InvalidInput on unknown identifier
  • put_schedule and patch_schedule use compute_next_fire_at_tz / compute_next_fire_after_tz
  • fire_schedule uses compute_next_fire_after_tz with schedule's stored timezone

Migrations

  • migrations/011_schedule_tz_lock.surql: DEFINE FIELD timezone, locked_by, locked_at on scheduled_workflows

Documentation

  • ADR-0034: design rationale for chrono-tz selection and SurrealDB conditional UPDATE lock
  • docs/features/workflow-orchestrator.md: Autonomous Scheduling section with TOML config, REST API table, timezone/distributed lock explanations, Prometheus metrics

Added - Workflow Engine Hardening (Persistence · Saga · Cedar)

vapora-workflow-engine — three new hardening layers

  • persistence.rs: SurrealWorkflowStore — crash-recoverable WorkflowInstance state in SurrealDB
    • save() upserts on every state-mutating operation; serializes via serde_json::Value (surrealdb v3 SurrealValue requirement)
    • load_active() on startup restores all non-terminal instances to the in-memory DashMap
    • delete() removes terminal instances after completion
  • saga.rs: SagaCompensator — reverse-order rollback dispatch via SwarmCoordinator
    • Iterates executed stages in reverse; skips stages without compensation_agents in StageConfig
    • Dispatches { type: "compensation", stage_name, workflow_id, original_context, artifacts_to_undo } payload
    • Best-effort: errors are logged and never propagated
  • auth.rs: CedarAuthorizer — per-stage Cedar policy enforcement
    • load_from_dir(path) reads all *.cedar files and compiles a single PolicySet
    • Called before each SwarmCoordinator::assign_task(); deny returns WorkflowError::Unauthorized
    • Disabled when EngineConfig.cedar_policy_dir is None
  • config.rs: StageConfig gains compensation_agents: Option<Vec<String>>; EngineConfig gains cedar_policy_dir: Option<String>
  • instance.rs: WorkflowInstance::mark_current_task_failed() — isolates the current_stage_mut() borrow to avoid NLL conflicts and clippy excessive_nesting in on_task_failed()
  • migrations/009_workflow_state.surql: SCHEMAFULL workflow_instances table; indexes on template_name and created_at
  • New deps: surrealdb = { workspace = true }, cedar-policy = "4.9"
  • Tests: 31 pass (5 new — auth × 3, saga × 2); 0 clippy warnings

vapora-knowledge-graph — surrealdb v3 compatibility fixes

  • All response.take(0) call sites updated from custom #[derive(Deserialize)] structs to Vec<serde_json::Value> intermediary pattern
    • Affected: find_similar_executions, get_agent_success_rate, get_task_distribution, cleanup_old_executions, get_execution_count, get_executions_for_task_type, get_agent_executions, get_task_type_analytics, get_dashboard_metrics, get_cost_report, get_rlm_executions_by_doc, find_similar_rlm_tasks, get_rlm_execution_count, cleanup_old_rlm_executions
  • Root cause: surrealdb v3 changed take() bound from T: DeserializeOwned to T: SurrealValue; serde_json::Value satisfies this; custom structs do not

Fixed - distro.just build and installation

  • distro::install: now builds all 5 server binaries in one cargo build --release pass
    • Added vapora-a2a and vapora-mcp-server to the explicit build list (were missing; silently copied from stale target/release/ if present, skipped otherwise)
    • Added vapora-a2a to the install copy list (was absent entirely)
    • Missing binary → explicit warning with count; exits non-zero if zero installed
  • distro::install-full: new recipe — runs install as a dependency then trunk build --release
    • Replaces the broken UI=true parameter approach: just 1.x treats KEY=value tokens as positional args to the first parameter when invoked via module syntax (distro::recipe), not as named overrides
    • Validates trunk is in PATH before attempting the build
  • distro::install-targets: added wasm32-unknown-unknown; idempotent — checks rustup target list --installed before calling rustup target add
  • distro::build-all-targets: excludes wasm32-unknown-unknown from the workspace loop; WASM requires per-crate trunk build, not cargo build --workspace --target wasm32

Added - NatsBridge + A2A JetStream Integration

vapora-agents — NatsBridge (real JetStream)

  • nats_bridge.rs: new NatsBridge with real async_nats::jetstream::Context
    • submit_task() → JetStream publish with double-await ack, returns sequence number
    • subscribe_task_results() → durable pull consumer (WorkQueue retention), returns mpsc::Receiver<TaskResult>
    • list_agents() → reads from live AgentRegistry, never hardcoded
    • NatsBrokerConfig with sensible defaults; stream auto-created via get_or_create_stream
  • swarm_adapter.rs: replaced all 3 stubs with real logic
    • select_agent()swarm.submit_task_for_bidding() for load-balanced selection
    • report_completion()swarm.update_agent_status() with load adjustment on failure
    • agent_load() → derives current tasks from fractional load via swarm.get_agent()

vapora-swarmSwarmCoordinator::get_agent()

  • Added pub fn get_agent(&self, agent_id: &str) -> Option<AgentProfile> to expose per-agent profiles from private DashMap

vapora-a2a — NatsBridge integration + SurrealDB serialization fixes

  • CoordinatorBridge: replaced raw NatsClient with Option<Arc<NatsBridge>>
    • start_result_listener() uses JetStream pull consumer (at-least-once delivery)
    • dispatch() publishes to JetStream after coordinator assignment (non-fatal fallback)
    • list_agents() delegates to NatsBridge.list_agents()
  • server.rs: added GET /a2a/agents endpoint
  • task_manager.rs: fixed SurrealDB serialization
    • create(): switched from .content() to parameterized INSERT INTO query; avoids SurrealDB serializer failing on adjacently-tagged enums (A2aMessagePart)
    • get(): changed SELECT * to explicit field projection; excludes id (SurrealDB Thing) and casts datetimes with type::string() to avoid serde_json::Value deserialization failures
  • Integration tests verified: 4/5 pass with SurrealDB + NATS; 5th requires live agent

vapora-leptos-ui

  • Set doctest = false in [lib]: Leptos components require WASM reactive runtime; native doctests are incompatible by design

Added - NATS JetStream local container

  • /containers/nats/: Docker Compose service following existing containers pattern
    • JetStream enabled via nats.conf (store_dir: /data, max_mem: 1G, max_file: 10G)
    • Persistent volume at ./nats_data
    • Ports: 4222 (client), 8222 (HTTP monitoring)
    • local_net network, restart: unless-stopped

Added - Recursive Language Models (RLM) Integration (v1.3.0)

Core RLM Engine (vapora-rlm crate - 17,000+ LOC)

  • Distributed Reasoning System: Process documents >100k tokens without context rot

    • Chunking strategies: Fixed-size, Semantic (sentence-aware), Code-aware (AST-based for Rust/Python/JS)
    • Hybrid search: BM25 (Tantivy in-memory) + Semantic (embeddings) + RRF fusion
    • LLM dispatch: Parallel LLM calls across relevant chunks with aggregation
    • Sandbox execution: WASM tier (<10ms) + Docker tier (80-150ms) with auto-tier selection
  • Storage & Persistence: SurrealDB integration with SCHEMALESS tables

    • rlm_chunks table with chunk_id UNIQUE index
    • rlm_buffers table for pass-by-reference large contexts
    • rlm_executions table for learning from historical executions
    • Migration: migrations/008_rlm_schema.surql
  • Chunking Strategies (reused 90-95% from zircote/rlm-rs)

    • Fixed: Fixed-size chunks with configurable overlap
    • Semantic: Unicode-aware, respects sentence boundaries
    • Code: AST-based for Rust, Python, JavaScript (via tree-sitter)
  • Hybrid Search Engine

    • BM25 full-text search via Tantivy (in-memory index, auto-rebuild)
    • Semantic search via SurrealDB vector similarity (vector::similarity::cosine)
    • Reciprocal Rank Fusion (RRF) combines rankings optimally
    • Configurable weighting: BM25 weight 0.5, semantic weight 0.5
  • Multi-Provider LLM Integration

    • OpenAI (GPT-4, GPT-4-turbo, GPT-3.5-turbo)
    • Anthropic Claude (Opus, Sonnet, Haiku)
    • Ollama (Llama 2, Mistral, CodeLlama, local/free)
    • Cost tracking per provider (tokens + cost per 1M tokens)
  • Embedding Providers

    • OpenAI embeddings (text-embedding-3-small: 1536 dims, text-embedding-3-large: 3072 dims)
    • Ollama embeddings (local, free)
    • Configurable via EmbeddingConfig
  • Sandbox Execution (WASM + Docker hybrid)

    • WASM tier: Direct Wasmtime invocation (<10ms cold start, 25MB memory)
      • WASI-compatible commands: peek, grep, slice
      • Resource limits: 100MB memory, 5s CPU timeout
      • Security: No network, no filesystem write, read-only workspace
    • Docker tier: Pre-warmed container pool (80-150ms from warm pool)
      • Pool size: 10-20 standby containers
      • Full Linux tooling compatibility
      • Auto-replenish on claim, graceful shutdown
    • Auto-dispatcher: Automatically selects tier based on task complexity
  • Prometheus Metrics

    • vapora_rlm_chunks_total{strategy} - Chunks created by strategy
    • vapora_rlm_query_duration_seconds - Query latency (P50/P95/P99)
    • vapora_rlm_dispatch_duration_seconds - LLM dispatch latency
    • vapora_rlm_sandbox_executions_total{tier} - Sandbox tier usage
    • vapora_rlm_cost_cents{provider} - Cost tracking per provider

Performance Benchmarks

  • Query Latency (100 queries):

    • Average: 90.6ms
    • P50: 87.5ms
    • P95: 88.3ms
    • P99: 91.7ms
  • Large Document Processing (10k lines, 2728 chunks):

    • Load time: ~22s (chunking + embedding + indexing + BM25 build)
    • Query time: ~565ms
    • Full workflow: <30s
  • BM25 Index:

    • Build time: ~100ms for 1000 docs
    • Search: <1ms for most queries

Production Configuration

  • Setup Examples:

    • examples/production_setup.rs - OpenAI production setup with GPT-4
    • examples/local_ollama.rs - Local development with Ollama (free, no API keys)
  • Configuration Files:

    • RLMEngineConfig with chunking strategy, embedding provider, auto-rebuild BM25
    • ChunkingConfig with strategy, chunk size, overlap
    • EmbeddingConfig presets: openai_small(), openai_large(), ollama(model)

Integration Points

  • LLM Router Integration: RLM as new LLM provider for long-context tasks
  • Knowledge Graph Integration: Execution history persistence with learning curves
  • Backend API: New endpoint POST /api/v1/rlm/analyze

Test Coverage

  • 38/38 tests passing (100% pass rate):
    • Basic integration: 4/4
    • E2E integration: 9/9
    • Security: 13/13
    • Performance: 8/8
    • Debug tests: 4/4

Documentation

  • Architecture Decision Record: docs/adrs/0029-rlm-recursive-language-models.md

    • Context and problem statement
    • Considered options (RAG, LangChain, custom RLM)
    • Decision rationale and trade-offs
    • Performance validation and benchmarks
  • Usage Guide: docs/guides/rlm-usage-guide.md

    • Chunking strategies selection guide
    • Hybrid search configuration
    • LLM dispatch patterns
    • Use cases: code review, Q&A, log analysis, knowledge base
    • Performance tuning and troubleshooting
  • Production Guide: crates/vapora-rlm/PRODUCTION.md

    • Quick start (cloud with OpenAI, local with Ollama)
    • Configuration examples
    • LLM provider selection
    • Cost optimization strategies

Code Quality

  • Zero clippy warnings (cargo clippy --workspace -- -D warnings)
  • Clean compilation (cargo build --workspace)
  • Comprehensive error handling: thiserror for structured errors, proper Result propagation
  • Contextual logging: All errors logged with task_id, operation, error details
  • No stubs or placeholders: 100% production-ready implementation

Key Architectural Decisions

  • SCHEMALESS vs SCHEMAFULL: SurrealDB tables use SCHEMALESS to avoid conflicts with auto-generated id fields
  • Hybrid Search: BM25 + Semantic + RRF outperforms either alone empirically
  • Custom Implementation: Native Rust RLM vs Python frameworks (LangChain/LlamaIndex) for performance, control, and zero-cost abstractions
  • Reuse from zircote/rlm-rs: 60-70% reuse (chunking, RRF, core types) as dependency, not fork

Added - Leptos Component Library (vapora-leptos-ui)

Component Library Implementation (vapora-leptos-ui crate)

  • 16 production-ready components with CSR/SSR agnostic architecture
  • Primitives (4): Button, Input, Badge, Spinner with variant/size support
  • Layout (2): Card (glassmorphism with blur/glow), Modal (backdrop + keyboard support)
  • Navigation (1): SpaLink (History API integration, external link detection)
  • Forms (1 + 4 utils): FormField with validation (required, email, min/max length)
  • Data (3): Table (sortable columns), Pagination (smart ellipsis), StatCard (metrics with trends)
  • Feedback (3): ToastProvider, ToastContext, use_toast hook (3-second auto-dismiss)
  • Type-safe theme system: Variant, Size, BlurLevel, GlowColor enums
  • Unified/client/ssr pattern: Compile-time branching for CSR/SSR contexts
  • 301 UnoCSS utilities generated from Rust source files
  • Zero clippy warnings (strict mode -D warnings)
  • 4 validation tests (all passing)

UnoCSS Build Pipeline

  • uno.config.ts configuration scanning Rust files for class names
  • npm scripts: css:build, css:watch for development workflow
  • Justfile recipes: css-build, css-watch, ui-lib-build, frontend-lint
  • Atomic CSS generation (build-time optimization)
  • 301 utilities with safelist and shortcuts (ds-btn, ds-card, glass-effect)

Frontend Integration (vapora-frontend)

  • Migrated from local primitives to vapora-leptos-ui library
  • Removed duplicate component code (~200 lines)
  • Updated API compatibility (hover_effect → hoverable)
  • Re-export pattern in components/mod.rs for ergonomic imports
  • Pages updated: agents.rs, home.rs, projects.rs

Design System

  • Glassmorphism theme: Cyan/purple/pink gradients, backdrop blur, glow shadows
  • Type-safe variants: Compile-time validation prevents invalid combinations
  • Responsive: Mobile-first design with Tailwind-compatible utilities
  • Accessible: ARIA labels, keyboard navigation support

Added - Agent-to-Agent (A2A) Protocol & MCP Integration (v1.3.0)

MCP Server Implementation (vapora-mcp-server)

  • Real MCP (Model Context Protocol) transport layer with Stdio and SSE support
  • 6 integrated tools: kanban_create_task, kanban_update_task, get_project_summary, list_agents, get_agent_capabilities, assign_task_to_agent
  • Full JSON-RPC 2.0 protocol compliance
  • Backend client integration with authorization headers
  • Tool registry with JSON Schema validation for input parameters
  • Production-optimized release binary (6.5MB)

A2A Server Implementation (vapora-a2a crate)

  • Axum-based HTTP server with type-safe routing
  • Agent discovery endpoint: GET /.well-known/agent.json (AgentCard specification)
  • Task dispatch endpoint: POST /a2a (JSON-RPC 2.0 compliant)
  • Task status endpoint: GET /a2a/tasks/{task_id}
  • Health check endpoint: GET /health
  • Metrics endpoint: GET /metrics (Prometheus format)
  • Full task lifecycle management (waiting → working → completed/failed)
  • SurrealDB persistent storage with parameterized queries (tasks survive restarts)
  • NATS async coordination via background subscribers (TaskCompleted/TaskFailed events)
  • Prometheus metrics: task counts, durations, NATS messages, DB operations, coordinator assignments
  • CoordinatorBridge integration with AgentCoordinator using DashMap and oneshot channels
  • Comprehensive error handling with JSON-RPC error mapping and contextual logging
  • 5 integration tests (persistence, NATS completion, state transitions, failure handling, end-to-end)

A2A Client Library (vapora-a2a-client crate)

  • HTTP client wrapper for A2A protocol communication
  • Methods: discover_agent(), dispatch_task(), get_task_status(), health_check()
  • Configurable timeouts (default 30s) with automatic error detection
  • Exponential backoff retry policy with jitter (±20%) and smart error classification
  • Retry configuration: 3 retries, 100ms → 5s delay, 2.0x multiplier
  • Retries 5xx/network errors, skips 4xx/deserialization errors
  • Full serialization support for all A2A protocol types
  • Comprehensive error handling: HttpError, TaskNotFound, ServerError, ConnectionRefused, Timeout, InvalidResponse
  • 5 unit tests covering client creation, retry logic, and backoff behavior

Protocol Enhancements

  • Full bidirectional serialization for A2aTask, A2aTaskStatus, A2aTaskResult
  • JSON-RPC 2.0 request/response envelopes
  • A2aMessage with support for text and file parts
  • AgentCard with skills, capabilities, and authentication metadata
  • A2aErrorObj with JSON-RPC error code mapping

Kubernetes Integration (kubernetes/kagent/)

  • Production-ready manifests for kagent deployment
  • Kustomize-based configuration with dev/prod overlays
  • Development environment: 1 replica, debug logging, minimal resources
  • Production environment: 5 replicas, high availability, full resources
  • StatefulSet for ordered deployment with stable identities
  • Service definitions: Headless (coordination), API (REST), gRPC
  • RBAC configuration: ServiceAccount, ClusterRole, ResourceQuota
  • ConfigMap with A2A integration settings
  • Pod anti-affinity: Preferred (dev), Required (prod)
  • Health checks: Liveness (30s initial, 10s interval), Readiness (10s initial, 5s interval)
  • Comprehensive README with deployment guides

Code Quality

  • All Rust code compiled with cargo +nightly fmt for consistent formatting
  • Zero clippy warnings with strict -D warnings mode
  • 4/4 unit tests passing (100% pass rate)
  • Type-safe error handling throughout
  • Async/await patterns with no blocking I/O

Documentation

  • 3 Architecture Decision Records (ADRs):
    • ADR-0001: A2A Protocol Implementation
    • ADR-0002: Kubernetes Deployment Strategy
    • ADR-0003: Error Handling and JSON-RPC 2.0 Compliance
  • API specification in protocol modules
  • Kubernetes deployment guides with examples
  • ADR index and navigation

Workspace Updates

  • Added vapora-a2a-client to workspace members
  • Added vapora-a2a to workspace dependencies
  • Fixed comfy-table dependency in vapora-cli
  • Updated root Cargo.toml with new crates

Added - Tiered Risk-Based Approval Gates (v1.2.0)

  • Risk Classification Engine (200 LOC)

    • Rules-based algorithm with 4 weighted factors: Priority (30%), Keywords (40%), Expertise (20%), Feature scope (10%)
    • High-risk keywords: delete, production, security
    • Medium-risk keywords: deploy, api, schema
    • Risk scores: Low<0.4, Medium≥0.4, High≥0.7
    • 4 unit tests covering edge cases
  • Backend Approval Service (240 LOC)

    • CRUD operations: create, list, get, update, delete
    • Workflow methods: submit, approve, reject, mark_executed
    • Review management: add_review, list_reviews
    • Multi-tenant isolation via SurrealDB permissions
  • REST API Endpoints (250 LOC, 10 routes)

    • POST /api/v1/proposals - Create proposal
    • GET /api/v1/proposals?project_id=X&status=proposed - List with filters
    • GET /api/v1/proposals/:id - Get single proposal
    • PUT /api/v1/proposals/:id - Update proposal
    • DELETE /api/v1/proposals/:id - Delete proposal
    • PUT /api/v1/proposals/:id/submit - Submit for approval
    • PUT /api/v1/proposals/:id/approve - Approve
    • PUT /api/v1/proposals/:id/reject - Reject
    • PUT /api/v1/proposals/:id/executed - Mark executed
    • GET/POST /api/v1/proposals/:id/reviews - Review management
  • Database Schema (SurrealDB)

    • proposals table: 20 fields, 8 indexes, multi-tenant SCHEMAFULL
    • proposal_reviews table: 5 fields, 3 indexes
    • Proper constraints and SurrealDB permissions
  • NATS Integration

    • New message types: ProposalGenerated, ProposalApproved, ProposalRejected
    • Async coordination via pub/sub (subjects: vapora.proposals.generated|approved|rejected)
    • Non-blocking approval flow
  • Data Models (75 LOC in vapora-shared)

    • Proposal struct with task, agent, risk_level, plan_details, timestamps
    • ProposalStatus enum: Proposed | Approved | Rejected | Executed
    • RiskLevel enum: Low | Medium | High
    • PlanDetails with confidence, cost, resources, rollback strategy
    • ProposalReview for feedback tracking
  • Architecture Flow

    • Low-risk tasks execute immediately (no proposal)
    • Medium/high-risk tasks generate proposals for human review
    • Non-blocking: agents don't wait for approval (NATS pub/sub)
    • Learning integration ready: agent confidence feeds back to risk scoring

Added - CLI Arguments & Distribution (v1.2.0)

  • CLI Configuration: Command-line arguments for flexible deployment

    • --config <PATH> flag for custom configuration files
    • --help support on all binaries (vapora, vapora-backend, vapora-agents, vapora-mcp-server)
    • Environment variable overrides (VAPORA_CONFIG, BUDGET_CONFIG_PATH)
    • Example: vapora-backend --config /etc/vapora/backend.toml
  • Enhanced Distribution: Binary installation and cross-compilation target management

    • just distro::install — builds and installs server binaries to ~/.local/bin (or DIR=<path>)
    • just distro::install UI=true — additionally builds frontend via trunk --release
    • Cross-compilation: just distro::list-targets, just distro::install-targets, just distro::build-target TARGET
    • Binaries: vapora (CLI), vapora-backend (API), vapora-agents (orchestrator), vapora-mcp-server (gateway), vapora-a2a (A2A server)
  • Code Quality: Zero compiler warnings in vapora codebase

    • Systematic dead_code annotations for intentional scaffolding (Phase 3 workflow system)
    • Removed unused imports and variables
    • Maintained architecture integrity while suppressing false positives

Added - Workflow Orchestrator (v1.2.0)

  • Multi-Stage Workflow Engine: Complete orchestration system with short-lived agent contexts

    • vapora-workflow-engine crate (26 tests)
    • 95% cache token cost reduction (from $840/month to $110/month via context management)
    • Short-lived agent contexts prevent cache token accumulation
    • Artifact passing between stages (ADR, Code, TestResults, Review, Documentation)
    • Event-driven coordination via NATS pub/sub for stage progression
    • Approval gates for governance and quality control
    • State machine with validated transitions (Draft → Active → WaitingApproval → Completed/Failed)
  • Workflow Templates: 4 production-ready templates with stage definitions

    • feature_development (5 stages): architecture_design → implementation (2x parallel) → testing → code_review (approval) → deployment (approval)
    • bugfix (4 stages): investigation → fix_implementation → testing → deployment
    • documentation_update (3 stages): content_creation → review (approval) → publish
    • security_audit (4 stages): code_analysis → penetration_testing → remediation → verification (approval)
    • Configuration in config/workflows.toml with role assignments and agent limits
  • Kogral Integration: Filesystem-based knowledge enrichment

    • Automatic context enrichment from .kogral/ directory structure
    • Guidelines: .kogral/guidelines/{workflow_name}.md
    • Patterns: .kogral/patterns/*.md (all matching patterns)
    • ADRs: .kogral/adrs/*.md (5 most recent decisions)
    • Configurable via KOGRAL_PATH environment variable
    • Graceful fallback with warnings if knowledge files missing
    • Full async I/O with tokio::fs operations
  • CLI Commands: Complete workflow management from terminal

    • vapora-cli crate with 6 commands
    • start: Launch workflow from template with optional context file
    • list: Display all active workflows in formatted table
    • status: Get detailed workflow status with progress tracking
    • approve: Approve stage waiting for approval (with approver tracking)
    • cancel: Cancel running workflow with reason logging
    • templates: List available workflow templates
    • Colored terminal output with colored crate
    • UTF8 table formatting with comfy-table
    • HTTP client pattern (communicates with backend REST API)
    • Environment variable support: VAPORA_API_URL
  • Backend REST API: 6 workflow orchestration endpoints

    • POST /api/workflows/start - Start workflow from template
    • GET /api/workflows - List all workflows
    • GET /api/workflows/{id} - Get workflow status
    • POST /api/workflows/{id}/approve - Approve stage
    • POST /api/workflows/{id}/cancel - Cancel workflow
    • GET /api/workflows/templates - List templates
    • Full integration with SwarmCoordinator for agent task assignment
    • Real-time workflow state updates
    • WebSocket support for workflow progress streaming
  • Documentation: Comprehensive guides and decision records

    • ADR-0028: Workflow Orchestrator architecture decision (275 lines)
      • Root cause analysis: monolithic session pattern → 3.82B cache tokens
      • Cost projection: $840/month → $110/month (87% reduction)
      • Solution: short-lived agent contexts with artifact passing
      • Trade-offs and alternatives evaluation
    • workflow-orchestrator.md: Complete feature documentation (538 lines)
      • Architecture overview with component interaction diagrams
      • 4 workflow templates with stage breakdowns
      • REST API reference with request/response examples
      • Kogral integration details
      • Prometheus metrics reference
      • Troubleshooting guide
    • cli-commands.md: CLI reference manual (614 lines)
      • Installation instructions
      • Complete command reference with examples
      • Workflow template usage patterns
      • CI/CD integration examples
      • Error handling and recovery
    • overview.md: Updated with workflow orchestrator section
  • Cost Optimization: Real-world production savings

    • Before: Monolithic sessions accumulating 3.82B cache tokens/month
    • After: Short-lived contexts with 190M cache tokens/month
    • Savings: $730/month (95% reduction)
    • Per-role breakdown:
      • Architect: $120 → $6 (95% reduction)
      • Developer: $360 → $18 (95% reduction)
      • Reviewer: $240 → $12 (95% reduction)
      • Tester: $120 → $6 (95% reduction)
    • ROI: Infrastructure cost paid back in < 1 week

Added - Comprehensive Examples System

  • Comprehensive Examples System: 26+ executable examples demonstrating all VAPORA capabilities

    • Basic Examples (6): Foundation for each core crate
      • crates/vapora-agents/examples/01-simple-agent.rs - Agent registry & metadata
      • crates/vapora-llm-router/examples/01-provider-selection.rs - Multi-provider routing
      • crates/vapora-swarm/examples/01-agent-registration.rs - Swarm coordination basics
      • crates/vapora-knowledge-graph/examples/01-execution-tracking.rs - Temporal KG persistence
      • crates/vapora-backend/examples/01-health-check.rs - Backend verification
      • crates/vapora-shared/examples/01-error-handling.rs - Error type patterns
    • Intermediate Examples (9): System integration scenarios
      • Learning profiles with recency bias weighting
      • Budget enforcement with 3-tier fallback strategy
      • Cost tracking and ROI analysis per provider/task type
      • Swarm load distribution and capability-based filtering
      • Knowledge graph learning curves and similarity search
      • Full-stack agent + routing integration
      • Multi-agent swarm with expertise-based assignment
    • Advanced Examples (2): Complete end-to-end workflows
      • Full system integration (API → Swarm → Agents → Router → KG)
      • REST API integration with real-time WebSocket updates
    • Real-World Use Cases (3): Production scenarios with business value
      • Code review workflow: 3-stage pipeline with cost optimization ($488/month savings)
      • Documentation generation: Automated sync with quality checks ($989/month savings)
      • Issue triage: Intelligent classification with selective escalation ($997/month savings)
    • Interactive Notebooks (4): Marimo-based exploration
      • Agent basics with role configuration
      • Budget playground with cost projections
      • Learning curves visualization with confidence intervals
      • Cost analysis with provider comparison charts
  • Examples Documentation: 600+ line comprehensive guide

    • docs/examples-guide.md - Master reference for all examples
    • Example-by-example breakdown with learning objectives and run instructions
    • Three learning paths: Quick Overview (30min), System Integration (90min), Production Ready (2-3hrs)
    • Common tasks mapped to relevant examples
    • Business value analysis for real-world scenarios
    • Troubleshooting section and quick reference commands
  • Examples Organization:

    • Per-crate examples following crates/*/examples/ Cargo convention
    • Root-level examples in examples/full-stack/ and examples/real-world/
    • Master README catalog at examples/README.md with navigation
    • Python requirements for Marimo notebooks: examples/notebooks/requirements.txt
  • Web Assets Optimization: Restructured landing page with minification pipeline

    • Separated source (assets/web/src/index.html) from minified production version
    • Automated minification script (assets/web/minify.sh) for version synchronization
    • 32% compression achieved (26KB → 18KB)
    • Bilingual content (English/Spanish) preserved with localStorage persistence
    • Complete documentation in assets/web/README.md
  • Infrastructure & Build System

    • Just recipes for CI/CD automation (50+ recipes organized by category)
    • Parametrized help system for command discovery
    • Integration with development workflows

Changed

  • Code Quality Improvements

    • Removed unused imports from API and workflow modules (5+ files)
    • Fixed 6 unnecessary mut keyword warnings in provider analytics
    • Improved code patterns: converted verbose match to matches! macro (workflow/state.rs)
    • Applied automatic clippy fixes for idiomatic Rust
  • Documentation & Linting

    • Fixed markdown linting compliance in assets/web/README.md
    • Proper code fence language specifications (MD040)
    • Blank lines around code blocks (MD031)
    • Table formatting with compact style (MD060)

Fixed

  • Embeddings Provider Verification

    • Confirmed HuggingFace embeddings compile correctly (no errors)
    • All embedding provider tests passing (Ollama, OpenAI, HuggingFace)
    • vapora-llm-router: 53 tests passing (30 unit + 11 budget + 12 cost)
    • Factory function supports 3 providers: Ollama, OpenAI, HuggingFace
    • Models supported: BGE (small/base/large), MiniLM, MPNet, custom models
  • Compilation & Testing

    • Eliminated all unused import warnings in vapora-backend
    • Suppressed architectural dead code with appropriate attributes
    • All 55 tests passing in vapora-backend
    • 0 compilation errors, clean build output

Technical Details - Workflow Orchestrator

  • New Crates Created (2):

    • crates/vapora-workflow-engine/ - Core orchestration engine (2,431 lines)
      • src/orchestrator.rs (864 lines) - Workflow lifecycle management + Kogral integration
      • src/state.rs (321 lines) - State machine with validated transitions
      • src/template.rs (298 lines) - Template loading from TOML
      • src/artifact.rs (187 lines) - Inter-stage artifact serialization
      • src/events.rs (156 lines) - NATS event publishing/subscription
      • tests/ (26 tests) - Unit + integration tests
    • crates/vapora-cli/ - Command-line interface (671 lines)
      • src/main.rs - CLI entry point with clap
      • src/client.rs - HTTP client for backend API
      • src/commands.rs - Command definitions
      • src/output.rs - Terminal UI with colored tables
  • Modified Files (4):

    • crates/vapora-backend/src/api/workflow_orchestrator.rs (NEW) - REST API handlers
    • crates/vapora-backend/src/api/mod.rs - Route registration
    • crates/vapora-backend/src/api/state.rs - Orchestrator state injection
    • Cargo.toml - Workspace members + dependencies
  • Configuration Files (1):

    • config/workflows.toml - Workflow template definitions
      • 4 templates with stage configurations
      • Role assignments per stage
      • Agent limit configurations
      • Approval requirements
  • Test Suite:

    • Workflow Engine: 26 tests (state transitions, template loading, Kogral integration)
    • Backend Integration: 5 tests (REST API endpoints)
    • CLI: Manual testing (no automated tests yet)
    • Total new tests: 31
  • Build Status: Clean compilation

    • cargo build --workspace
    • cargo clippy --workspace -- -D warnings
    • cargo test -p vapora-workflow-engine (26/26 passing)
    • cargo test -p vapora-backend (55/55 passing)

Technical Details - General

  • Architecture: Refactored unused imports from workflow and API modules

    • Tests moved to test-only scope for AgentConfig/RegistryConfig types
    • Intentional suppression for components not yet integrated
    • Future-proof markers for architectural patterns
  • Build Status: Clean compilation pipeline

    • cargo build -p vapora-backend
    • cargo clippy -p vapora-backend (5 nesting suggestions only)
    • cargo test -p vapora-backend (55/55 passing)

1.2.0 - 2026-01-11

Added - Phase 5.3: Multi-Agent Learning

  • Learning Profiles: Per-task-type expertise tracking for each agent

    • LearningProfile struct with task-type expertise mapping
    • Success rate calculation with recency bias (7-day window weighted 3x)
    • Confidence scoring based on execution count (prevents small-sample overfitting)
    • Learning curve computation with exponential decay
  • Agent Scoring Service: Unified agent selection combining swarm metrics + learning

    • Formula: final_score = 0.3*base + 0.5*expertise + 0.2*confidence
    • Base score from SwarmCoordinator (load balancing)
    • Expertise score from learning profiles (historical success)
    • Confidence weighting dampens low-execution-count agents
  • Knowledge Graph Integration: Learning curve calculator

    • calculate_learning_curve() with time-series expertise evolution
    • apply_recency_bias() with exponential weighting formula
    • Aggregate by time windows (daily/weekly) for trend analysis
  • Coordinator Enhancement: Learning-based agent selection

    • Extract task type from description/role
    • Query learning profiles for task-specific expertise
    • Replace simple load balancing with learning-aware scoring
    • Background profile synchronization (30s interval)

Added - Phase 5.4: Cost Optimization

  • Budget Manager: Per-role cost enforcement

    • BudgetConfig with TOML serialization/deserialization
    • Role-specific monthly and weekly limits (in cents)
    • Automatic fallback provider when budget exceeded
    • Alert thresholds (default 80% utilization)
    • Weekly/monthly automatic resets
  • Configuration Loading: Graceful budget initialization

    • BudgetConfig::load() with strict validation
    • BudgetConfig::load_or_default() with fallback to empty config
    • Environment variable override: BUDGET_CONFIG_PATH
    • Validation: limits > 0, thresholds in [0.0, 1.0]
  • Cost-Aware Routing: Provider selection with budget constraints

    • Three-tier enforcement:
      1. Budget exceeded → force fallback provider
      2. Near threshold (>80%) → prefer cost-efficient providers
      3. Normal → rule-based routing with cost as tiebreaker
    • Cost efficiency ranking: (quality * 100) / (cost + 1)
    • Fallback chain ordering by cost (Ollama → Gemini → OpenAI → Claude)
  • Prometheus Metrics: Real-time cost and budget monitoring

    • vapora_llm_budget_remaining_cents{role} - Monthly budget remaining
    • vapora_llm_budget_utilization{role} - Budget usage fraction (0.0-1.0)
    • vapora_llm_fallback_triggered_total{role,reason} - Fallback event counter
    • vapora_llm_cost_per_provider_cents{provider} - Cumulative cost per provider
    • vapora_llm_tokens_per_provider{provider,type} - Token usage tracking
  • Grafana Dashboards: Visual monitoring

    • Budget utilization gauge (color thresholds: 70%, 90%, 100%)
    • Cost distribution pie chart (percentage per provider)
    • Fallback trigger time series (rate of fallback activations)
    • Agent assignment latency histogram (P50, P95, P99)
  • Alert Rules: Prometheus alerting

    • BudgetThresholdExceeded: Utilization > 80% for 5 minutes
    • HighFallbackRate: Rate > 0.1 for 10 minutes
    • CostAnomaly: Cost spike > 2x historical average
    • LearningProfilesInactive: No updates for 5 minutes

Added - Integration & Testing

  • End-to-End Integration Tests: Validate learning + budget interaction

    • test_end_to_end_learning_with_budget_enforcement() - Full system test
    • test_learning_selection_with_budget_constraints() - Budget pressure scenarios
    • test_learning_profile_improvement_with_budget_tracking() - Learning evolution
  • Agent Server Integration: Budget initialization at startup

    • Load budget configuration from config/agent-budgets.toml
    • Initialize BudgetManager with Arc for thread-safe sharing
    • Attach to coordinator via with_budget_manager() builder pattern
    • Graceful fallback if no configuration exists
  • Coordinator Builder Pattern: Budget manager attachment

    • Added budget_manager: Option<Arc<BudgetManager>> field
    • with_budget_manager() method for fluent API
    • Updated all constructors (new(), with_registry())
    • Backward compatible (works without budget configuration)

Added - Documentation

  • Implementation Summary: .coder/2026-01-11-phase-5-completion.done.md

    • Complete architecture overview (3-layer integration)
    • All files created/modified with line counts
    • Prometheus metrics reference
    • Quality metrics (120 tests passing)
    • Educational insights
  • Gradual Deployment Guide: guides/gradual-deployment-guide.md

    • Week 1: Staging validation (24 hours)
    • Week 2-3: Canary deployment (incremental traffic shift)
    • Week 4+: Production rollout (100% traffic)
    • Automated rollback procedures (< 5 minutes)
    • Success criteria per phase
    • Emergency procedures and checklists

Changed

  • LLMRouter: Enhanced with budget awareness

    • select_provider_with_budget() method for budget-aware routing
    • Fixed incomplete fallback implementation (lines 227-246)
    • Cost-ordered fallback chain (cheapest first)
  • ProfileAdapter: Learning integration

    • update_from_kg_learning() method for learning profile sync
    • Query KG for task-specific executions with recency filter
    • Calculate success rate with 7-day exponential decay
  • AgentCoordinator: Learning-based assignment

    • Replaced min-load selection with AgentScoringService
    • Extract task type from task description
    • Combine swarm metrics + learning profiles for final score

Fixed

  • Clippy Warnings: All resolved (0 warnings)

    • redundant_guards in BudgetConfig
    • needless_borrow in registry defaults
    • or_insert_withor_default() conversions
    • map_clonecloned() conversions
    • manual_div_ceildiv_ceil() method
  • Test Warnings: Unused variables marked with underscore prefix

Technical Details

New Files Created (13):

  • vapora-agents/src/learning_profile.rs (250 lines)
  • vapora-agents/src/scoring.rs (200 lines)
  • vapora-knowledge-graph/src/learning.rs (150 lines)
  • vapora-llm-router/src/budget.rs (300 lines)
  • vapora-llm-router/src/cost_ranker.rs (180 lines)
  • vapora-llm-router/src/cost_metrics.rs (120 lines)
  • config/agent-budgets.toml (50 lines)
  • vapora-agents/tests/end_to_end_learning_budget_test.rs (NEW)
  • 4+ integration test files (700+ lines total)

Modified Files (10):

  • vapora-agents/src/coordinator.rs - Learning integration
  • vapora-agents/src/profile_adapter.rs - KG sync
  • vapora-agents/src/bin/server.rs - Budget initialization
  • vapora-llm-router/src/router.rs - Cost-aware routing
  • vapora-llm-router/src/lib.rs - Budget exports
  • Plus 5 more lib.rs and config updates

Test Suite:

  • Total: 120 tests passing
  • Unit tests: 71 (vapora-agents: 41, vapora-llm-router: 30)
  • Integration tests: 42 (learning: 7, coordinator: 9, budget: 11, cost: 12, end-to-end: 3)
  • Quality checks: Zero warnings, clippy -D warnings passing

Deployment Readiness:

  • Staging validation checklist complete
  • Canary deployment Istio VirtualService configured
  • Grafana dashboards deployed
  • Alert rules created
  • Rollback automation ready (< 5 minutes)

0.1.0 - 2026-01-10

Added

  • Initial release with core platform features
  • Multi-agent orchestration with 12 specialized roles
  • Multi-IA router (Claude, OpenAI, Gemini, Ollama)
  • Kanban board UI with glassmorphism design
  • SurrealDB multi-tenant data layer
  • NATS JetStream agent coordination
  • Kubernetes-native deployment
  • Istio service mesh integration
  • MCP plugin system
  • RAG integration for semantic search
  • Cedar policy engine RBAC
  • Full-stack Rust implementation (Axum + Leptos)