# VAPORA Grafana Dashboards

This directory contains four pre-configured Grafana dashboards for monitoring VAPORA.

## Dashboards

### 1. VAPORA Overview (`vapora-overview.json`)

**UID:** `vapora-overview`

**Panels:**
- Request Rate (req/sec)
- Error Rate (%)
- P95 Latency (ms)
- Request Rate by Endpoint (timeseries)
- Response Latency (P50, P95, P99) (timeseries)
- Response Status Distribution (pie chart)
- Database Operations (timeseries)

**Metrics Used:**
- `vapora_http_requests_total`
- `vapora_http_request_duration_seconds_bucket`
- `vapora_db_operations_total`

**Refresh:** 10 seconds

---

### 2. VAPORA Agent Metrics (`agent-metrics.json`)

**UID:** `vapora-agents`

**Panels:**
- Active Agents (count)
- Task Assignment Rate (assignments/sec)
- Task Failure Rate (%)
- Average Agent Load
- Task Execution Time by Agent Role (P50, P95, P99)
- Task Assignments by Skill (stacked)
- Agent Load Distribution (donut chart)
- Agent Expertise Scores (Learning Profiles)
- NATS Message Coordination (A2A)

**Metrics Used:**
- `vapora_swarm_agents_registered`
- `vapora_swarm_task_assignments_total`
- `vapora_swarm_agent_load`
- `vapora_agent_task_duration_seconds_bucket`
- `vapora_agent_expertise_score`
- `vapora_a2a_nats_messages_total`

**Refresh:** 10 seconds

---

### 3. VAPORA LLM Cost Tracking (`llm-cost-tracking.json`)

**UID:** `vapora-llm-cost`

**Panels:**
- Total LLM Cost (USD)
- Total Input Tokens
- Total Output Tokens
- Budget Usage % (gauge)
- Cost by Provider (timeseries)
- Token Usage by Provider (timeseries)
- Cost Distribution by Provider (donut chart)
- Cost Distribution by Role (donut chart)
- Request Distribution by Provider (donut chart)
- Hourly Budget Usage by Role (bars)
- Budget Status by Role (table)

**Metrics Used:**
- `vapora_llm_cost_total_cents`
- `vapora_llm_provider_token_usage`
- `vapora_llm_role_budget_used_cents`
- `vapora_llm_role_budget_limit_cents`
- `vapora_llm_provider_requests_total`

**Refresh:** 10 seconds

---

### 4. VAPORA Knowledge Graph Analytics (`knowledge-graph-analytics.json`)

**UID:** `vapora-kg-analytics`

**Panels:**
- Total Executions in KG
- KG Nodes
- KG Relationships
- Average Learning Curve Slope
- Learning Curves (Improvement Over Time)
- Average Execution Duration by Task Type
- Execution Count by Task Type (table)
- Execution Status Distribution (donut chart)
- Recency Bias Weights (7-day 3×, 30-day 1×)
- Similarity Searches (Hourly)
- Agent Success Rates by Task Type (table)

**Metrics Used:**
- `vapora_kg_total_executions`
- `vapora_kg_total_nodes`
- `vapora_kg_total_relationships`
- `vapora_kg_learning_curve_slope`
- `vapora_kg_learning_curve_improvement`
- `vapora_kg_execution_duration_seconds`
- `vapora_kg_executions_by_task_type`
- `vapora_kg_executions_by_status`
- `vapora_kg_recency_bias_weight`
- `vapora_kg_similarity_searches_total`
- `vapora_kg_agent_success_rate`

**Refresh:** 30 seconds

---

## Import Instructions

### Option 1: Grafana UI (Recommended)

1. **Access Grafana:**

   ```bash
   kubectl port-forward -n observability svc/grafana 3000:3000
   ```

   Open: http://localhost:3000

2. **Login:**
   - Username: `admin`
   - Password: `prom-operator` (or your configured password)

3. **Import Dashboards:**
   - Go to **Dashboards** → **New** → **Import** (on older Grafana versions: **"+"** → **"Import"** in the left sidebar)
   - Click **"Upload JSON file"** or paste the file contents into **"Import via panel json"**
   - Select one of the JSON files from this directory
   - Select **Prometheus** as the datasource
   - Click **"Import"**

4. **Repeat** for all 4 dashboards

### Option 2: Kubernetes ConfigMap (Automated)

Create a ConfigMap to auto-provision dashboards:

```bash
# Create ConfigMap for dashboards
kubectl create configmap vapora-dashboards \
  --from-file=vapora-overview.json \
  --from-file=agent-metrics.json \
  --from-file=llm-cost-tracking.json \
  --from-file=knowledge-graph-analytics.json \
  -n observability

# Label for Grafana auto-discovery
kubectl label configmap vapora-dashboards \
  grafana_dashboard=1 \
  -n observability
```

**Note:** This assumes your Grafana instance is configured with a dashboard provider that watches for ConfigMaps with the `grafana_dashboard=1` label.

### Option 3: Direct File Mount (Docker/Local)

If running Grafana locally via Docker:

```bash
# Copy dashboards to Grafana provisioning directory
cp *.json /path/to/grafana/provisioning/dashboards/

# Restart Grafana
docker restart grafana
```

**Note:** This assumes a dashboard provider YAML file in the provisioning directory already points Grafana at that path; without one, the copied JSON files are not loaded.

---

## Verification

After importing, verify dashboards are working:

1. **Check Prometheus Data Source:**
   - Go to **Connections** → **Data sources** (**Configuration** → **Data Sources** on older Grafana versions)
   - Verify **Prometheus** datasource exists and is reachable
   - Test connection

2. **Check Metrics Availability:**

   Open Prometheus UI:

   ```bash
   kubectl port-forward -n observability svc/prometheus 9090:9090
   ```

   Query test metrics:
   - `vapora_http_requests_total`
   - `vapora_agent_task_duration_seconds_bucket`
   - `vapora_llm_cost_total_cents`
   - `vapora_kg_total_executions`

3. **View Dashboards:**
   - Go to **Dashboards** → **Browse**
   - Look for "VAPORA" folder or tag
   - Open each dashboard
   - Verify panels show data (may take a few minutes after VAPORA starts)

---

## Customization

### Update Datasource

If your Prometheus datasource has a different name:

1. Open dashboard JSON file
2. Find all instances of `"uid": "${DS_PROMETHEUS}"`
3. Replace with your datasource UID
4. Re-import

### Adjust Refresh Rate

To change auto-refresh interval:

1. Open dashboard in Grafana
2. Click **Dashboard settings** (gear icon)
3. Go to **General** tab
4. Update **Refresh** dropdown
5. Click **Save dashboard**

### Add Custom Panels

To add new panels:

1. Edit dashboard
2. Click **"Add panel"** → **"Add a new panel"**
3. Select Prometheus datasource
4. Write PromQL query (see **Metrics Used** above for the available metric names)
5. Configure visualization
6. Click **"Apply"**
7. Save dashboard

---

## Troubleshooting

### No Data Shown

**Problem:** Panels show "No data"

**Solutions:**

1. **Check VAPORA is running:**

   ```bash
   kubectl get pods -n vapora
   # All pods should be Running
   ```

2. **Check Prometheus is scraping VAPORA:**

   ```bash
   kubectl port-forward -n observability svc/prometheus 9090:9090
   ```

   Open: http://localhost:9090/targets

   Look for `vapora-backend`, `vapora-a2a`, etc. targets

3. **Check metrics endpoint manually:**

   ```bash
   kubectl port-forward -n vapora svc/vapora-backend 8001:8001

   # In a second terminal (port-forward keeps running in the first):
   curl http://localhost:8001/metrics | grep vapora_
   ```

   Should show Prometheus-format metrics

4. **Wait a few minutes** for metrics to accumulate

### Wrong Datasource

**Problem:** Dashboard shows "Data source not found"

**Solution:**
- Edit dashboard
- Click **Dashboard settings** → **Variables**
- Update `DS_PROMETHEUS` variable to match your datasource name
- Save

### Missing Metrics

**Problem:** Some panels show "No data" while others work

**Solution:**
- Check if specific VAPORA features are enabled:
  - **Agent metrics:** Requires `vapora-agents` running
  - **LLM cost:** Requires LLM provider configured
  - **KG analytics:** Requires Knowledge Graph enabled
- Some metrics only appear after certain actions (e.g., task assignments, LLM calls)

---

## Dashboard Organization

Recommended Grafana folder structure:

```
📁 VAPORA/
├── 📊 Overview (vapora-overview)
├── 📊 Agent Metrics (vapora-agents)
├── 📊 LLM Cost Tracking (vapora-llm-cost)
└── 📊 Knowledge Graph Analytics (vapora-kg-analytics)
```

To create folder:

1. Go to **Dashboards** → **Browse**
2. Click **"New"** → **"New folder"**
3. Name: "VAPORA"
4. Move imported dashboards into this folder

---

## Alerting (Optional)

To set up alerts based on dashboard panels:

### Example: High Error Rate Alert

1. Open **VAPORA Overview** dashboard
2. Edit **"Error Rate"** panel
3. Go to **Alert** tab
4. Click **"Create alert rule from this panel"**
5. Configure:
   - **Name:** "VAPORA High Error Rate"
   - **Condition:** `avg() > 0.05` (5%). This threshold assumes the panel query returns a ratio; if it already returns a percentage (0 to 100), use `5` instead
   - **For:** 5 minutes
   - **Annotations:** "VAPORA error rate exceeded 5%"
6. Save

### Example: Budget Exceeded Alert

1. Open **VAPORA LLM Cost Tracking** dashboard
2. Edit **"Budget Usage %"** panel
3. Create alert:
   - **Name:** "LLM Budget Near Limit"
   - **Condition:** `last() > 0.9` (90%). As above, use `90` if the panel query returns a percentage rather than a ratio
   - **For:** 1 minute
   - **Annotations:** "LLM budget usage exceeded 90%"

---

## Maintenance

### Update Dashboards

When VAPORA metrics change:

1. Export current dashboard JSON
2. Edit JSON file with new metrics
3. Increment version number
4. Re-import (overwrites existing)

### Backup Dashboards

```bash
# Export all VAPORA dashboards (requires jq)
for uid in vapora-overview vapora-agents vapora-llm-cost vapora-kg-analytics; do
  # The API wraps the dashboard as {"meta": ..., "dashboard": ...};
  # keep only .dashboard so the backup can be re-imported
  curl -fsS -H "Authorization: Bearer $GRAFANA_API_KEY" \
    "http://localhost:3000/api/dashboards/uid/${uid}" \
    | jq '.dashboard' > "${uid}-backup.json"
done
```

---

## Support

For dashboard issues:
- Check **VAPORA Metrics Documentation**: `docs/architecture/metrics.md`
- Check **Prometheus Setup**: `docs/operations/monitoring.md`
- Review **Grafana Docs**: https://grafana.com/docs/

For VAPORA metrics questions:
- See: `.claude/CLAUDE.md` → **Debugging & Monitoring** section
- Check: `crates/*/src/metrics.rs` files for metric definitions

---

**Last Updated:** 2026-02-08
**VAPORA Version:** 1.2.0
**Grafana Version:** 10.0+
**Prometheus Version:** 2.40+
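
---

## Appendix: Scripting the Datasource Update

The manual find-and-replace described under **Update Datasource** can also be scripted. The following is a minimal sketch, not part of the shipped tooling: the function name `replace_datasource_uid` and the example UID `prom-main` are hypothetical, and it assumes the datasource UID contains no `|` or `&` characters (which `sed` would interpret specially).

```bash
#!/usr/bin/env bash
# Sketch: print a dashboard JSON file with every ${DS_PROMETHEUS}
# placeholder replaced by a concrete datasource UID.
# The function name and argument order are illustrative assumptions.

replace_datasource_uid() {
  local file="$1"   # path to a dashboard JSON file
  local uid="$2"    # datasource UID to substitute (no '|' or '&')

  # \$ matches a literal dollar sign; the braces are literal in basic
  # regular expressions, so the pattern matches ${DS_PROMETHEUS} exactly.
  sed 's|\${DS_PROMETHEUS}|'"$uid"'|g' "$file"
}

# Example (hypothetical UID), writing a patched copy for re-import:
#   replace_datasource_uid vapora-overview.json prom-main > vapora-overview.patched.json
```

Writing to a separate file rather than editing in place keeps the original JSON untouched, so the same source file can be patched for several Grafana instances.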