VAPORA Grafana Dashboards
This directory contains 4 pre-configured Grafana dashboards for monitoring VAPORA.
Dashboards
1. VAPORA Overview (vapora-overview.json)
UID: vapora-overview
Panels:
- Request Rate (req/sec)
- Error Rate (%)
- P95 Latency (ms)
- Request Rate by Endpoint (timeseries)
- Response Latency (P50, P95, P99) (timeseries)
- Response Status Distribution (pie chart)
- Database Operations (timeseries)
Metrics Used:
- vapora_http_requests_total
- vapora_http_request_duration_seconds_bucket
- vapora_db_operations_total
Refresh: 10 seconds
2. VAPORA Agent Metrics (agent-metrics.json)
UID: vapora-agents
Panels:
- Active Agents (count)
- Task Assignment Rate (assignments/sec)
- Task Failure Rate (%)
- Average Agent Load
- Task Execution Time by Agent Role (P50, P95, P99)
- Task Assignments by Skill (stacked)
- Agent Load Distribution (donut chart)
- Agent Expertise Scores (Learning Profiles)
- NATS Message Coordination (A2A)
Metrics Used:
- vapora_swarm_agents_registered
- vapora_swarm_task_assignments_total
- vapora_swarm_agent_load
- vapora_agent_task_duration_seconds_bucket
- vapora_agent_expertise_score
- vapora_a2a_nats_messages_total
Refresh: 10 seconds
3. VAPORA LLM Cost Tracking (llm-cost-tracking.json)
UID: vapora-llm-cost
Panels:
- Total LLM Cost (USD)
- Total Input Tokens
- Total Output Tokens
- Budget Usage % (gauge)
- Cost by Provider (timeseries)
- Token Usage by Provider (timeseries)
- Cost Distribution by Provider (donut chart)
- Cost Distribution by Role (donut chart)
- Request Distribution by Provider (donut chart)
- Hourly Budget Usage by Role (bars)
- Budget Status by Role (table)
Metrics Used:
- vapora_llm_cost_total_cents
- vapora_llm_provider_token_usage
- vapora_llm_role_budget_used_cents
- vapora_llm_role_budget_limit_cents
- vapora_llm_provider_requests_total
Refresh: 10 seconds
4. VAPORA Knowledge Graph Analytics (knowledge-graph-analytics.json)
UID: vapora-kg-analytics
Panels:
- Total Executions in KG
- KG Nodes
- KG Relationships
- Average Learning Curve Slope
- Learning Curves (Improvement Over Time)
- Average Execution Duration by Task Type
- Execution Count by Task Type (table)
- Execution Status Distribution (donut chart)
- Recency Bias Weights (7-day 3×, 30-day 1×)
- Similarity Searches (Hourly)
- Agent Success Rates by Task Type (table)
Metrics Used:
- vapora_kg_total_executions
- vapora_kg_total_nodes
- vapora_kg_total_relationships
- vapora_kg_learning_curve_slope
- vapora_kg_learning_curve_improvement
- vapora_kg_execution_duration_seconds
- vapora_kg_executions_by_task_type
- vapora_kg_executions_by_status
- vapora_kg_recency_bias_weight
- vapora_kg_similarity_searches_total
- vapora_kg_agent_success_rate
Refresh: 30 seconds
Import Instructions
Option 1: Grafana UI (Recommended)
- Access Grafana:
  kubectl port-forward -n observability svc/grafana 3000:3000
  Open: http://localhost:3000
- Login:
  - Username: admin
  - Password: prom-operator (or your configured password)
- Import Dashboards:
  - Click "+" → "Import" in the left sidebar
  - Click "Upload JSON file" or "Import via panel json"
  - Select one of the JSON files from this directory
  - Select Prometheus as the datasource
  - Click "Import"
- Repeat for all 4 dashboards
Option 2: Kubernetes ConfigMap (Automated)
Create a ConfigMap to auto-provision dashboards:
# Create ConfigMap for dashboards
kubectl create configmap vapora-dashboards \
--from-file=vapora-overview.json \
--from-file=agent-metrics.json \
--from-file=llm-cost-tracking.json \
--from-file=knowledge-graph-analytics.json \
-n observability
# Label for Grafana auto-discovery
kubectl label configmap vapora-dashboards \
grafana_dashboard=1 \
-n observability
Note: This assumes your Grafana instance is configured with a dashboard provider that watches for ConfigMaps with the grafana_dashboard=1 label.
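The create command above fails with an "already exists" error when it is re-run after a dashboard change. A minimal sketch of a re-runnable variant follows, assuming kubectl 1.18 or newer (for --dry-run=client) and that the four JSON files are in the current directory. The function name apply_vapora_dashboards and the KUBECTL override variable are illustrative and not part of VAPORA.

```shell
#!/bin/sh
# Sketch: re-runnable provisioning of the dashboard ConfigMap.
# Assumptions (not from the VAPORA docs): kubectl >= 1.18 and the four
# dashboard JSON files in the current directory.
# KUBECTL is a hypothetical override hook, e.g. for a wrapper or a test stub.
KUBECTL="${KUBECTL:-kubectl}"

apply_vapora_dashboards() {
  namespace="${1:-observability}"

  # Render the ConfigMap manifest locally, then pipe it to "apply" so an
  # existing ConfigMap is updated instead of rejected.
  "$KUBECTL" create configmap vapora-dashboards \
    --from-file=vapora-overview.json \
    --from-file=agent-metrics.json \
    --from-file=llm-cost-tracking.json \
    --from-file=knowledge-graph-analytics.json \
    -n "$namespace" \
    --dry-run=client -o yaml |
    "$KUBECTL" apply -f -

  # Re-assert the discovery label; --overwrite makes this safe to repeat.
  "$KUBECTL" label configmap vapora-dashboards grafana_dashboard=1 \
    -n "$namespace" --overwrite
}
```

Calling apply_vapora_dashboards with no argument targets the observability namespace. If the combined dashboard JSON is large, client-side apply may hit the annotation size limit; kubectl apply --server-side is one alternative in that case.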
Option 3: Direct File Mount (Docker/Local)
If running Grafana locally via Docker:
# Copy dashboards to Grafana provisioning directory
cp *.json /path/to/grafana/provisioning/dashboards/
# Restart Grafana
docker restart grafana
Verification
After importing, verify dashboards are working:
- Check Prometheus Data Source:
  - Go to Configuration → Data Sources
  - Verify Prometheus datasource exists and is reachable
  - Test connection
- Check Metrics Availability:
  Open Prometheus UI:
  kubectl port-forward -n observability svc/prometheus 9090:9090
  Query test metrics:
  vapora_http_requests_total
  vapora_agent_task_duration_seconds_bucket
  vapora_llm_cost_total_cents
  vapora_kg_total_executions
- View Dashboards:
  - Go to Dashboards → Browse
  - Look for "VAPORA" folder or tag
  - Open each dashboard
  - Verify panels show data (may take a few minutes after VAPORA starts)
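The four test metrics can also be checked from the command line rather than typed into the Prometheus UI one by one. The sketch below queries the Prometheus HTTP API (/api/v1/query) for each metric and reports whether any series came back. It assumes the Prometheus port-forward is active on localhost:9090; the function names and the PROM_URL variable are illustrative.

```shell
#!/bin/sh
# Sketch: check which VAPORA test metrics Prometheus currently has series for.
# Assumes Prometheus is reachable at PROM_URL (port-forward on localhost:9090).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print the instant-query URL for one metric name.
# Plain metric names need no URL encoding.
prom_query_url() {
  printf '%s/api/v1/query?query=%s\n' "$PROM_URL" "$1"
}

# Print one status line per metric. A metric without any series yields an
# empty result array in the response: "result":[]
check_vapora_metrics() {
  for metric in vapora_http_requests_total \
                vapora_agent_task_duration_seconds_bucket \
                vapora_llm_cost_total_cents \
                vapora_kg_total_executions; do
    body=$(curl -s "$(prom_query_url "$metric")")
    case "$body" in
      *'"result":[]'*)        echo "MISSING $metric" ;;
      *'"status":"success"'*) echo "ok      $metric" ;;
      *)                      echo "ERROR   $metric (no valid response)" ;;
    esac
  done
}
```

Running check_vapora_metrics while the port-forward is open prints one line per metric. A MISSING line usually means the corresponding feature has not produced data yet or is not being scraped.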
Customization
Update Datasource
If your Prometheus datasource has a different name:
- Open dashboard JSON file
- Find all instances of "uid": "${DS_PROMETHEUS}"
- Replace with your datasource UID
- Re-import
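The find-and-replace step can be scripted across all dashboard files. This is a minimal sketch that swaps every ${DS_PROMETHEUS} placeholder for a concrete UID using sed. The function name set_datasource_uid and the UID in the usage note are hypothetical, and the sketch assumes the UID contains only letters, digits, hyphens, and underscores, so it needs no escaping inside the sed expression.

```shell
#!/bin/sh
# Sketch: replace the ${DS_PROMETHEUS} placeholder with a concrete UID.
# Assumption: the UID has no characters special to sed (/, &, backslash).
set_datasource_uid() {
  uid="$1"
  shift
  for file in "$@"; do
    tmp="${file}.tmp"
    # [$] matches a literal dollar sign. Writing to a temp file first means a
    # failed sed run never truncates the original dashboard.
    sed 's/[$]{DS_PROMETHEUS}/'"$uid"'/g' "$file" > "$tmp" &&
      mv "$tmp" "$file"
  done
}
```

For example, set_datasource_uid my-prometheus-uid *.json rewrites all four dashboards in place; keep the originals under version control so the placeholder form can be restored.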
Adjust Refresh Rate
To change auto-refresh interval:
- Open dashboard in Grafana
- Click Dashboard settings (gear icon)
- Go to General tab
- Update Refresh dropdown
- Click Save dashboard
Add Custom Panels
To add new panels:
- Edit dashboard
- Click "Add panel" → "Add a new panel"
- Select Prometheus datasource
- Write PromQL query (see Metrics Used above for examples)
- Configure visualization
- Click "Apply"
- Save dashboard
Troubleshooting
No Data Shown
Problem: Panels show "No data"
Solutions:
- Check VAPORA is running:
  kubectl get pods -n vapora
  # All pods should be Running
- Check Prometheus is scraping VAPORA:
  kubectl port-forward -n observability svc/prometheus 9090:9090
  Open: http://localhost:9090/targets
  Look for vapora-backend, vapora-a2a, etc. targets
- Check metrics endpoint manually:
  kubectl port-forward -n vapora svc/vapora-backend 8001:8001
  # In a second terminal (port-forward keeps running in the first):
  curl http://localhost:8001/metrics | grep vapora_
  Should show Prometheus-format metrics
- Wait a few minutes for metrics to accumulate
Wrong Datasource
Problem: Dashboard shows "Data source not found"
Solution:
- Edit dashboard
- Click Dashboard settings → Variables
- Update DS_PROMETHEUS variable to match your datasource name
- Save
Missing Metrics
Problem: Some panels show "No data" while others work
Solution:
- Check if specific VAPORA features are enabled:
  - Agent metrics: Requires vapora-agents running
  - LLM cost: Requires LLM provider configured
  - KG analytics: Requires Knowledge Graph enabled
- Some metrics only appear after certain actions (e.g., task assignments, LLM calls)
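To see quickly which dashboards are likely to be empty, the raw /metrics output can be grouped by the metric prefixes each dashboard uses. The sketch below reads Prometheus text-format metrics on stdin and counts sample lines per dashboard. The prefix-to-dashboard mapping is inferred from the Metrics Used lists in this README, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: count vapora_ sample lines per dashboard from /metrics output.
# The prefix groups mirror the "Metrics Used" lists in this README.
# Comment lines (# HELP, # TYPE) do not start with "vapora_" and are skipped.
summarize_vapora_metrics() {
  awk '
    /^vapora_(http|db)_/         { n["overview"]++ }
    /^vapora_(swarm|agent|a2a)_/ { n["agents"]++ }
    /^vapora_llm_/               { n["llm-cost"]++ }
    /^vapora_kg_/                { n["kg-analytics"]++ }
    END {
      split("overview agents llm-cost kg-analytics", order, " ")
      for (i = 1; i <= 4; i++) printf "%-13s %d\n", order[i], n[order[i]]
    }'
}
```

With the vapora-backend port-forward open, curl -s http://localhost:8001/metrics | summarize_vapora_metrics prints one count per dashboard. A count of zero points at a feature that is disabled or has not emitted anything yet. Other VAPORA services may expose their own /metrics endpoints, so a zero here is a hint rather than proof.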
Dashboard Organization
Recommended Grafana folder structure:
📁 VAPORA/
├── 📊 Overview (vapora-overview)
├── 📊 Agent Metrics (vapora-agents)
├── 📊 LLM Cost Tracking (vapora-llm-cost)
└── 📊 Knowledge Graph Analytics (vapora-kg-analytics)
To create folder:
- Go to Dashboards → Browse
- Click "New" → "New folder"
- Name: "VAPORA"
- Move imported dashboards into this folder
Alerting (Optional)
To set up alerts based on dashboard panels:
Example: High Error Rate Alert
- Open VAPORA Overview dashboard
- Edit "Error Rate" panel
- Go to Alert tab
- Click "Create alert rule from this panel"
- Configure:
  - Name: "VAPORA High Error Rate"
  - Condition: avg() > 0.05 (5%)
  - For: 5 minutes
  - Annotations: "VAPORA error rate exceeded 5%"
- Save
Example: Budget Exceeded Alert
- Open VAPORA LLM Cost Tracking dashboard
- Edit "Budget Usage %" panel
- Create alert:
  - Name: "LLM Budget Near Limit"
  - Condition: last() > 0.9 (90%)
  - For: 1 minute
  - Annotations: "LLM budget usage exceeded 90%"
Maintenance
Update Dashboards
When VAPORA metrics change:
- Export current dashboard JSON
- Edit JSON file with new metrics
- Increment version number
- Re-import (overwrites existing)
Backup Dashboards
# Export all VAPORA dashboards
curl -H "Authorization: Bearer $GRAFANA_API_KEY" \
"http://localhost:3000/api/dashboards/uid/vapora-overview" \
> vapora-overview-backup.json
# Repeat for other dashboard UIDs:
# - vapora-agents
# - vapora-llm-cost
# - vapora-kg-analytics
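The repeat step can be folded into a loop over the four UIDs. This is a minimal sketch, assuming GRAFANA_API_KEY is exported and Grafana is reachable at localhost:3000 as in the command above. The function name, the GRAFANA_URL variable, and the output file naming are illustrative.

```shell
#!/bin/sh
# Sketch: back up all four VAPORA dashboards by UID.
# Assumes GRAFANA_API_KEY is set; GRAFANA_URL is a hypothetical override.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
VAPORA_DASHBOARD_UIDS="vapora-overview vapora-agents vapora-llm-cost vapora-kg-analytics"

backup_vapora_dashboards() {
  out_dir="${1:-.}"
  for uid in $VAPORA_DASHBOARD_UIDS; do
    target="$out_dir/$uid-backup.json"
    # -f makes curl exit non-zero on HTTP errors (401, 404, ...) so an error
    # body is not mistaken for a backup.
    if ! curl -fsS -H "Authorization: Bearer $GRAFANA_API_KEY" \
        "$GRAFANA_URL/api/dashboards/uid/$uid" > "$target"; then
      rm -f "$target"
      echo "backup failed for $uid" >&2
      return 1
    fi
  done
}
```

Calling backup_vapora_dashboards ./backups writes one file per dashboard into an existing ./backups directory. The API response wraps the dashboard under a dashboard key next to meta, so extract that object (for example with jq .dashboard) before re-importing a backup through the UI.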
Support
For dashboard issues:
- Check VAPORA Metrics Documentation: docs/architecture/metrics.md
- Check Prometheus Setup: docs/operations/monitoring.md
- Review Grafana Docs: https://grafana.com/docs/
For VAPORA metrics questions:
- See: .claude/CLAUDE.md → Debugging & Monitoring section
- Check: crates/*/src/metrics.rs files for metric definitions
Last Updated: 2026-02-08
VAPORA Version: 1.2.0
Grafana Version: 10.0+
Prometheus Version: 2.40+