Jesús Pérez b6a4d77421
feat: add Leptos UI library and modularize MCP server
2026-02-14 20:10:55 +00:00

# VAPORA Grafana Dashboards
This directory contains 4 pre-configured Grafana dashboards for monitoring VAPORA.
## Dashboards
### 1. VAPORA Overview (`vapora-overview.json`)
**UID:** `vapora-overview`
**Panels:**
- Request Rate (req/sec)
- Error Rate (%)
- P95 Latency (ms)
- Request Rate by Endpoint (timeseries)
- Response Latency (P50, P95, P99) (timeseries)
- Response Status Distribution (pie chart)
- Database Operations (timeseries)
**Metrics Used:**
- `vapora_http_requests_total`
- `vapora_http_request_duration_seconds_bucket`
- `vapora_db_operations_total`
**Refresh:** 10 seconds
---
### 2. VAPORA Agent Metrics (`agent-metrics.json`)
**UID:** `vapora-agents`
**Panels:**
- Active Agents (count)
- Task Assignment Rate (assignments/sec)
- Task Failure Rate (%)
- Average Agent Load
- Task Execution Time by Agent Role (P50, P95, P99)
- Task Assignments by Skill (stacked)
- Agent Load Distribution (donut chart)
- Agent Expertise Scores (Learning Profiles)
- NATS Message Coordination (A2A)
**Metrics Used:**
- `vapora_swarm_agents_registered`
- `vapora_swarm_task_assignments_total`
- `vapora_swarm_agent_load`
- `vapora_agent_task_duration_seconds_bucket`
- `vapora_agent_expertise_score`
- `vapora_a2a_nats_messages_total`
**Refresh:** 10 seconds
---
### 3. VAPORA LLM Cost Tracking (`llm-cost-tracking.json`)
**UID:** `vapora-llm-cost`
**Panels:**
- Total LLM Cost (USD)
- Total Input Tokens
- Total Output Tokens
- Budget Usage % (gauge)
- Cost by Provider (timeseries)
- Token Usage by Provider (timeseries)
- Cost Distribution by Provider (donut chart)
- Cost Distribution by Role (donut chart)
- Request Distribution by Provider (donut chart)
- Hourly Budget Usage by Role (bars)
- Budget Status by Role (table)
**Metrics Used:**
- `vapora_llm_cost_total_cents`
- `vapora_llm_provider_token_usage`
- `vapora_llm_role_budget_used_cents`
- `vapora_llm_role_budget_limit_cents`
- `vapora_llm_provider_requests_total`
**Refresh:** 10 seconds
---
### 4. VAPORA Knowledge Graph Analytics (`knowledge-graph-analytics.json`)
**UID:** `vapora-kg-analytics`
**Panels:**
- Total Executions in KG
- KG Nodes
- KG Relationships
- Average Learning Curve Slope
- Learning Curves (Improvement Over Time)
- Average Execution Duration by Task Type
- Execution Count by Task Type (table)
- Execution Status Distribution (donut chart)
- Recency Bias Weights (7-day 3×, 30-day 1×)
- Similarity Searches (Hourly)
- Agent Success Rates by Task Type (table)
**Metrics Used:**
- `vapora_kg_total_executions`
- `vapora_kg_total_nodes`
- `vapora_kg_total_relationships`
- `vapora_kg_learning_curve_slope`
- `vapora_kg_learning_curve_improvement`
- `vapora_kg_execution_duration_seconds`
- `vapora_kg_executions_by_task_type`
- `vapora_kg_executions_by_status`
- `vapora_kg_recency_bias_weight`
- `vapora_kg_similarity_searches_total`
- `vapora_kg_agent_success_rate`
**Refresh:** 30 seconds
---
## Import Instructions
### Option 1: Grafana UI (Recommended)
1. **Access Grafana:**
   ```bash
   kubectl port-forward -n observability svc/grafana 3000:3000
   ```
   Open: http://localhost:3000
2. **Login:**
   - Username: `admin`
   - Password: `prom-operator` (or your configured password)
3. **Import Dashboards:**
   - Click **"+"** → **"Import"** in the left sidebar (in Grafana 10+: **Dashboards** → **New** → **Import**)
   - Click **"Upload JSON file"**, or paste the file contents into the **"Import via panel json"** box (labeled **"Import via dashboard JSON model"** in newer versions)
   - Select one of the JSON files from this directory
   - Select **Prometheus** as the datasource
   - Click **"Import"**
4. **Repeat** for all 4 dashboards
### Option 2: Kubernetes ConfigMap (Automated)
Create a ConfigMap to auto-provision dashboards:
```bash
# Create ConfigMap for dashboards
kubectl create configmap vapora-dashboards \
  --from-file=vapora-overview.json \
  --from-file=agent-metrics.json \
  --from-file=llm-cost-tracking.json \
  --from-file=knowledge-graph-analytics.json \
  -n observability

# Label for Grafana auto-discovery
kubectl label configmap vapora-dashboards \
  grafana_dashboard=1 \
  -n observability
```
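**Note:** This assumes your Grafana instance runs a dashboard sidecar (as the Grafana Helm chart and kube-prometheus-stack can be configured to do) that watches for ConfigMaps with the `grafana_dashboard=1` label. Without it, the ConfigMap is created but Grafana never loads the dashboards.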
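
The commands above fail on a second run because the ConfigMap already exists. The following is a minimal sketch of a re-runnable variant, assuming it is run from this directory with `kubectl` access. `dashboard_from_file_flags` is a hypothetical helper name, not part of VAPORA. It globs only `*.json` so that other files in the directory (such as this README) are not added as ConfigMap keys.

```bash
# Hypothetical helper: print one --from-file flag per dashboard JSON in a directory.
dashboard_from_file_flags() {
  dir="${1:-.}"
  for f in "$dir"/*.json; do
    # Skip the unexpanded glob when the directory has no JSON files
    [ -e "$f" ] || continue
    printf '%s%s\n' '--from-file=' "$f"
  done
}

# Usage (requires kubectl and cluster access; paths must not contain spaces):
#   kubectl create configmap vapora-dashboards \
#     $(dashboard_from_file_flags .) \
#     -n observability --dry-run=client -o yaml \
#     | kubectl apply -f -
#   kubectl label configmap vapora-dashboards \
#     grafana_dashboard=1 -n observability --overwrite
```

Rendering with `--dry-run=client -o yaml` and piping to `kubectl apply` updates the ConfigMap in place, and `--overwrite` keeps the label step from failing when the label is already set.
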
### Option 3: Direct File Mount (Docker/Local)
If running Grafana locally via Docker:
```bash
# Copy dashboards to Grafana provisioning directory
cp *.json /path/to/grafana/provisioning/dashboards/
# Restart Grafana
docker restart grafana
```
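
Grafana only loads JSON files from a directory that a dashboard provider file points at, so copying the files is not enough if no provider is configured. The sketch below writes such a provider file. The helper name `write_dashboard_provider`, the file name `vapora.yaml`, the `VAPORA` folder, and the 30-second rescan interval are illustrative assumptions; adjust the paths to your own Grafana layout.

```bash
# Hypothetical helper: write a file-based dashboard provider config for Grafana.
# $1 = directory that receives the provider YAML
# $2 = directory holding the dashboard JSON files, as seen by the Grafana process
write_dashboard_provider() {
  provider_dir="$1"
  dashboards_path="$2"
  mkdir -p "$provider_dir"
  cat > "$provider_dir/vapora.yaml" <<EOF
apiVersion: 1
providers:
  - name: vapora
    folder: VAPORA
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: $dashboards_path
EOF
}

# Usage (paths are placeholders):
#   write_dashboard_provider /path/to/grafana/provisioning/dashboards \
#     /etc/grafana/provisioning/dashboards
```

When Grafana runs in a container, the second argument is the path inside the container, which may differ from the host path used in the `cp` command.
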
---
## Verification
After importing, verify dashboards are working:
1. **Check Prometheus Data Source:**
   - Go to **Configuration** → **Data Sources** (**Connections** → **Data sources** in Grafana 10+)
   - Verify the **Prometheus** datasource exists and is reachable
   - Click **"Save & test"** to test the connection
2. **Check Metrics Availability:**
   Forward the Prometheus UI port:
   ```bash
   kubectl port-forward -n observability svc/prometheus 9090:9090
   ```
   Open http://localhost:9090 and query the test metrics:
   - `vapora_http_requests_total`
   - `vapora_agent_task_duration_seconds_bucket`
   - `vapora_llm_cost_total_cents`
   - `vapora_kg_total_executions`
3. **View Dashboards:**
   - Go to **Dashboards** → **Browse**
   - Look for the "VAPORA" folder or tag
   - Open each dashboard
   - Verify panels show data (may take a few minutes after VAPORA starts)
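
The metric checks in step 2 can also be scripted against the Prometheus HTTP API (`/api/v1/query`) instead of the UI. This is a minimal sketch, assuming the port-forward from step 2 is active and that Prometheus returns compact JSON (no spaces between tokens), which lets plain pattern matching stand in for a JSON parser. `has_series` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if a Prometheus instant-query response (on stdin)
# reports success and contains at least one series.
has_series() {
  response="$(cat)"
  # Reject error responses and anything that is not a successful query
  case "$response" in
    *'"status":"success"'*) ;;
    *) return 1 ;;
  esac
  # An empty result array means the metric has no series yet
  case "$response" in
    *'"result":[]'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage (requires curl and the Prometheus port-forward):
#   for metric in vapora_http_requests_total vapora_llm_cost_total_cents; do
#     if curl -fsS -G http://localhost:9090/api/v1/query \
#          --data-urlencode "query=$metric" | has_series; then
#       echo "ok       $metric"
#     else
#       echo "missing  $metric"
#     fi
#   done
```
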
---
## Customization
### Update Datasource
If your Prometheus datasource has a different name:
1. Open dashboard JSON file
2. Find all instances of `"uid": "${DS_PROMETHEUS}"`
3. Replace with your datasource UID (visible at the end of the datasource settings page URL in Grafana)
4. Re-import
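
The find-and-replace in steps 2 and 3 can be done with `sed` rather than by hand. A minimal sketch, assuming the placeholder appears exactly as `${DS_PROMETHEUS}` in the JSON; `set_datasource_uid` is a hypothetical helper name, and the UID in the usage line is a placeholder.

```bash
# Hypothetical helper: replace the ${DS_PROMETHEUS} placeholder with a concrete datasource UID.
# $1 = dashboard JSON file, $2 = datasource UID; the result is written to stdout.
set_datasource_uid() {
  # The pattern is single-quoted so the shell leaves ${DS_PROMETHEUS} untouched.
  # The UID is assumed to contain no '|', '&', or backslash characters.
  sed 's|\${DS_PROMETHEUS}|'"$2"'|g' "$1"
}

# Usage:
#   set_datasource_uid vapora-overview.json my-prometheus-uid > vapora-overview.local.json
```

Writing to a separate file keeps the original dashboard JSON unchanged, so it can still be imported through the UI with the datasource prompt.
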
### Adjust Refresh Rate
To change auto-refresh interval:
1. Open dashboard in Grafana
2. Click **Dashboard settings** (gear icon)
3. Go to **General** tab
4. Update **Refresh** dropdown
5. Click **Save dashboard**
### Add Custom Panels
To add new panels:
1. Edit dashboard
2. Click **"Add panel"** → **"Add a new panel"**
3. Select Prometheus datasource
4. Write PromQL query (see **Metrics Used** above for examples)
5. Configure visualization
6. Click **"Apply"**
7. Save dashboard
---
## Troubleshooting
### No Data Shown
**Problem:** Panels show "No data"
**Solutions:**
1. **Check VAPORA is running:**
   ```bash
   kubectl get pods -n vapora
   # All pods should be Running
   ```
2. **Check Prometheus is scraping VAPORA:**
   ```bash
   kubectl port-forward -n observability svc/prometheus 9090:9090
   ```
   Open http://localhost:9090/targets and look for the `vapora-backend`, `vapora-a2a`, etc. targets in the **UP** state
3. **Check metrics endpoint manually:**
   ```bash
   # Terminal 1: forward the backend metrics port (blocks until interrupted)
   kubectl port-forward -n vapora svc/vapora-backend 8001:8001
   # Terminal 2: fetch the metrics and keep only VAPORA series
   curl -s http://localhost:8001/metrics | grep vapora_
   ```
   Should show Prometheus-format metrics
4. **Wait a few minutes** for metrics to accumulate
### Wrong Datasource
**Problem:** Dashboard shows "Data source not found"
**Solution:**
- Edit dashboard
- Click **Dashboard settings** → **Variables**
- Update `DS_PROMETHEUS` variable to match your datasource name
- Save
### Missing Metrics
**Problem:** Some panels show "No data" while others work
**Solution:**
- Check if specific VAPORA features are enabled:
  - **Agent metrics:** Requires `vapora-agents` running
  - **LLM cost:** Requires LLM provider configured
  - **KG analytics:** Requires Knowledge Graph enabled
- Some metrics only appear after certain actions (e.g., task assignments, LLM calls)
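
To see which metrics a service is not exporting yet, compare the names a dashboard expects against the output of the `/metrics` endpoint. A minimal sketch, assuming the backend port-forward on port 8001 from the **No Data Shown** steps; `missing_metrics` is a hypothetical helper name.

```bash
# Hypothetical helper: read Prometheus exposition text on stdin and print
# each expected metric name (given as arguments) that has no sample line.
missing_metrics() {
  scrape="$(cat)"
  for name in "$@"; do
    # A sample line starts with the metric name followed by '{' (labels)
    # or a space (no labels); '# HELP' and '# TYPE' lines never match.
    if ! printf '%s\n' "$scrape" | grep -Eq "^${name}(\{| )"; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage (requires curl and the backend port-forward):
#   curl -s http://localhost:8001/metrics | missing_metrics \
#     vapora_http_requests_total \
#     vapora_http_request_duration_seconds_bucket \
#     vapora_db_operations_total
```

An empty result means every listed metric has at least one sample. Metrics exported by other services (for example `vapora-a2a`) need the same check against that service's own endpoint.
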
---
## Dashboard Organization
Recommended Grafana folder structure:
```
📁 VAPORA/
├── 📊 Overview (vapora-overview)
├── 📊 Agent Metrics (vapora-agents)
├── 📊 LLM Cost Tracking (vapora-llm-cost)
└── 📊 Knowledge Graph Analytics (vapora-kg-analytics)
```
To create folder:
1. Go to **Dashboards** → **Browse**
2. Click **"New"** → **"New folder"**
3. Name: "VAPORA"
4. Move imported dashboards into this folder
---
## Alerting (Optional)
To set up alerts based on dashboard panels:
### Example: High Error Rate Alert
1. Open **VAPORA Overview** dashboard
2. Edit **"Error Rate"** panel
3. Go to **Alert** tab
4. Click **"Create alert rule from this panel"**
5. Configure:
   - **Name:** "VAPORA High Error Rate"
   - **Condition:** `avg() > 0.05` (5% when the panel query returns a ratio between 0 and 1; use `avg() > 5` if it returns a percentage)
   - **For:** 5 minutes
   - **Annotations:** "VAPORA error rate exceeded 5%"
6. Save
### Example: Budget Exceeded Alert
1. Open **VAPORA LLM Cost Tracking** dashboard
2. Edit **"Budget Usage %"** panel
3. Create alert:
   - **Name:** "LLM Budget Near Limit"
   - **Condition:** `last() > 0.9` (90% when the panel query returns a ratio between 0 and 1; use `last() > 90` if it returns a percentage)
   - **For:** 1 minute
   - **Annotations:** "LLM budget usage exceeded 90%"
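
Both alert conditions compare a ratio against a threshold. The sketch below works through that arithmetic for the budget case, assuming that budget usage is `vapora_llm_role_budget_used_cents` divided by `vapora_llm_role_budget_limit_cents`. That relationship is an assumption here, so check the panel query in the dashboard JSON. `budget_exceeded` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed when used/limit is strictly above a threshold ratio.
# $1 = used cents, $2 = limit cents, $3 = threshold as a ratio (default 0.9)
budget_exceeded() {
  # awk handles the floating-point division; a zero limit never counts as exceeded
  awk -v used="$1" -v limit="$2" -v threshold="${3:-0.9}" \
    'BEGIN { exit !(limit > 0 && used / limit > threshold) }'
}

# Usage:
#   if budget_exceeded 4600 5000; then
#     echo "LLM budget usage exceeded 90%"
#   fi
```

With 4600 cents used of a 5000-cent limit the ratio is 0.92, so the 0.9 threshold is crossed. The same values expressed as a percentage (92) would need a threshold of 90.
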
---
## Maintenance
### Update Dashboards
When VAPORA metrics change:
1. Export current dashboard JSON
2. Edit JSON file with new metrics
3. Increment the `version` field in the JSON
4. Re-import (overwrites existing)
### Backup Dashboards
```bash
# Export all VAPORA dashboards.
# The API wraps each dashboard as {"meta": ..., "dashboard": ...};
# extract the "dashboard" field before re-importing a backup.
for uid in vapora-overview vapora-agents vapora-llm-cost vapora-kg-analytics; do
  curl -fsS -H "Authorization: Bearer $GRAFANA_API_KEY" \
    "http://localhost:3000/api/dashboards/uid/${uid}" \
    > "${uid}-backup.json"
done
```
---
## Support
For dashboard issues:
- Check **VAPORA Metrics Documentation**: `docs/architecture/metrics.md`
- Check **Prometheus Setup**: `docs/operations/monitoring.md`
- Review **Grafana Docs**: https://grafana.com/docs/
For VAPORA metrics questions:
- See: `.claude/CLAUDE.md` → **Debugging & Monitoring** section
- Check: `crates/*/src/metrics.rs` files for metric definitions
---
**Last Updated:** 2026-02-08
**VAPORA Version:** 1.2.0
**Grafana Version:** 10.0+
**Prometheus Version:** 2.40+