Vapora/docs/adrs/0013-knowledge-graph.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00


# ADR-013: Temporal Knowledge Graph with SurrealDB
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Architecture Team
**Technical Story**: Enabling collective agent learning through temporal execution history
---
## Decision
Implement a **temporal Knowledge Graph** in SurrealDB with execution history, learning curves, and similarity search.
---
## Rationale
1. **Collective Learning**: Agents learn from shared experience, not only their own
2. **Temporal History**: A 30/90-day history makes trends identifiable
3. **Causal Relationships**: The graph makes it possible to trace problems and their solutions back to root causes
4. **Similarity Search**: Find past solutions for similar tasks
5. **SurrealDB Native**: Graph queries are built into the same database that holds the relational data
---
## Alternatives Considered
### ❌ Event Log Only (No Graph)
- **Pros**: Simple
- **Cons**: No causal relationships, inefficient search
### ❌ Separate Graph DB (Neo4j)
- **Pros**: Optimizado para graph
- **Cons**: Data duplication, synchronization complexity
### ✅ SurrealDB Temporal KG (CHOSEN)
- Unified, temporal, with built-in graph queries
---
## Trade-offs
**Pros**:
- ✅ Temporal data (30/90 day retention)
- ✅ Causal relationships traceable
- ✅ Similarity search for solution discovery
- ✅ Learning curves identify improvement trends
- ✅ Single database (no sync issues)
**Cons**:
- ⚠️ Graph queries more complex than relational
- ⚠️ Storage overhead for full history
- ⚠️ Retention policy trade-off: longer history = more storage
---
## Implementation
**Temporal Data Model**:
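```rust
// crates/vapora-knowledge-graph/src/models.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One agent execution of a task, stored in the `executions` table.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecutionRecord {
    pub id: String,
    pub agent_id: String,
    pub task_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub latency_ms: u32,
    pub cost_cents: u32,
    pub timestamp: DateTime<Utc>,
    pub daily_window: String, // YYYY-MM-DD for aggregation
}

/// Direction of an agent's quality trend from one day to the next.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum TrendDirection {
    Improving,
    Stable,
    Declining,
}

/// Daily aggregate of executions for one agent and task type.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LearningCurve {
    pub id: String,
    pub agent_id: String,
    pub task_type: String,
    pub day: String, // YYYY-MM-DD
    pub success_rate: f32,
    pub avg_quality: f32,
    pub trend: TrendDirection,
}
```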
**SurrealDB Schema**:
```surql
-- Define execution records table
DEFINE TABLE executions;
DEFINE FIELD agent_id ON TABLE executions TYPE string;
DEFINE FIELD task_id ON TABLE executions TYPE string;
DEFINE FIELD task_type ON TABLE executions TYPE string;
DEFINE FIELD success ON TABLE executions TYPE bool;
DEFINE FIELD quality_score ON TABLE executions TYPE float;
DEFINE FIELD timestamp ON TABLE executions TYPE datetime;
DEFINE FIELD daily_window ON TABLE executions TYPE string;
-- Define temporal index for efficient time-range queries
DEFINE INDEX idx_execution_temporal ON TABLE executions
COLUMNS timestamp, daily_window;
-- Define learning curves table
DEFINE TABLE learning_curves;
DEFINE FIELD agent_id ON TABLE learning_curves TYPE string;
DEFINE FIELD task_type ON TABLE learning_curves TYPE string;
DEFINE FIELD day ON TABLE learning_curves TYPE string;
DEFINE FIELD success_rate ON TABLE learning_curves TYPE float;
DEFINE FIELD trend ON TABLE learning_curves TYPE string;
```
**Temporal Query (30-Day Learning Curve)**:
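```rust
// crates/vapora-knowledge-graph/src/learning.rs
use chrono::{Duration, Utc};
use serde::Deserialize;
use surrealdb::{engine::remote::ws::Client, Surreal};

/// One aggregated row per day, as returned by the query below.
#[derive(Deserialize)]
struct DailyStats { day: String, total_tasks: u32, successes: u32, avg_quality: f32 }

pub async fn compute_learning_curve(
    db: &Surreal<Client>,
    agent_id: &str,
    task_type: &str,
    days: u32,
) -> Result<Vec<LearningCurve>> {
    let since = (Utc::now() - Duration::days(i64::from(days)))
        .format("%Y-%m-%d")
        .to_string();
    // Bound parameters replace string interpolation, so values are quoted
    // correctly and cannot inject SurrealQL.
    let query = r#"
        SELECT
            daily_window AS day,
            count() AS total_tasks,
            count(success = true) AS successes,
            math::mean(quality_score) AS avg_quality
        FROM executions
        WHERE agent_id = $agent_id AND task_type = $task_type AND daily_window >= $since
        GROUP BY day
        ORDER BY day ASC
    "#;
    let mut response = db
        .query(query)
        .bind(("agent_id", agent_id.to_owned()))
        .bind(("task_type", task_type.to_owned()))
        .bind(("since", since))
        .await?;
    let rows: Vec<DailyStats> = response.take(0)?;
    if rows.is_empty() {
        return Err(Error::NotFound);
    }
    // SurrealQL has no LAG() window function, so the day-over-day trend is
    // derived here. Assumption: any rise or fall in avg_quality counts as a trend.
    let mut previous: Option<f32> = None;
    let curve = rows.into_iter().map(|row| {
        let trend = match previous.replace(row.avg_quality) {
            Some(prev) if row.avg_quality > prev => TrendDirection::Improving,
            Some(prev) if row.avg_quality < prev => TrendDirection::Declining,
            _ => TrendDirection::Stable,
        };
        LearningCurve {
            id: format!("{agent_id}:{task_type}:{}", row.day), // assumed id scheme
            agent_id: agent_id.to_owned(),
            task_type: task_type.to_owned(),
            day: row.day,
            success_rate: row.successes as f32 / row.total_tasks as f32,
            avg_quality: row.avg_quality,
            trend,
        }
    });
    Ok(curve.collect())
}
```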
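The daily aggregation and trend labelling can also be illustrated without a database. The following is a minimal in-memory sketch, not the crate's implementation: `Execution`, `DailyPoint`, `Trend`, and the `epsilon` dead band are illustrative names and assumptions that do not appear in the ADR.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for the ADR's `ExecutionRecord` (only the fields used here).
#[derive(Debug, Clone)]
pub struct Execution {
    pub daily_window: String, // YYYY-MM-DD
    pub success: bool,
    pub quality_score: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Trend {
    Improving,
    Stable,
    Declining,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DailyPoint {
    pub day: String,
    pub success_rate: f32,
    pub avg_quality: f32,
    pub trend: Trend,
}

/// Groups executions by day and labels each day against the previous one.
/// `epsilon` is a hypothetical dead band: smaller changes count as `Stable`.
pub fn learning_curve(records: &[Execution], epsilon: f32) -> Vec<DailyPoint> {
    // BTreeMap keeps days sorted, because ISO dates sort lexicographically.
    // Value tuple: (total executions, successful executions, quality sum).
    let mut by_day: BTreeMap<&str, (u32, u32, f32)> = BTreeMap::new();
    for r in records {
        let entry = by_day.entry(r.daily_window.as_str()).or_insert((0, 0, 0.0));
        entry.0 += 1;
        entry.1 += u32::from(r.success);
        entry.2 += r.quality_score;
    }

    let mut previous: Option<f32> = None;
    by_day
        .into_iter()
        .map(|(day, (total, ok, quality_sum))| {
            let avg_quality = quality_sum / total as f32;
            // The first day has no predecessor, so it is reported as Stable.
            let trend = match previous.replace(avg_quality) {
                Some(prev) if avg_quality - prev > epsilon => Trend::Improving,
                Some(prev) if prev - avg_quality > epsilon => Trend::Declining,
                _ => Trend::Stable,
            };
            DailyPoint {
                day: day.to_owned(),
                success_rate: ok as f32 / total as f32,
                avg_quality,
                trend,
            }
        })
        .collect()
}
```

Sorting by the `YYYY-MM-DD` string is enough here because that format orders the same way lexicographically and chronologically.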
**Similarity Search (Find Past Solutions)**:
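```rust
use serde::Deserialize;
use surrealdb::{engine::remote::ws::Client, Surreal};

/// An execution row plus its computed similarity score.
#[derive(Deserialize)]
struct ScoredExecution {
    #[serde(flatten)]
    record: ExecutionRecord,
    score: f32,
}

// Assumptions: each execution stores an `embedding` array (not part of the
// schema above) and `Task` exposes a precomputed `embedding: Vec<f32>`.
pub async fn find_similar_tasks(
    db: &Surreal<Client>,
    task: &Task,
    limit: u32,
) -> Result<Vec<(ExecutionRecord, f32)>> {
    let similarity_threshold = 0.85_f32;
    // Cosine similarity between the stored and the query embedding.
    let query = r#"
        SELECT *, vector::similarity::cosine(embedding, $query_embedding) AS score
        FROM executions
        WHERE success = true
          AND vector::similarity::cosine(embedding, $query_embedding) > $threshold
        ORDER BY score DESC
        LIMIT $limit
    "#;
    let mut response = db
        .query(query)
        .bind(("query_embedding", task.embedding.clone()))
        .bind(("threshold", similarity_threshold))
        .bind(("limit", limit))
        .await?;
    let rows: Vec<ScoredExecution> = response.take(0)?;
    if rows.is_empty() {
        return Err(Error::NotFound);
    }
    Ok(rows.into_iter().map(|r| (r.record, r.score)).collect())
}
```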
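The scoring behind the similarity search can be sketched independently of the database. The ADR does not name a metric, so this example assumes cosine similarity over embedding vectors. `cosine_similarity` and `top_similar` are illustrative helpers, and the threshold is passed in rather than fixed.

```rust
/// Cosine similarity of two equal-length vectors. Returns `None` if the
/// lengths differ, the vectors are empty, or either has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns `(candidate index, score)` for candidates scoring strictly above
/// `threshold`, best match first, truncated to `limit` entries.
pub fn top_similar(
    query: &[f32],
    candidates: &[Vec<f32>],
    threshold: f32,
    limit: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .enumerate()
        .filter_map(|(i, c)| cosine_similarity(query, c).map(|s| (i, s)))
        .filter(|&(_, s)| s > threshold)
        .collect();
    // Descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

A linear scan like this is only practical for small candidate sets. At larger scale the comparison would normally be pushed into the database or a vector index.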
**Causal Graph (Problem Resolution)**:
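```rust
use surrealdb::{engine::remote::ws::Client, Surreal};

// Assumption: `problem_task_id` is the key part of the record id,
// without the `tasks:` table prefix.
pub async fn trace_solution_chain(
    db: &Surreal<Client>,
    problem_task_id: &str,
) -> Result<Vec<ExecutionRecord>> {
    // Follow `resolved_by` edges from the problem task to the executions
    // that resolved it. The id is bound rather than interpolated, then
    // converted to a record id so it matches `tasks:<id>`.
    let query = r#"
        SELECT VALUE ->resolved_by->executions.*
        FROM type::thing('tasks', $task_id)
    "#;
    let mut response = db
        .query(query)
        .bind(("task_id", problem_task_id.to_owned()))
        .await?;
    // One entry per matched task, each holding that task's solutions.
    let per_task: Vec<Vec<ExecutionRecord>> = response.take(0)?;
    let solutions: Vec<ExecutionRecord> = per_task.into_iter().flatten().collect();
    if solutions.is_empty() {
        return Err(Error::NotFound);
    }
    Ok(solutions)
}
```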
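When a resolution spans more than one hop, the traversal becomes a graph walk. The following is a minimal in-memory sketch of following `resolved_by` edges breadth-first with a cycle guard. The adjacency-map representation and the `trace_chain` name are assumptions for illustration, not the crate's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Follows `resolved_by` edges breadth-first from `start` and returns every
/// reachable node exactly once, in visit order. `start` itself is excluded.
pub fn trace_chain(edges: &HashMap<String, Vec<String>>, start: &str) -> Vec<String> {
    // `seen` guards against cycles so the walk always terminates.
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<&str> = VecDeque::from([start]);
    let mut chain = Vec::new();

    while let Some(node) = queue.pop_front() {
        // Nodes without outgoing edges simply contribute nothing.
        for next in edges.get(node).into_iter().flatten() {
            if seen.insert(next.as_str()) {
                chain.push(next.clone());
                queue.push_back(next.as_str());
            }
        }
    }
    chain
}
```

Breadth-first order lists direct resolutions before indirect ones, which tends to match how an incident investigation reads the chain.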
**Key Files**:
- `/crates/vapora-knowledge-graph/src/learning.rs` (learning curve computation)
- `/crates/vapora-knowledge-graph/src/persistence.rs` (DB persistence)
- `/crates/vapora-knowledge-graph/src/models.rs` (temporal models)
- `/crates/vapora-backend/src/services/` (uses KG for task recommendations)
---
## Verification
```bash
# Test learning curve computation
cargo test -p vapora-knowledge-graph test_learning_curve_30day
# Test similarity search
cargo test -p vapora-knowledge-graph test_similarity_search
# Test causal graph traversal
cargo test -p vapora-knowledge-graph test_causal_chain
# Test retention policy (30-day window)
cargo test -p vapora-knowledge-graph test_retention_policy
# Integration test: full KG workflow
cargo test -p vapora-knowledge-graph test_full_kg_lifecycle
# Query performance test
cargo bench -p vapora-knowledge-graph bench_temporal_queries
```
**Expected Output**:
- Learning curves computed correctly
- Similarity search finds relevant past executions
- Causal chains traceable
- Retention policy removes old records
- Temporal queries perform well (<100ms)
---
## Consequences
### Data Management
- Storage grows ~1MB per 1000 executions (depends on detail level)
- Retention policy: 30 days (users), 90 days (enterprise)
- Archival strategy for historical analysis
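The storage and retention figures above can be turned into a quick estimate. This sketch assumes the ADR's rough rate of 1 MB per 1000 executions and a cutoff expressed as a `YYYY-MM-DD` string. `Record`, `estimated_storage_mb`, and `apply_retention` are illustrative names, and the workload of 5000 executions per day in the note below is a made-up example.

```rust
/// Rough figure from the ADR: about 1 MB per 1000 executions.
const MB_PER_1000_EXECUTIONS: f64 = 1.0;

/// Minimal stand-in for a stored execution (only what retention needs).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: u32,
    pub daily_window: String, // YYYY-MM-DD
}

/// Estimated steady-state storage for a given daily volume and retention window.
pub fn estimated_storage_mb(executions_per_day: u64, retention_days: u32) -> f64 {
    let total_executions = executions_per_day as f64 * f64::from(retention_days);
    total_executions / 1000.0 * MB_PER_1000_EXECUTIONS
}

/// Removes records older than `cutoff_day` and returns how many were dropped.
/// ISO dates compare correctly as plain strings, so no date parsing is needed.
pub fn apply_retention(records: &mut Vec<Record>, cutoff_day: &str) -> usize {
    let before = records.len();
    records.retain(|r| r.daily_window.as_str() >= cutoff_day);
    before - records.len()
}
```

With the hypothetical 5000 executions per day, the 30-day tier comes to about 150 MB and the 90-day tier to about 450 MB, which is the retention trade-off in concrete terms.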
### Agent Learning
- Agents access KG to find similar past solutions
- Learning curves inform agent selection (see ADR-014)
- Improvement trends visible for monitoring
### Observability
- Full audit trail of agent decisions
- Trending analysis for capacity planning
- Incident investigation via causal chains
### Scalability
- Graph queries optimized with indexes
- Temporal queries use daily windows (efficient partition)
- Similarity search scales to millions of records
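The idea of daily windows acting as a partition can be sketched with an ordered map: a time-range query then touches only the buckets in range instead of scanning every record. `DailyIndex` is a hypothetical in-memory structure for illustration and says nothing about how SurrealDB lays out its indexes.

```rust
use std::collections::BTreeMap;

/// Record ids bucketed by `daily_window` (YYYY-MM-DD), kept in date order.
#[derive(Default)]
pub struct DailyIndex {
    buckets: BTreeMap<String, Vec<String>>,
}

impl DailyIndex {
    /// Adds a record id to the bucket for its day.
    pub fn insert(&mut self, daily_window: &str, record_id: &str) {
        self.buckets
            .entry(daily_window.to_owned())
            .or_default()
            .push(record_id.to_owned());
    }

    /// Ids recorded on or after `since`, oldest day first.
    /// Only buckets inside the range are visited.
    pub fn since(&self, since: &str) -> Vec<&str> {
        self.buckets
            .range(since.to_owned()..)
            .flat_map(|(_, ids)| ids.iter().map(String::as_str))
            .collect()
    }
}
```

Because the keys are ISO dates, the map's natural ordering is chronological, so a range lookup doubles as a time-range query.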
---
## References
- `/crates/vapora-knowledge-graph/src/learning.rs` (implementation)
- `/crates/vapora-knowledge-graph/src/persistence.rs` (persistence layer)
- ADR-004 (SurrealDB)
- ADR-014 (Learning Profiles)
- ADR-019 (Temporal Execution History)
---
**Related ADRs**: ADR-004 (SurrealDB), ADR-014 (Learning Profiles), ADR-019 (Temporal History)