# ADR-013: Temporal Knowledge Graph with SurrealDB

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Architecture Team
**Technical Story**: Enabling collective agent learning through temporal execution history

---

## Decision

Implement a **temporal Knowledge Graph** in SurrealDB with execution history, learning curves, and similarity search.

---

## Rationale

1. **Collective Learning**: Agents learn from shared experience, not only their own
2. **Temporal History**: A 30/90-day history makes trends identifiable
3. **Causal Relationships**: The graph makes it possible to trace the root causes of problems and their solutions
4. **Similarity Search**: Find past solutions for similar tasks
5. **SurrealDB Native**: Graph queries are built into the same database that holds the relational data

---

## Alternatives Considered

### ❌ Event Log Only (No Graph)
- **Pros**: Simple
- **Cons**: No causal relationships, inefficient search

### ❌ Separate Graph DB (Neo4j)
- **Pros**: Optimized for graph workloads
- **Cons**: Data duplication, synchronization complexity

### ✅ SurrealDB Temporal KG (CHOSEN)
- Unified, temporal, built-in graph queries

---

## Trade-offs

**Pros**:
- ✅ Temporal data (30/90 day retention)
- ✅ Causal relationships traceable
- ✅ Similarity search for solution discovery
- ✅ Learning curves identify improvement trends
- ✅ Single database (no sync issues)

**Cons**:
- ⚠️ Graph queries more complex than relational
- ⚠️ Storage overhead for full history
- ⚠️ Retention policy trade-off: longer history = more storage

---

## Implementation

**Temporal Data Model**:
```rust
// crates/vapora-knowledge-graph/src/models.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One agent execution of a task; the unit of temporal history.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecutionRecord {
    pub id: String,
    pub agent_id: String,
    pub task_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub latency_ms: u32,
    pub cost_cents: u32,
    pub timestamp: DateTime<Utc>,
    pub daily_window: String,  // YYYY-MM-DD, used as the aggregation key
}

/// Direction of the quality trend between consecutive days.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum TrendDirection {
    Improving,
    Stable,
    Declining,
}

/// Daily aggregate of one agent's results for one task type.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LearningCurve {
    pub id: String,
    pub agent_id: String,
    pub task_type: String,
    pub day: String,            // YYYY-MM-DD
    pub success_rate: f32,
    pub avg_quality: f32,
    pub trend: TrendDirection,
}
```

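
To illustrate how `daily_window` serves as the aggregation key, here is a minimal in-memory sketch of the per-day rollup that a `LearningCurve` row represents. `Execution`, `DayPoint`, and `aggregate_by_day` are hypothetical names introduced only for this example (trimmed-down stand-ins for the models above), and the real aggregation may well happen inside the database instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down stand-in for `ExecutionRecord`.
#[derive(Debug, Clone)]
pub struct Execution {
    pub success: bool,
    pub quality_score: f32,
    pub daily_window: String, // YYYY-MM-DD
}

/// Hypothetical per-day aggregate mirroring the numeric fields of `LearningCurve`.
#[derive(Debug, Clone, PartialEq)]
pub struct DayPoint {
    pub day: String,
    pub success_rate: f32,
    pub avg_quality: f32,
}

/// Groups executions by `daily_window` and averages them per day.
/// A BTreeMap keeps the days sorted, because YYYY-MM-DD strings
/// sort lexicographically in chronological order.
pub fn aggregate_by_day(records: &[Execution]) -> Vec<DayPoint> {
    // day -> (total executions, successful executions, sum of quality scores)
    let mut buckets: BTreeMap<&str, (u32, u32, f32)> = BTreeMap::new();
    for record in records {
        let entry = buckets
            .entry(record.daily_window.as_str())
            .or_insert((0, 0, 0.0));
        entry.0 += 1;
        if record.success {
            entry.1 += 1;
        }
        entry.2 += record.quality_score;
    }

    buckets
        .into_iter()
        .map(|(day, (total, successes, quality_sum))| DayPoint {
            day: day.to_string(),
            success_rate: successes as f32 / total as f32,
            avg_quality: quality_sum / total as f32,
        })
        .collect()
}
```
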
**SurrealDB Schema**:
```surql
-- Define execution records table
DEFINE TABLE executions;
DEFINE FIELD agent_id ON TABLE executions TYPE string;
DEFINE FIELD task_id ON TABLE executions TYPE string;
DEFINE FIELD task_type ON TABLE executions TYPE string;
DEFINE FIELD success ON TABLE executions TYPE bool;
DEFINE FIELD quality_score ON TABLE executions TYPE float;
DEFINE FIELD latency_ms ON TABLE executions TYPE int;
DEFINE FIELD cost_cents ON TABLE executions TYPE int;
DEFINE FIELD timestamp ON TABLE executions TYPE datetime;
DEFINE FIELD daily_window ON TABLE executions TYPE string;

-- Define temporal index for efficient time-range queries
DEFINE INDEX idx_execution_temporal ON TABLE executions
  COLUMNS timestamp, daily_window;

-- Define learning curves table
DEFINE TABLE learning_curves;
DEFINE FIELD agent_id ON TABLE learning_curves TYPE string;
DEFINE FIELD task_type ON TABLE learning_curves TYPE string;
DEFINE FIELD day ON TABLE learning_curves TYPE string;
DEFINE FIELD success_rate ON TABLE learning_curves TYPE float;
DEFINE FIELD avg_quality ON TABLE learning_curves TYPE float;
DEFINE FIELD trend ON TABLE learning_curves TYPE string;
```

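**Temporal Query (30-Day Learning Curve)**:
```rust
// crates/vapora-knowledge-graph/src/learning.rs
// `Result` is the crate's own alias; its error type is assumed to
// implement `From<surrealdb::Error>`.
use chrono::{Duration, Utc};
use serde::Deserialize;
use surrealdb::{engine::remote::ws::Client, Surreal};

use crate::models::{LearningCurve, TrendDirection};

// Assumed threshold: smaller day-over-day quality changes count as Stable.
const TREND_EPSILON: f32 = 0.01;

// One aggregated row per day, as returned by the query below.
#[derive(Deserialize)]
struct DailyStats {
    daily_window: String,
    total_tasks: u32,
    successes: u32,
    avg_quality: f32,
}

pub async fn compute_learning_curve(
    db: &Surreal<Client>,
    agent_id: &str,
    task_type: &str,
    days: u32,
) -> Result<Vec<LearningCurve>> {
    let since = (Utc::now() - Duration::days(i64::from(days)))
        .format("%Y-%m-%d")
        .to_string();

    // Values are bound as parameters rather than spliced in with format!().
    // SurrealQL has no LAG()/OVER, so the trend is derived in Rust below.
    let query = r#"
        SELECT daily_window, count() AS total_tasks,
               count(success = true) AS successes,
               math::mean(quality_score) AS avg_quality
        FROM executions
        WHERE agent_id = $agent_id AND task_type = $task_type AND daily_window >= $since
        GROUP BY daily_window
        ORDER BY daily_window ASC
    "#;
    let mut response = db.query(query)
        .bind(("agent_id", agent_id.to_owned()))
        .bind(("task_type", task_type.to_owned()))
        .bind(("since", since))
        .await?;
    let rows: Vec<DailyStats> = response.take(0)?;

    let mut previous: Option<f32> = None;
    Ok(rows.into_iter().map(|row| {
        let delta = previous.map_or(0.0, |p| row.avg_quality - p);
        previous = Some(row.avg_quality);
        LearningCurve {
            // Assumed id scheme; the real one is not specified in this ADR.
            id: format!("{agent_id}:{task_type}:{}", row.daily_window),
            agent_id: agent_id.to_owned(),
            task_type: task_type.to_owned(),
            success_rate: row.successes as f32 / row.total_tasks as f32,
            avg_quality: row.avg_quality,
            trend: match delta {
                d if d > TREND_EPSILON => TrendDirection::Improving,
                d if d < -TREND_EPSILON => TrendDirection::Declining,
                _ => TrendDirection::Stable,
            },
            day: row.daily_window,
        }
    }).collect())
}
```
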
**Similarity Search (Find Past Solutions)**:
```rust
use serde::Deserialize;
use surrealdb::{engine::remote::ws::Client, Surreal};

// Hypothetical row shape: an execution plus its similarity score.
#[derive(Deserialize)]
struct ScoredExecution {
    #[serde(flatten)]
    record: ExecutionRecord,
    score: f32,
}

pub async fn find_similar_tasks(
    db: &Surreal<Client>,
    task: &Task,
    limit: u32,
) -> Result<Vec<(ExecutionRecord, f32)>> {
    // Minimum cosine similarity for a past execution to count as similar
    let similarity_threshold: f32 = 0.85;

    // Assumes `task.embedding: Vec<f32>` and an `embedding` field on
    // `executions`; neither is part of the schema shown in this ADR.
    let query = r#"
        SELECT *, vector::similarity::cosine(embedding, $embedding) AS score
        FROM executions
        WHERE success = true
          AND vector::similarity::cosine(embedding, $embedding) > $threshold
        ORDER BY score DESC
        LIMIT $limit
    "#;

    let mut response = db.query(query)
        .bind(("embedding", task.embedding.clone()))
        .bind(("threshold", similarity_threshold))
        .bind(("limit", limit))
        .await?;
    let rows: Vec<ScoredExecution> = response.take(0)?;
    Ok(rows.into_iter().map(|r| (r.record, r.score)).collect())
}
```

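
The scoring step behind a similarity search of this kind can be sketched in plain Rust. This is only an illustration under assumptions: the ADR does not say which similarity measure is used, so cosine similarity over embedding vectors is assumed here, and `cosine_similarity` and `rank_similar` are hypothetical helper names. The 0.85 threshold from the function above would be passed as `threshold`.

```rust
/// Cosine similarity of two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores every candidate against `query`, keeps those strictly above
/// `threshold`, and returns at most `limit` pairs of
/// (candidate index, score), best match first.
pub fn rank_similar(
    query: &[f32],
    candidates: &[Vec<f32>],
    threshold: f32,
    limit: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .enumerate()
        .map(|(index, candidate)| (index, cosine_similarity(query, candidate)))
        .filter(|(_, score)| *score > threshold)
        .collect();
    // total_cmp gives a total order on f32, so sorting cannot panic on NaN.
    scored.sort_by(|left, right| right.1.total_cmp(&left.1));
    scored.truncate(limit);
    scored
}
```
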
**Causal Graph (Problem Resolution)**:
```rust
pub async fn trace_solution_chain(
    db: &Surreal<Client>,
    problem_task_id: &str,
) -> Result<Vec<ExecutionRecord>> {
    // Follow `resolved_by` edges from the task to the executions that
    // resolved it. The task id is bound as a parameter and converted
    // into a record id inside the query.
    let query = r#"
        SELECT ->resolved_by->executions.* AS solutions
        FROM type::thing("tasks", $task_id)
    "#;

    db.query(query)
        .bind(("task_id", problem_task_id.to_owned()))
        .await?
        // `solutions` of the first row; None when the task does not exist
        .take::<Option<Vec<ExecutionRecord>>>((0, "solutions"))?
        .ok_or(Error::NotFound)
}
```

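
The query above follows a single `resolved_by` hop. As a rough sketch of what tracing a longer causal chain could look like, the following breadth-first traversal walks an in-memory adjacency map. The map layout, the `trace_chain` name, and the multi-hop behavior are all assumptions made for illustration; they are not taken from the crate.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Follows `resolved_by`-style edges breadth-first from `start` and
/// returns every reachable node id in visit order, excluding `start`.
/// The `seen` set makes the traversal terminate even if the graph has cycles.
pub fn trace_chain(edges: &HashMap<String, Vec<String>>, start: &str) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue: VecDeque<&str> = VecDeque::from([start]);
    let mut chain = Vec::new();

    while let Some(node) = queue.pop_front() {
        // A node without outgoing edges simply contributes nothing.
        for next in edges.get(node).into_iter().flatten() {
            if seen.insert(next.as_str()) {
                chain.push(next.clone());
                queue.push_back(next.as_str());
            }
        }
    }
    chain
}
```
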
**Key Files**:
- `/crates/vapora-knowledge-graph/src/learning.rs` (learning curve computation)
- `/crates/vapora-knowledge-graph/src/persistence.rs` (DB persistence)
- `/crates/vapora-knowledge-graph/src/models.rs` (temporal models)
- `/crates/vapora-backend/src/services/` (uses KG for task recommendations)

---

## Verification

```bash
# Test learning curve computation
cargo test -p vapora-knowledge-graph test_learning_curve_30day

# Test similarity search
cargo test -p vapora-knowledge-graph test_similarity_search

# Test causal graph traversal
cargo test -p vapora-knowledge-graph test_causal_chain

# Test retention policy (30-day window)
cargo test -p vapora-knowledge-graph test_retention_policy

# Integration test: full KG workflow
cargo test -p vapora-knowledge-graph test_full_kg_lifecycle

# Query performance test
cargo bench -p vapora-knowledge-graph bench_temporal_queries
```

**Expected Output**:
- Learning curves computed correctly
- Similarity search finds relevant past executions
- Causal chains traceable
- Retention policy removes old records
- Temporal queries perform well (<100ms)

---

## Consequences

### Data Management
- Storage grows ~1 MB per 1000 executions (depends on detail level)
- Retention policy: 30 days (users), 90 days (enterprise)
- Archival strategy for historical analysis

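
The retention windows and the storage figure above can be combined into a back-of-the-envelope sketch. `Plan`, `retention`, `is_expired`, and `estimated_storage_mb` are hypothetical names used only for this example, and the estimate simply applies the approximate 1 MB per 1000 executions figure, so it is a rough planning aid rather than a measurement.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical account tiers matching the two retention windows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    User,
    Enterprise,
}

const SECS_PER_DAY: u64 = 24 * 60 * 60;

/// Retention window: 30 days for users, 90 days for enterprise.
pub fn retention(plan: Plan) -> Duration {
    let days = match plan {
        Plan::User => 30,
        Plan::Enterprise => 90,
    };
    Duration::from_secs(days * SECS_PER_DAY)
}

/// True if a record created at `recorded_at` is older than the
/// retention window at time `now`.
pub fn is_expired(recorded_at: SystemTime, now: SystemTime, plan: Plan) -> bool {
    match now.duration_since(recorded_at) {
        Ok(age) => age > retention(plan),
        // `recorded_at` lies after `now` (clock skew): keep the record.
        Err(_) => false,
    }
}

/// Rough steady-state storage in MB, using ~1 MB per 1000 executions.
pub fn estimated_storage_mb(executions_per_day: u64, plan: Plan) -> f64 {
    let days = retention(plan).as_secs() / SECS_PER_DAY;
    (executions_per_day * days) as f64 / 1000.0
}
```

Under these assumptions, 10,000 executions per day would come to roughly 300 MB at 30 days and 900 MB at 90 days.
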
### Agent Learning
- Agents access KG to find similar past solutions
- Learning curves inform agent selection (see ADR-014)
- Improvement trends visible for monitoring

### Observability
- Full audit trail of agent decisions
- Trending analysis for capacity planning
- Incident investigation via causal chains

### Scalability
- Graph queries optimized with indexes
- Temporal queries use daily windows (efficient partition)
- Similarity search scales to millions of records

---

## References

- `/crates/vapora-knowledge-graph/src/learning.rs` (implementation)
- `/crates/vapora-knowledge-graph/src/persistence.rs` (persistence layer)
- ADR-004 (SurrealDB)
- ADR-014 (Learning Profiles)
- ADR-019 (Temporal Execution History)

---

**Related ADRs**: ADR-004 (SurrealDB), ADR-014 (Learning Profiles), ADR-019 (Temporal History)