kogral/docs/storage/overview.md
2026-01-23 16:11:07 +00:00

# Storage Architecture
The Knowledge Base uses a **hybrid storage strategy** combining filesystem, SurrealDB, and in-memory backends to balance git-friendliness, scalability, and performance.
## Overview
![Storage Architecture](../diagrams/storage-architecture.svg)
The storage layer is abstracted through a common `Storage` trait, allowing KOGRAL to use different backends based on project needs:
- **Filesystem**: Git-tracked markdown files for local project knowledge
- **SurrealDB**: Scalable graph database for shared organizational knowledge
- **In-Memory**: Fast ephemeral storage for testing and caching
## Storage Trait
All storage backends implement a common async trait:
```rust
#[async_trait]
pub trait Storage: Send + Sync {
    async fn save_node(&self, node: &Node) -> Result<()>;
    async fn load_node(&self, id: &str) -> Result<Node>;
    async fn delete_node(&self, id: &str) -> Result<()>;
    async fn list_nodes(&self, node_type: Option<NodeType>) -> Result<Vec<String>>;
    async fn search(&self, query: &str) -> Result<Vec<Node>>;
    async fn save_edge(&self, edge: &Edge) -> Result<()>;
    async fn load_edges(&self, node_id: &str) -> Result<Vec<Edge>>;
    async fn delete_edge(&self, from: &str, to: &str, relation: EdgeType) -> Result<()>;
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    async fn load_graph(&self, name: &str) -> Result<Graph>;
}
```
This abstraction allows:
- ✅ Swapping backends without code changes (config-driven)
- ✅ Testing with in-memory storage
- ✅ Hybrid setups (filesystem + SurrealDB)
- ✅ Custom backends (implement trait + add to config)
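To make the config-driven swap concrete, here is a minimal sketch using only the standard library. It is not KOGRAL's code: `NodeStore` is a simplified, synchronous stand-in for the `Storage` trait, and `MemoryStore` and `open_store` are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified, synchronous stand-in for the `Storage` trait (hypothetical).
pub trait NodeStore: Send + Sync {
    fn save(&self, id: &str, body: &str) -> Result<(), String>;
    fn load(&self, id: &str) -> Result<String, String>;
}

/// In-memory backend guarded by a single mutex.
pub struct MemoryStore {
    nodes: Mutex<HashMap<String, String>>,
}

impl MemoryStore {
    pub fn new() -> Self {
        Self { nodes: Mutex::new(HashMap::new()) }
    }
}

impl NodeStore for MemoryStore {
    fn save(&self, id: &str, body: &str) -> Result<(), String> {
        let mut nodes = self.nodes.lock().map_err(|e| e.to_string())?;
        nodes.insert(id.to_string(), body.to_string());
        Ok(())
    }

    fn load(&self, id: &str) -> Result<String, String> {
        let nodes = self.nodes.lock().map_err(|e| e.to_string())?;
        nodes
            .get(id)
            .cloned()
            .ok_or_else(|| format!("node not found: {id}"))
    }
}

/// Hypothetical factory: maps a config value to a boxed backend.
/// Callers only ever see `dyn NodeStore`, so adding a backend means
/// adding one match arm here.
pub fn open_store(kind: &str) -> Result<Box<dyn NodeStore>, String> {
    match kind {
        "memory" => Ok(Box::new(MemoryStore::new())),
        other => Err(format!("unsupported backend: {other}")),
    }
}
```

Code that receives the `Box<dyn NodeStore>` never names a concrete backend, which is what lets the backend change through configuration alone.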
## Filesystem Storage
### Purpose
**Git-friendly, human-readable knowledge storage**
Ideal for:
- Project-specific knowledge
- Version control with git
- Code review of knowledge changes
- Offline work
- Logseq compatibility
### File Layout
```text
.kogral/
├── config.toml                  # Graph metadata
├── notes/
│   ├── 2026-01-17-topic.md
│   ├── async-patterns.md
│   └── error-handling.md
├── decisions/
│   ├── 0001-use-rust.md         # ADR format
│   ├── 0002-surrealdb.md
│   └── 0003-nickel-config.md
├── guidelines/
│   ├── rust-errors.md
│   ├── testing-standards.md
│   └── api-design.md
├── patterns/
│   ├── repository-pattern.md
│   ├── builder-pattern.md
│   └── async-error-handling.md
└── journal/
    ├── 2026-01-17.md            # Daily notes
    └── 2026-01-18.md
```
### Document Format
Each document is **markdown with YAML frontmatter**:
```markdown
---
id: note-rust-async-traits
type: note
title: Async Trait Patterns in Rust
created: 2026-01-17T10:30:00Z
modified: 2026-01-17T15:45:00Z
tags: [rust, async, traits]
status: active
relates_to:
- pattern-async-error-handling
- guideline-rust-async
depends_on:
- note-rust-basics
project: knowledge-base
---
# Async Trait Patterns in Rust
Using async traits with the `async-trait` crate...
## Pattern 1: Boxed Futures
When working with async traits, use `async-trait` to avoid lifetime issues:
\`\`\`rust
#[async_trait]
trait DataSource {
async fn fetch(&self, id: &str) -> Result<Data>;
}
\`\`\`
## Related Concepts
See [[pattern-async-error-handling]] for error handling in async contexts.
Depends on understanding [[note-rust-basics]] first.
```
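As a rough illustration of how such a document could be taken apart, the sketch below separates the frontmatter from the body and reads top-level scalar fields. It is an assumption-laden simplification, not KOGRAL's parser: `split_frontmatter` and `scalar_field` are hypothetical helpers, only `\n` line endings are handled, and lists such as `tags` or `relates_to` would need a real YAML parser.

```rust
/// Splits a document into `(frontmatter, body)`.
/// Returns `None` when the text does not start with a `---` line
/// or the closing `---` line is missing.
pub fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    const CLOSE: &str = "\n---\n";
    let rest = doc.strip_prefix("---\n")?;
    let end = rest.find(CLOSE)?;
    Some((&rest[..end], &rest[end + CLOSE.len()..]))
}

/// Reads a top-level scalar such as `id: note-x`.
/// Indented keys and list items are ignored.
pub fn scalar_field<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        // Split on the first colon only, so timestamps keep their colons.
        let (found_key, value) = line.split_once(':')?;
        (found_key == key).then(|| value.trim())
    })
}
```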
### Features
**Wikilinks**: `[[other-note]]` automatically creates relationships
**Code References**: `@src/main.rs:42` links to code locations
**Git Integration**:
- Diffs show knowledge changes
- Branches for experimental knowledge
- PRs for knowledge review
- History tracking
- Blame shows knowledge authors
**Logseq Compatibility**: Format is compatible with Logseq for graph visualization
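The two link syntaxes above are simple enough to extract with plain string operations. The following is a sketch under assumptions: `extract_wikilinks` and `parse_code_ref` are hypothetical helper names, and the exact grammar KOGRAL accepts (aliases, nested brackets, column numbers) is not specified here.

```rust
/// Returns the target of every `[[target]]` link, in order of appearance.
/// An unterminated `[[` ends the scan.
pub fn extract_wikilinks(text: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after_open = &rest[start + 2..];
        let Some(end) = after_open.find("]]") else { break };
        let target = after_open[..end].trim();
        if !target.is_empty() {
            links.push(target.to_string());
        }
        rest = &after_open[end + 2..];
    }
    links
}

/// Parses a code reference such as `@src/main.rs:42` into `(path, line)`.
pub fn parse_code_ref(token: &str) -> Option<(&str, u32)> {
    let rest = token.strip_prefix('@')?;
    // Split on the last colon so the path itself may contain colons.
    let (path, line) = rest.rsplit_once(':')?;
    if path.is_empty() {
        return None;
    }
    Some((path, line.parse().ok()?))
}
```

Each extracted wikilink target would then become an edge from the containing node to the target node.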
### Implementation
```rust
use std::{fs, path::PathBuf};

pub struct FilesystemStorage {
    root: PathBuf,
}

impl FilesystemStorage {
    pub fn new(root: impl Into<PathBuf>) -> Result<Self> {
        let root = root.into();
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    /// Maps a node type and ID to `<root>/<subdir>/<id>.md`.
    fn node_path(&self, node_type: NodeType, id: &str) -> PathBuf {
        let subdir = match node_type {
            NodeType::Note => "notes",
            NodeType::Decision => "decisions",
            NodeType::Guideline => "guidelines",
            NodeType::Pattern => "patterns",
            NodeType::Journal => "journal",
            NodeType::Execution => "executions",
        };
        self.root.join(subdir).join(format!("{}.md", id))
    }
}

// Excerpt: only `save_node` and `load_node` are shown.
#[async_trait]
impl Storage for FilesystemStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
        let path = self.node_path(node.node_type, &node.id);
        // `node_path` always produces a parent directory; avoid `unwrap` regardless.
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        let markdown = format_node_as_markdown(node)?;
        fs::write(path, markdown)?;
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        // The ID alone does not reveal the node type, so probe each type directory.
        // `NodeType::iter()` assumes an enum iterator (for example strum's `EnumIter`).
        for node_type in NodeType::iter() {
            let path = self.node_path(node_type, id);
            if path.exists() {
                let content = fs::read_to_string(path)?;
                return parse_markdown_node(&content);
            }
        }
        Err(KbError::NodeNotFound(id.to_string()))
    }
}
```
### File Watching
Filesystem storage can watch for changes and auto-sync:
```rust
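use std::sync::mpsc::channel;

use notify::{Config, Event, RecommendedWatcher, RecursiveMode, Watcher};

impl FilesystemStorage {
    /// Watches `root` recursively and calls `on_change` for every event.
    /// `rx.recv()` blocks the thread it runs on; on a shared async runtime,
    /// move this loop to a dedicated thread.
    pub async fn watch(&self, on_change: impl Fn(Event) + Send + 'static) -> Result<()> {
        let (tx, rx) = channel();
        let mut watcher = RecommendedWatcher::new(tx, Config::default())?;
        watcher.watch(&self.root, RecursiveMode::Recursive)?;
        // The channel carries `notify::Result<Event>`; watcher errors are skipped here.
        while let Ok(result) = rx.recv() {
            if let Ok(event) = result {
                on_change(event);
            }
        }
        Ok(())
    }
}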
```
## SurrealDB Storage
### Purpose
**Scalable graph database for shared organizational knowledge**
Ideal for:
- Organization-wide guidelines
- Shared patterns library
- Advanced graph queries
- Semantic search with embeddings
- Multi-project knowledge sharing
### Schema
SurrealDB schema for nodes and edges:
```sql
DEFINE TABLE node SCHEMAFULL;
DEFINE FIELD id ON node TYPE string;
DEFINE FIELD node_type ON node TYPE string ASSERT $value INSIDE ["note", "decision", "guideline", "pattern", "journal", "execution"];
DEFINE FIELD title ON node TYPE string;
DEFINE FIELD content ON node TYPE string;
DEFINE FIELD tags ON node TYPE array;
DEFINE FIELD status ON node TYPE string;
DEFINE FIELD created ON node TYPE datetime;
DEFINE FIELD modified ON node TYPE datetime;
DEFINE FIELD embedding ON node TYPE array<number>; -- For semantic search
DEFINE INDEX unique_node_id ON node COLUMNS id UNIQUE;
DEFINE INDEX node_type_idx ON node COLUMNS node_type;
DEFINE INDEX node_tags_idx ON node COLUMNS tags;
-- Relationship table
DEFINE TABLE edge SCHEMAFULL;
DEFINE FIELD from ON edge TYPE record(node);
DEFINE FIELD to ON edge TYPE record(node);
DEFINE FIELD relation ON edge TYPE string ASSERT $value INSIDE ["relates_to", "depends_on", "implements", "extends", "supersedes", "explains"];
DEFINE FIELD strength ON edge TYPE number;
DEFINE FIELD created ON edge TYPE datetime;
```
### Graph Queries
SurrealDB's native graph support enables powerful queries:
**Find direct dependencies** (one hop; repeat the `->depends_on->node` step to follow the chain further):
```sql
SELECT * FROM node:`note-id`->depends_on->node;
```
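Collecting dependencies transitively means following `depends_on` edges repeatedly until no new node appears. When the edges are already loaded on the client, a breadth-first walk does this. The sketch below assumes edges are available as `(from, to)` ID pairs; `transitive_dependencies` is a hypothetical helper, not part of the `Storage` trait.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collects every node reachable from `start` via `depends_on` edges,
/// in breadth-first order. `start` itself is not included, and cycles
/// are handled by visiting each node at most once.
pub fn transitive_dependencies(edges: &[(&str, &str)], start: &str) -> Vec<String> {
    // Adjacency list: node ID -> IDs it depends on.
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }

    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut result = Vec::new();

    while let Some(current) = queue.pop_front() {
        for &next in adjacency.get(current).into_iter().flatten() {
            // `insert` returns false for nodes that were already visited.
            if seen.insert(next) {
                result.push(next.to_string());
                queue.push_back(next);
            }
        }
    }
    result
}
```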
**Find related guidelines** (2 hops):
```sql
SELECT * FROM node:`guideline-rust-errors`<-relates_to<-node<-relates_to<-node;
```
**Semantic search** (cosine similarity):
```sql
SELECT *, vector::similarity::cosine(embedding, $query_embedding) AS score
FROM node
WHERE vector::similarity::cosine(embedding, $query_embedding) > 0.6
ORDER BY score DESC
LIMIT 10;
```
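For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, giving a score between -1 and 1. The sketch below computes it in plain Rust; `cosine_similarity` is an illustrative helper, and returning `None` for mismatched or zero-length inputs is a choice made here, not documented SurrealDB behavior.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns `None` when the vectors differ in length, are empty,
/// or either has zero magnitude (the score is undefined then).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

With the `0.6` threshold used in the query, only vectors pointing in broadly the same direction as the query embedding pass the filter.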
### Implementation
```rust
use surrealdb::{
    engine::remote::ws::{Client, Ws},
    opt::auth::Root,
    Surreal,
};

pub struct SurrealDbStorage {
    db: Surreal<Client>,
    namespace: String,
    database: String,
}

impl SurrealDbStorage {
    pub async fn new(config: &KbConfig) -> Result<Self> {
        let db = Surreal::new::<Ws>(&config.storage.secondary.url).await?;
        db.signin(Root {
            username: &config.storage.secondary.username,
            password: &config.storage.secondary.password,
        })
        .await?;
        db.use_ns(&config.storage.secondary.namespace)
            .use_db(&config.storage.secondary.database)
            .await?;
        Ok(Self {
            db,
            namespace: config.storage.secondary.namespace.clone(),
            database: config.storage.secondary.database.clone(),
        })
    }
}

// Excerpt: only `save_node`, `load_node`, and `search` are shown.
#[async_trait]
impl Storage for SurrealDbStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
        // Note: `create` fails if the record already exists. Repeated saves
        // (as in a sync loop) need an upsert-style call instead; which method
        // provides that depends on the SDK version.
        let _: Option<Node> = self.db
            .create(("node", node.id.as_str()))
            .content(node.clone())
            .await?;
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        // The annotation tells `select` which type to deserialize into.
        let node: Option<Node> = self.db.select(("node", id)).await?;
        node.ok_or_else(|| KbError::NodeNotFound(id.to_string()))
    }

    async fn search(&self, query: &str) -> Result<Vec<Node>> {
        let sql = "SELECT * FROM node WHERE title ~ $query OR content ~ $query";
        // Bind an owned string so the parameter does not borrow from `query`.
        let mut result = self.db.query(sql).bind(("query", query.to_string())).await?;
        let nodes: Vec<Node> = result.take(0)?;
        Ok(nodes)
    }
}
```
### Multi-Tenancy
SurrealDB supports namespaces and databases for isolation:
```text
Namespace: "kogral"
├── Database: "shared" (organization-wide)
│   └── Nodes: guidelines, patterns, policies
└── Database: "project-foo" (project-specific)
    └── Nodes: project decisions, local notes
```
Configuration:
```nickel
storage = {
  secondary = {
    enabled = true,
    namespace = "kogral",
    database = "shared", # or "project-foo" for project-specific
  },
}
```
## In-Memory Storage
### Purpose
**Fast ephemeral storage for testing and caching**
Ideal for:
- Unit tests (isolated, deterministic)
- Integration tests (fast setup/teardown)
- Session caching (temporary graphs)
- Development mode (rapid iteration)
### Implementation
```rust
use dashmap::DashMap;

pub struct MemoryStorage {
    nodes: DashMap<String, Node>,
    edges: DashMap<(String, String, EdgeType), Edge>,
}

impl MemoryStorage {
    pub fn new() -> Self {
        Self {
            nodes: DashMap::new(),
            edges: DashMap::new(),
        }
    }
}

#[async_trait]
impl Storage for MemoryStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
        self.nodes.insert(node.id.clone(), node.clone());
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        self.nodes.get(id)
            .map(|entry| entry.value().clone())
            .ok_or_else(|| KbError::NodeNotFound(id.to_string()))
    }

    async fn list_nodes(&self, node_type: Option<NodeType>) -> Result<Vec<String>> {
        Ok(self.nodes.iter()
            .filter(|entry| {
                node_type.map_or(true, |t| entry.value().node_type == t)
            })
            .map(|entry| entry.key().clone())
            .collect())
    }

    async fn search(&self, query: &str) -> Result<Vec<Node>> {
        Ok(self.nodes.iter()
            .filter(|entry| {
                let node = entry.value();
                node.title.contains(query) ||
                    node.content.contains(query) ||
                    node.tags.iter().any(|tag| tag.contains(query))
            })
            .map(|entry| entry.value().clone())
            .collect())
    }
}
```
### Concurrency
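`DashMap` shards its entries across independently locked segments, so concurrent readers and writers block each other only when they touch the same shard: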
```rust
// Multiple threads can read/write simultaneously
let storage = Arc::new(MemoryStorage::new());

tokio::spawn({
    let storage = storage.clone();
    async move {
        storage.save_node(&node1).await.unwrap();
    }
});

tokio::spawn({
    let storage = storage.clone();
    async move {
        storage.save_node(&node2).await.unwrap();
    }
});
```
## Hybrid Storage Strategy
### Architecture
Combine filesystem and SurrealDB storage for the best of both worlds:
```text
Project Graph (local)            Shared Graph (central)
        ↓                                ↓
Filesystem Storage               SurrealDB Storage
  .kogral/notes/                   namespace: "kogral"
  .kogral/decisions/               database: "shared"
  .kogral/guidelines/              nodes: shared guidelines
        ↓                                ↓
        └──────── Sync Mechanism ────────┘
                  (bidirectional)
```
**Local advantages**:
- Git-tracked (version control)
- Offline work
- Fast local queries
- Human-readable diffs
**Central advantages**:
- Shared across projects
- Advanced graph queries
- Semantic search at scale
- Organization-wide knowledge
### Sync Mechanism
Bidirectional synchronization keeps both in sync:
```rust
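use std::{sync::Arc, time::Duration};

use tokio::time::sleep;

pub struct SyncManager {
    filesystem: Arc<FilesystemStorage>,
    surrealdb: Arc<SurrealDbStorage>,
    config: SyncConfig,
}

impl SyncManager {
    /// Copies every filesystem node to SurrealDB, overwriting the central copy.
    pub async fn sync_to_central(&self) -> Result<()> {
        let nodes = self.filesystem.list_nodes(None).await?;
        for node_id in nodes {
            let node = self.filesystem.load_node(&node_id).await?;
            self.surrealdb.save_node(&node).await?;
        }
        Ok(())
    }

    /// Copies every SurrealDB node to the filesystem, overwriting the local copy.
    pub async fn sync_from_central(&self) -> Result<()> {
        let nodes = self.surrealdb.list_nodes(None).await?;
        for node_id in nodes {
            let node = self.surrealdb.load_node(&node_id).await?;
            self.filesystem.save_node(&node).await?;
        }
        Ok(())
    }

    /// Takes `Arc<Self>` so the watcher callback and each spawned task can
    /// own a handle; a plain `&self` cannot move into a `'static` task.
    pub async fn watch_and_sync(self: Arc<Self>) -> Result<()> {
        let manager = Arc::clone(&self);
        self.filesystem
            .watch(move |event| {
                // `notify::Event` carries the affected paths in `paths`.
                if event.paths.is_empty() {
                    return;
                }
                let manager = Arc::clone(&manager);
                // Delays each sync by `debounce_ms`; bursts are not coalesced here.
                tokio::spawn(async move {
                    sleep(Duration::from_millis(manager.config.debounce_ms)).await;
                    // Errors are dropped here; real code should log or surface them.
                    let _ = manager.sync_to_central().await;
                });
            })
            .await
    }
}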
```
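Debouncing means waiting until changes have stopped for a quiet period and then syncing once, rather than once per file event. One way to keep that logic testable is to separate it from timers and I/O. The sketch below is an assumption, not KOGRAL's implementation: `Debouncer` is a hypothetical type, and the caller supplies the current time.

```rust
use std::time::{Duration, Instant};

/// Tracks the most recent change and reports when a sync is due.
pub struct Debouncer {
    quiet_period: Duration,
    last_event: Option<Instant>,
}

impl Debouncer {
    pub fn new(quiet_period: Duration) -> Self {
        Self { quiet_period, last_event: None }
    }

    /// Records a change notification observed at `now`.
    /// Every new event restarts the quiet period.
    pub fn record(&mut self, now: Instant) {
        self.last_event = Some(now);
    }

    /// Returns `true` once per burst: when a change is pending and the
    /// quiet period has elapsed since the last event. The pending state
    /// is cleared so the same burst does not trigger a second sync.
    pub fn should_sync(&mut self, now: Instant) -> bool {
        match self.last_event {
            Some(last) if now.duration_since(last) >= self.quiet_period => {
                self.last_event = None;
                true
            }
            _ => false,
        }
    }
}
```

A watcher loop would call `record` for each file event and poll `should_sync` on a timer, with `quiet_period` set from `debounce_ms`.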
### Conflict Resolution
When the two backends hold different versions of the same node, one of the following strategies applies:
**Strategy 1: Last-Write-Wins** (based on `modified` timestamp)
```rust
if filesystem_node.modified > surrealdb_node.modified {
    surrealdb.save_node(&filesystem_node).await?;
} else {
    filesystem.save_node(&surrealdb_node).await?;
}
```
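The last-write-wins decision can be isolated from the storage calls so it is easy to test. The sketch below is illustrative only: `NodeVersion`, `SyncAction`, and `resolve_last_write_wins` are hypothetical names, and `modified` is simplified to Unix seconds. Unlike the snippet above, it reports identical copies as already in sync instead of rewriting them.

```rust
use std::cmp::Ordering;

/// Minimal stand-in for a node: only the fields the comparison needs.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeVersion {
    pub id: String,
    /// Seconds since the Unix epoch (stand-in for the `modified` datetime).
    pub modified: u64,
    pub content: String,
}

#[derive(Debug, PartialEq)]
pub enum SyncAction {
    /// The local copy is newer: write it to SurrealDB.
    PushToCentral,
    /// The central copy is newer (or wins a tie): write it to the filesystem.
    PullToLocal,
    /// Same timestamp and same content: nothing to do.
    InSync,
}

pub fn resolve_last_write_wins(local: &NodeVersion, central: &NodeVersion) -> SyncAction {
    match local.modified.cmp(&central.modified) {
        Ordering::Greater => SyncAction::PushToCentral,
        Ordering::Less => SyncAction::PullToLocal,
        Ordering::Equal if local.content == central.content => SyncAction::InSync,
        // Equal timestamps but different content: prefer the central copy.
        Ordering::Equal => SyncAction::PullToLocal,
    }
}
```

Last-write-wins depends on the clocks of all writers being reasonably aligned; with skewed clocks an older edit can silently win.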
**Strategy 2: User Prompt** (safe, explicit)
```rust
if filesystem_node != surrealdb_node {
    match prompt_user(&filesystem_node, &surrealdb_node)? {
        Choice::KeepLocal => surrealdb.save_node(&filesystem_node).await?,
        Choice::KeepCentral => filesystem.save_node(&surrealdb_node).await?,
        // Placeholder: merging is not shown in this example.
        Choice::Merge => todo!("merge both versions and save the result to both backends"),
    }
}
```
**Strategy 3: Both** (create branches)
```rust
// Save both as separate nodes with relationship
filesystem_node.id = format!("{}-local", original_id);
surrealdb_node.id = format!("{}-central", original_id);
storage.save_node(&filesystem_node).await?;
storage.save_node(&surrealdb_node).await?;
storage.save_edge(&Edge {
from: filesystem_node.id,
to: surrealdb_node.id,
relation: EdgeType::RelatesTo,
strength: 1.0,
}).await?;
```
## Configuration
Storage backend is selected via config:
```nickel
{
  storage = {
    # Primary storage (always used)
    primary = 'filesystem, # or 'memory, 'surrealdb
    # Secondary storage (optional, for hybrid setup)
    secondary = {
      enabled = true, # Enable SurrealDB
      type = 'surrealdb,
      url = "ws://localhost:8000",
      namespace = "kogral",
      database = "shared",
      username = "root", # Or from env var
      password = "root",
    },
  },
  sync = {
    auto_index = true, # Auto-sync to secondary
    debounce_ms = 500, # Wait before syncing
    watch_paths = ["notes", "decisions", "guidelines"],
  },
}
```
## Performance Considerations
### Filesystem Storage
- **Read**: O(1) if ID known, O(n) for scanning
- **Write**: Fast (single file write)
- **Search**: O(n) text scan, slow for large graphs
- **Optimization**: Use file system cache, lazy load
### SurrealDB Storage
- **Read**: O(log n) with indexes
- **Write**: Fast with async commits
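- **Search**: Index-backed rather than a full scan once a full-text index is defined (not constant time; the schema above defines no such index, so the `~` match in `search` scans the table)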
- **Graph traversal**: Optimized with native graph support
- **Optimization**: Index on tags, node_type, embeddings
### In-Memory Storage
- **Read**: O(1) with DashMap
- **Write**: O(1) on average; takes a short per-shard lock
- **Search**: O(n) iteration
- **Memory**: Entire graph in RAM
## Security
- **Filesystem**: Unix permissions, path sanitization
- **SurrealDB**: Authentication, namespaces, TLS
- **In-Memory**: Process isolation, no persistence
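Path sanitization matters for the filesystem backend because a node ID becomes part of a file path, so an ID such as `../../etc/passwd` must never reach a `join` call. The sketch below shows one possible allow-list check. The character set and the length limit are assumptions made for illustration; the rule KOGRAL actually applies is not stated here, and `is_safe_node_id` is a hypothetical name.

```rust
/// Accepts IDs made only of ASCII letters, digits, `-`, and `_`.
/// Rejecting `/`, `\`, and `.` outright rules out path traversal
/// (`..`), absolute paths, and hidden files in one step.
pub fn is_safe_node_id(id: &str) -> bool {
    // Assumed upper bound, kept well below typical filename limits.
    const MAX_LEN: usize = 128;
    !id.is_empty()
        && id.len() <= MAX_LEN
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

Checking the ID before building the path is simpler and safer than trying to normalize the resulting path afterwards.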
## See Also
- **Configuration**: [Storage Configuration](../config/storage.md)
- **Sync Guide**: [Filesystem ↔ SurrealDB Sync](sync.md)
- **ADR**: [Hybrid Storage Decision](../architecture/adrs/003-hybrid-storage.md)
- **API Reference**: [Storage Trait Documentation](../api/storage-trait.md)