kogral/docs/storage/overview.md
2026-01-23 16:11:07 +00:00

Storage Architecture

The Knowledge Base uses a hybrid storage strategy combining filesystem, SurrealDB, and in-memory backends to balance git-friendliness, scalability, and performance.

Overview

The storage layer is abstracted through a common Storage trait, allowing KOGRAL to use different backends based on project needs:

  • Filesystem: Git-tracked markdown files for local project knowledge
  • SurrealDB: Scalable graph database for shared organizational knowledge
  • In-Memory: Fast ephemeral storage for testing and caching

Storage Trait

All storage backends implement a common async trait:

#[async_trait]
pub trait Storage: Send + Sync {
    async fn save_node(&self, node: &Node) -> Result<()>;
    async fn load_node(&self, id: &str) -> Result<Node>;
    async fn delete_node(&self, id: &str) -> Result<()>;
    async fn list_nodes(&self, node_type: Option<NodeType>) -> Result<Vec<String>>;
    async fn search(&self, query: &str) -> Result<Vec<Node>>;

    async fn save_edge(&self, edge: &Edge) -> Result<()>;
    async fn load_edges(&self, node_id: &str) -> Result<Vec<Edge>>;
    async fn delete_edge(&self, from: &str, to: &str, relation: EdgeType) -> Result<()>;

    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    async fn load_graph(&self, name: &str) -> Result<Graph>;
}

This abstraction allows:

  • Swapping backends without code changes (config-driven)
  • Testing with in-memory storage
  • Hybrid setups (filesystem + SurrealDB)
  • Custom backends (implement trait + add to config)
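As a rough illustration of the config-driven swap described above, the sketch below uses a simplified, synchronous stand-in for the `Storage` trait. The names `NodeStore`, `MemoryStore`, `FileStore`, `BackendKind`, and `open_store` are hypothetical and are not part of KOGRAL; the point is only that callers hold a `Box<dyn Trait>` and never name the concrete backend.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::Mutex;

/// Simplified, synchronous stand-in for the async `Storage` trait (hypothetical).
pub trait NodeStore: Send + Sync {
    fn save(&self, id: &str, body: &str) -> io::Result<()>;
    fn load(&self, id: &str) -> io::Result<Option<String>>;
    fn backend_name(&self) -> &'static str;
}

/// Ephemeral backend: a mutex-guarded map.
#[derive(Default)]
pub struct MemoryStore {
    nodes: Mutex<HashMap<String, String>>,
}

impl NodeStore for MemoryStore {
    fn save(&self, id: &str, body: &str) -> io::Result<()> {
        self.nodes.lock().unwrap().insert(id.to_string(), body.to_string());
        Ok(())
    }
    fn load(&self, id: &str) -> io::Result<Option<String>> {
        Ok(self.nodes.lock().unwrap().get(id).cloned())
    }
    fn backend_name(&self) -> &'static str {
        "memory"
    }
}

/// Persistent backend: one `<id>.md` file per node under `root`.
pub struct FileStore {
    root: PathBuf,
}

impl NodeStore for FileStore {
    fn save(&self, id: &str, body: &str) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(format!("{id}.md")), body)
    }
    fn load(&self, id: &str) -> io::Result<Option<String>> {
        match fs::read_to_string(self.root.join(format!("{id}.md"))) {
            Ok(body) => Ok(Some(body)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
    fn backend_name(&self) -> &'static str {
        "filesystem"
    }
}

/// Hypothetical config value naming the backend to use.
pub enum BackendKind {
    Memory,
    Filesystem(PathBuf),
}

/// The only place that knows about concrete backend types.
pub fn open_store(kind: BackendKind) -> Box<dyn NodeStore> {
    match kind {
        BackendKind::Memory => Box::new(MemoryStore::default()),
        BackendKind::Filesystem(root) => Box::new(FileStore { root }),
    }
}
```

Code that receives the `Box<dyn NodeStore>` behaves the same with either variant, which is what makes an in-memory backend a drop-in replacement in tests.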

Filesystem Storage

Purpose

Git-friendly, human-readable knowledge storage

Ideal for:

  • Project-specific knowledge
  • Version control with git
  • Code review of knowledge changes
  • Offline work
  • Logseq compatibility

File Layout

.kogral/
├── config.toml              # Graph metadata
├── notes/
│   ├── 2026-01-17-topic.md
│   ├── async-patterns.md
│   └── error-handling.md
├── decisions/
│   ├── 0001-use-rust.md     # ADR format
│   ├── 0002-surrealdb.md
│   └── 0003-nickel-config.md
├── guidelines/
│   ├── rust-errors.md
│   ├── testing-standards.md
│   └── api-design.md
├── patterns/
│   ├── repository-pattern.md
│   ├── builder-pattern.md
│   └── async-error-handling.md
└── journal/
    ├── 2026-01-17.md        # Daily notes
    └── 2026-01-18.md

Document Format

Each document is markdown with YAML frontmatter:

---
id: note-rust-async-traits
type: note
title: Async Trait Patterns in Rust
created: 2026-01-17T10:30:00Z
modified: 2026-01-17T15:45:00Z
tags: [rust, async, traits]
status: active
relates_to:
  - pattern-async-error-handling
  - guideline-rust-async
depends_on:
  - note-rust-basics
project: knowledge-base
---

# Async Trait Patterns in Rust

Using async traits with the `async-trait` crate...

## Pattern 1: Boxed Futures

When working with async traits, use `async-trait` to avoid lifetime issues:

\`\`\`rust
#[async_trait]
trait DataSource {
    async fn fetch(&self, id: &str) -> Result<Data>;
}
\`\`\`

## Related Concepts

See [[pattern-async-error-handling]] for error handling in async contexts.

Depends on understanding [[note-rust-basics]] first.
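
To make the frontmatter/body split in the document format above concrete, here is a minimal sketch. It is not KOGRAL's `parse_markdown_node`; `split_frontmatter` and `frontmatter_field` are hypothetical helpers, and the field lookup only handles top-level `key: value` scalars, not full YAML. It also assumes `\n` line endings.

```rust
/// Splits a document into (frontmatter, body).
/// Returns None when the document does not open with a `---` line
/// or when no closing `---` line is found.
pub fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    let rest = doc.strip_prefix("---\n")?;
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            // Frontmatter is everything before the closing delimiter line,
            // body is everything after it.
            return Some((&rest[..offset], &rest[offset + line.len()..]));
        }
        offset += line.len();
    }
    None
}

/// Looks up a top-level scalar such as `id` or `type`.
/// Indented lines (list items, nested values) and empty values are skipped.
pub fn frontmatter_field<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        if line.starts_with(char::is_whitespace) {
            return None;
        }
        let (k, v) = line.split_once(':')?;
        let v = v.trim();
        (k.trim() == key && !v.is_empty()).then_some(v)
    })
}
```

A real implementation would hand the frontmatter slice to a YAML parser so that lists such as `tags` and `relates_to` are decoded properly.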

Features

Wikilinks: [[other-note]] automatically creates relationships

Code References: @src/main.rs:42 links to code locations

Git Integration:

  • Diffs show knowledge changes
  • Branches for experimental knowledge
  • PRs for knowledge review
  • History tracking
  • Blame shows knowledge authors

Logseq Compatibility: Format is compatible with Logseq for graph visualization
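
The wikilink and code-reference syntax listed above can be recognised with plain string scanning. The sketch below is illustrative only: `extract_wikilinks` and `parse_code_ref` are hypothetical names, and how KOGRAL itself parses these markers or which edge type it creates is not shown in this document.

```rust
/// Returns the targets of all `[[wikilink]]` occurrences in `text`, in order.
/// An unclosed `[[` ends the scan; empty targets are ignored.
pub fn extract_wikilinks(text: &str) -> Vec<&str> {
    let mut links = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        match after.find("]]") {
            Some(end) => {
                let target = after[..end].trim();
                if !target.is_empty() {
                    links.push(target);
                }
                rest = &after[end + 2..];
            }
            None => break,
        }
    }
    links
}

/// Parses a token such as `@src/main.rs:42` into (path, line number).
/// Returns None if the `@` prefix, the path, or a numeric line is missing.
pub fn parse_code_ref(token: &str) -> Option<(&str, u32)> {
    let rest = token.strip_prefix('@')?;
    let (path, line) = rest.rsplit_once(':')?;
    if path.is_empty() {
        return None;
    }
    Some((path, line.parse().ok()?))
}
```

Each extracted wikilink target would then be resolved to a node ID before an edge is stored; unresolved targets need a policy of their own (skip, warn, or create a stub).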

Implementation

pub struct FilesystemStorage {
    root: PathBuf,
}

impl FilesystemStorage {
    pub fn new(root: impl Into<PathBuf>) -> Result<Self> {
        let root = root.into();
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    fn node_path(&self, node_type: NodeType, id: &str) -> PathBuf {
        let subdir = match node_type {
            NodeType::Note => "notes",
            NodeType::Decision => "decisions",
            NodeType::Guideline => "guidelines",
            NodeType::Pattern => "patterns",
            NodeType::Journal => "journal",
            NodeType::Execution => "executions",
        };
        self.root.join(subdir).join(format!("{}.md", id))
    }
}

#[async_trait]
impl Storage for FilesystemStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
        let path = self.node_path(node.node_type, &node.id);
        fs::create_dir_all(path.parent().unwrap())?;

        let markdown = format_node_as_markdown(node)?;
        fs::write(path, markdown)?;
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        // Try each node type directory
        for node_type in NodeType::iter() {
            let path = self.node_path(node_type, id);
            if path.exists() {
                let content = fs::read_to_string(path)?;
                return parse_markdown_node(&content);
            }
        }
        Err(KbError::NodeNotFound(id.to_string()))
    }
}

File Watching

Filesystem storage can watch for changes and auto-sync:

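use std::sync::mpsc::channel;

use notify::{Config, Event, RecommendedWatcher, RecursiveMode, Watcher};

impl FilesystemStorage {
    pub async fn watch(&self, on_change: impl Fn(Event) + Send + 'static) -> Result<()> {
        let (tx, rx) = channel();
        let mut watcher = RecommendedWatcher::new(tx, Config::default())?;

        watcher.watch(&self.root, RecursiveMode::Recursive)?;

        // `recv` blocks the calling thread, so run this on a dedicated thread
        // or a blocking task rather than on a shared async worker.
        while let Ok(event) = rx.recv() {
            // notify delivers `Result<Event, notify::Error>`; watcher errors are skipped here.
            if let Ok(event) = event {
                on_change(event);
            }
        }
        Ok(())
    }
}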

SurrealDB Storage

Purpose

Scalable graph database for shared organizational knowledge

Ideal for:

  • Organization-wide guidelines
  • Shared patterns library
  • Advanced graph queries
  • Semantic search with embeddings
  • Multi-project knowledge sharing

Schema

SurrealDB schema for nodes and edges:

DEFINE TABLE node SCHEMAFULL;

DEFINE FIELD id ON node TYPE string;
DEFINE FIELD node_type ON node TYPE string ASSERT $value INSIDE ["note", "decision", "guideline", "pattern", "journal", "execution"];
DEFINE FIELD title ON node TYPE string;
DEFINE FIELD content ON node TYPE string;
DEFINE FIELD tags ON node TYPE array;
DEFINE FIELD status ON node TYPE string;
DEFINE FIELD created ON node TYPE datetime;
DEFINE FIELD modified ON node TYPE datetime;
DEFINE FIELD embedding ON node TYPE array<number>;  -- For semantic search

DEFINE INDEX unique_node_id ON node COLUMNS id UNIQUE;
DEFINE INDEX node_type_idx ON node COLUMNS node_type;
DEFINE INDEX node_tags_idx ON node COLUMNS tags;

-- Relationship table
DEFINE TABLE edge SCHEMAFULL;

DEFINE FIELD from ON edge TYPE record(node);
DEFINE FIELD to ON edge TYPE record(node);
DEFINE FIELD relation ON edge TYPE string ASSERT $value INSIDE ["relates_to", "depends_on", "implements", "extends", "supersedes", "explains"];
DEFINE FIELD strength ON edge TYPE number;
DEFINE FIELD created ON edge TYPE datetime;

Graph Queries

SurrealDB's native graph support enables powerful queries:

Find direct dependencies (one hop; chain further `->depends_on->node` steps to go deeper):

SELECT * FROM node:`note-id`->depends_on->node;

Find related guidelines (2 hops):

SELECT * FROM node:`guideline-rust-errors`<-relates_to<-node<-relates_to<-node;

Semantic search (cosine similarity):

SELECT *, vector::similarity::cosine(embedding, $query_embedding) AS score
FROM node
WHERE vector::similarity::cosine(embedding, $query_embedding) > 0.6
ORDER BY score DESC
LIMIT 10;
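
For readers unfamiliar with the score used in the semantic search query, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it directly and mirrors the query's threshold, ordering, and limit in plain Rust. `cosine_similarity` and `top_matches` are hypothetical helpers for illustration, not KOGRAL or SurrealDB APIs.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None for mismatched lengths, empty input, or a zero-length vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Scores every (id, embedding) pair against `query`, keeps scores above
/// `threshold`, sorts by descending score, and returns at most `limit` hits.
pub fn top_matches<'a>(
    query: &[f32],
    nodes: &'a [(&'a str, Vec<f32>)],
    threshold: f32,
    limit: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = nodes
        .iter()
        .filter_map(|(id, emb)| cosine_similarity(query, emb).map(|s| (*id, s)))
        .filter(|(_, s)| *s > threshold)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

This linear scan is only meant to show what the score means; at scale the database would do the scoring, ideally backed by a vector index.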

Implementation

use surrealdb::{
    engine::remote::ws::{Client, Ws},
    opt::auth::Root,
    Surreal,
};

pub struct SurrealDbStorage {
    db: Surreal<Client>,
    namespace: String,
    database: String,
}

impl SurrealDbStorage {
    pub async fn new(config: &KbConfig) -> Result<Self> {
        let db = Surreal::new::<Ws>(&config.storage.secondary.url).await?;

        db.signin(Root {
            username: &config.storage.secondary.username,
            password: &config.storage.secondary.password,
        }).await?;

        db.use_ns(&config.storage.secondary.namespace)
            .use_db(&config.storage.secondary.database)
            .await?;

        Ok(Self {
            db,
            namespace: config.storage.secondary.namespace.clone(),
            database: config.storage.secondary.database.clone(),
        })
    }
}

#[async_trait]
impl Storage for SurrealDbStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
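        // `create` fails if a record with this ID already exists, so re-saving
        // a node needs an update or upsert call (which one depends on the SDK version).
        let _: Option<Node> = self.db
            .create(("node", &node.id))
            .content(node)
            .await?;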
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        self.db.select(("node", id)).await?
            .ok_or_else(|| KbError::NodeNotFound(id.to_string()))
    }

    async fn search(&self, query: &str) -> Result<Vec<Node>> {
        let sql = "SELECT * FROM node WHERE title ~ $query OR content ~ $query";
        // Bind an owned String; newer SDK versions require bound values to be 'static.
        let mut result = self.db.query(sql).bind(("query", query.to_owned())).await?;
        let nodes: Vec<Node> = result.take(0)?;
        Ok(nodes)
    }
}

Multi-Tenancy

SurrealDB supports namespaces and databases for isolation:

Namespace: "kogral"
├── Database: "shared" (organization-wide)
│   └── Nodes: guidelines, patterns, policies
└── Database: "project-foo" (project-specific)
    └── Nodes: project decisions, local notes

Configuration:

storage = {
  secondary = {
    enabled = true,
    namespace = "kogral",
    database = "shared",  # or "project-foo" for project-specific
  },
}

In-Memory Storage

Purpose

Fast ephemeral storage for testing and caching

Ideal for:

  • Unit tests (isolated, deterministic)
  • Integration tests (fast setup/teardown)
  • Session caching (temporary graphs)
  • Development mode (rapid iteration)

Implementation

use dashmap::DashMap;

pub struct MemoryStorage {
    nodes: DashMap<String, Node>,
    edges: DashMap<(String, String, EdgeType), Edge>,
}

impl MemoryStorage {
    pub fn new() -> Self {
        Self {
            nodes: DashMap::new(),
            edges: DashMap::new(),
        }
    }
}

#[async_trait]
impl Storage for MemoryStorage {
    async fn save_node(&self, node: &Node) -> Result<()> {
        self.nodes.insert(node.id.clone(), node.clone());
        Ok(())
    }

    async fn load_node(&self, id: &str) -> Result<Node> {
        self.nodes.get(id)
            .map(|entry| entry.value().clone())
            .ok_or_else(|| KbError::NodeNotFound(id.to_string()))
    }

    async fn list_nodes(&self, node_type: Option<NodeType>) -> Result<Vec<String>> {
        Ok(self.nodes.iter()
            .filter(|entry| {
                node_type.map_or(true, |t| entry.value().node_type == t)
            })
            .map(|entry| entry.key().clone())
            .collect())
    }

    async fn search(&self, query: &str) -> Result<Vec<Node>> {
        Ok(self.nodes.iter()
            .filter(|entry| {
                let node = entry.value();
                node.title.contains(query) ||
                node.content.contains(query) ||
                node.tags.iter().any(|tag| tag.contains(query))
            })
            .map(|entry| entry.value().clone())
            .collect())
    }
}

Concurrency

DashMap shards its entries across several internal locks, so concurrent readers and writers only contend when they touch the same shard:

// Multiple threads can read/write simultaneously
let storage = Arc::new(MemoryStorage::new());

tokio::spawn({
    let storage = storage.clone();
    async move {
        storage.save_node(&node1).await.unwrap();
    }
});

tokio::spawn({
    let storage = storage.clone();
    async move {
        storage.save_node(&node2).await.unwrap();
    }
});
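
The same sharing pattern can be tried without tokio or DashMap. The sketch below is a std-only analogue under stated assumptions: `SharedNodes` and `save_concurrently` are hypothetical, nodes are reduced to an ID and a title, and a single `RwLock` guards the whole map, which is coarser than what DashMap does but shows the same `Arc`-clone-per-writer structure.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for `MemoryStorage`: one lock around one map.
#[derive(Default)]
pub struct SharedNodes {
    nodes: RwLock<HashMap<String, String>>,
}

impl SharedNodes {
    /// Takes the write lock for the duration of a single insert.
    pub fn save(&self, id: &str, title: &str) {
        self.nodes
            .write()
            .unwrap()
            .insert(id.to_string(), title.to_string());
    }

    /// Takes the read lock; many readers may hold it at once.
    pub fn len(&self) -> usize {
        self.nodes.read().unwrap().len()
    }
}

/// Spawns `writers` threads that each save one distinct node,
/// waits for all of them, and returns the number of stored nodes.
pub fn save_concurrently(writers: usize) -> usize {
    let storage = Arc::new(SharedNodes::default());
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let storage = Arc::clone(&storage);
            thread::spawn(move || storage.save(&format!("node-{i}"), "title"))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    storage.len()
}
```

Because every writer uses a distinct ID, the final count equals the number of writers regardless of the order in which the threads run.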

Hybrid Storage Strategy

Architecture

Combine filesystem and SurrealDB storage to get the best of both worlds:

Project Graph (local)          Shared Graph (central)
      ↓                                 ↓
Filesystem Storage             SurrealDB Storage
  .kogral/notes/                   namespace: "kogral"
  .kogral/decisions/               database: "shared"
  .kogral/guidelines/              nodes: shared guidelines
      ↓                                 ↓
      └─────── Sync Mechanism ─────────┘
               (bidirectional)

Local advantages:

  • Git-tracked (version control)
  • Offline work
  • Fast local queries
  • Human-readable diffs

Central advantages:

  • Shared across projects
  • Advanced graph queries
  • Semantic search at scale
  • Organization-wide knowledge

Sync Mechanism

Bidirectional synchronization keeps the two backends consistent with each other:

pub struct SyncManager {
    filesystem: Arc<FilesystemStorage>,
    surrealdb: Arc<SurrealDbStorage>,
    config: SyncConfig,
}

impl SyncManager {
    pub async fn sync_to_central(&self) -> Result<()> {
        let nodes = self.filesystem.list_nodes(None).await?;

        for node_id in nodes {
            let node = self.filesystem.load_node(&node_id).await?;
            self.surrealdb.save_node(&node).await?;
        }

        Ok(())
    }

    pub async fn sync_from_central(&self) -> Result<()> {
        let nodes = self.surrealdb.list_nodes(None).await?;

        for node_id in nodes {
            let node = self.surrealdb.load_node(&node_id).await?;
            self.filesystem.save_node(&node).await?;
        }

        Ok(())
    }

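    // Takes `Arc<Self>` so the watcher callback and spawned tasks can own a handle.
    pub async fn watch_and_sync(self: Arc<Self>) -> Result<()> {
        let manager = Arc::clone(&self);
        self.filesystem.watch(move |event| {
            // notify's `Event` carries a list of affected paths
            if !event.paths.is_empty() {
                let manager = Arc::clone(&manager);
                // Delay, then sync. Each event still triggers its own sync;
                // coalescing bursts needs extra state.
                tokio::spawn(async move {
                    sleep(Duration::from_millis(manager.config.debounce_ms)).await;
                    if let Err(err) = manager.sync_to_central().await {
                        eprintln!("sync to central failed: {err}");
                    }
                });
            }
        }).await
    }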
}
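
A debounce in the usual sense coalesces a burst of file events into a single sync once the files have been quiet for the configured interval. The sketch below shows that bookkeeping in isolation. `Debouncer` is a hypothetical helper, not part of `SyncManager`, and it takes the current time as a parameter so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Tracks the most recent change and reports when the quiet window has elapsed.
pub struct Debouncer {
    window: Duration,
    last_event: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_event: None }
    }

    /// Records a change event; every new event restarts the quiet window.
    pub fn record(&mut self, now: Instant) {
        self.last_event = Some(now);
    }

    /// Returns true exactly once per burst: when a change is pending and
    /// at least `window` has passed since the last recorded event.
    pub fn should_sync(&mut self, now: Instant) -> bool {
        match self.last_event {
            Some(last) if now.duration_since(last) >= self.window => {
                self.last_event = None;
                true
            }
            _ => false,
        }
    }
}
```

A watcher loop would call `record` for every file event and poll `should_sync` on a timer, running one sync when it returns true.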

Conflict Resolution

When the two backends hold different versions of the same node:

Strategy 1: Last-Write-Wins (based on modified timestamp)

if filesystem_node.modified > surrealdb_node.modified {
    surrealdb.save_node(&filesystem_node).await?;
} else {
    filesystem.save_node(&surrealdb_node).await?;
}
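
The comparison in Strategy 1 can be isolated into a pure function that is easy to test. This is a sketch under assumptions: `SyncAction` and `last_write_wins` are hypothetical names, timestamps are modelled as Unix seconds, and equal timestamps are treated as already in sync, whereas the snippet above overwrites the local copy on a tie.

```rust
use std::cmp::Ordering;

/// Direction in which a node should be copied.
#[derive(Debug, PartialEq, Eq)]
pub enum SyncAction {
    /// Local copy is newer: write it to the central store.
    PushToCentral,
    /// Central copy is newer: write it to the filesystem.
    PullToLocal,
    /// Same timestamp: nothing to do.
    InSync,
}

/// Last-write-wins on `modified` timestamps (Unix seconds, an assumption here).
pub fn last_write_wins(local_modified: u64, central_modified: u64) -> SyncAction {
    match local_modified.cmp(&central_modified) {
        Ordering::Greater => SyncAction::PushToCentral,
        Ordering::Less => SyncAction::PullToLocal,
        Ordering::Equal => SyncAction::InSync,
    }
}
```

Last-write-wins trusts the clocks of the machines that set `modified`, so clock skew between contributors can make an older edit win.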

Strategy 2: User Prompt (safe, explicit)

if filesystem_node != surrealdb_node {
    match prompt_user(&filesystem_node, &surrealdb_node)? {
        Choice::KeepLocal => surrealdb.save_node(&filesystem_node).await?,
        Choice::KeepCentral => filesystem.save_node(&surrealdb_node).await?,
        Choice::Merge => todo!("merge both versions and save the result to both backends"),
    }
}

Strategy 3: Both (create branches)

// Save both as separate nodes with relationship
filesystem_node.id = format!("{}-local", original_id);
surrealdb_node.id = format!("{}-central", original_id);
storage.save_node(&filesystem_node).await?;
storage.save_node(&surrealdb_node).await?;
storage.save_edge(&Edge {
    from: filesystem_node.id,
    to: surrealdb_node.id,
    relation: EdgeType::RelatesTo,
    strength: 1.0,
}).await?;

Configuration

Storage backend is selected via config:

{
  storage = {
    # Primary storage (always used)
    primary = 'filesystem,  # or 'memory, 'surrealdb

    # Secondary storage (optional, for hybrid setup)
    secondary = {
      enabled = true,  # Enable SurrealDB
      type = 'surrealdb,
      url = "ws://localhost:8000",
      namespace = "kogral",
      database = "shared",
      username = "root",  # Or from env var
      password = "root",
    },
  },

  sync = {
    auto_index = true,  # Auto-sync to secondary
    debounce_ms = 500,  # Wait before syncing
    watch_paths = ["notes", "decisions", "guidelines"],
  },
}
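
One way the `sync` block could be consumed on the Rust side is sketched below. `SyncSettings` and `is_watched` are hypothetical; the defaults copy the values from the config above, and the sketch assumes that `watch_paths` names top-level directories relative to the `.kogral/` root.

```rust
use std::path::{Component, Path};

/// Hypothetical Rust mirror of the `sync` config block.
pub struct SyncSettings {
    pub auto_index: bool,
    pub debounce_ms: u64,
    pub watch_paths: Vec<String>,
}

impl Default for SyncSettings {
    fn default() -> Self {
        Self {
            auto_index: true,
            debounce_ms: 500,
            watch_paths: vec!["notes".into(), "decisions".into(), "guidelines".into()],
        }
    }
}

/// Returns true when the first component of `relative` (a path relative to
/// the storage root) is one of the watched directories.
/// Paths starting with `..`, `/`, or `.` are rejected.
pub fn is_watched(settings: &SyncSettings, relative: &Path) -> bool {
    match relative.components().next() {
        Some(Component::Normal(first)) => settings
            .watch_paths
            .iter()
            .any(|watched| first.to_str() == Some(watched.as_str())),
        _ => false,
    }
}
```

With the values shown, a change under `journal/` would not trigger a sync because that directory is not listed in `watch_paths`.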

Performance Considerations

Filesystem Storage

  • Read: O(1) if ID known, O(n) for scanning
  • Write: Fast (single file write)
  • Search: O(n) text scan, slow for large graphs
  • Optimization: Use file system cache, lazy load

SurrealDB Storage

  • Read: O(log n) with indexes
  • Write: Fast with async commits
  • Search: index-backed when a full-text index is defined, so cost tracks the number of matches rather than the table size
  • Graph traversal: Optimized with native graph support
  • Optimization: Index on tags, node_type, embeddings

In-Memory Storage

  • Read: O(1) with DashMap
  • Write: O(1) on average, with per-shard locking
  • Search: O(n) iteration
  • Memory: Entire graph in RAM

Security

  • Filesystem: Unix permissions, path sanitization
  • SurrealDB: Authentication, namespaces, TLS
  • In-Memory: Process isolation, no persistence
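
Path sanitization matters for the filesystem backend because node IDs are interpolated into file paths. A minimal sketch of an allow-list check follows. `is_safe_node_id` is a hypothetical helper, and the permitted character set and length limit are assumptions chosen to match the IDs shown in this document.

```rust
/// Accepts IDs made only of ASCII letters, digits, `-`, and `_`,
/// between 1 and 128 bytes long. This rejects path separators, `..`,
/// and absolute paths, so the ID cannot escape its storage directory.
pub fn is_safe_node_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 128
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

Running such a check before building a path from an ID means a crafted ID like `../../etc/passwd` is rejected instead of being joined onto the storage root.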

See Also