ADR-003: Hybrid Storage Strategy

Status: Accepted

Date: 2026-01-17

Deciders: Architecture Team

Context: Storage Backend Strategy for Knowledge Base


Context

KOGRAL needs to store knowledge graphs while meeting these requirements:

  1. Git-Friendly: Knowledge should version alongside code
  2. Scalable: Support small projects (10s of nodes) to large organizations (10,000+ nodes)
  3. Queryable: Efficient graph queries and relationship traversal
  4. Offline-Capable: Work without network access
  5. Collaborative: Support shared organizational knowledge
  6. Cost-Effective: Free for small projects, reasonable cost at scale

Constraints:

  • Developers want to edit knowledge in text editors
  • Organizations want centralized guideline management
  • Git workflows essential for code-adjacent knowledge
  • Large graphs need database performance

Option 1: Filesystem Only

Approach: Store everything as markdown files

Pros:

  • Git-native (perfect for versioning)
  • Text editor friendly
  • No dependencies
  • Works offline
  • Free

Cons:

  • Poor performance for large graphs (1000+ nodes)
  • No efficient graph queries
  • Difficult to share across projects
  • Manual sync for collaboration

Scalability: Good for < 100 nodes, poor beyond

Option 2: Database Only (SurrealDB)

Approach: Store all knowledge in SurrealDB graph database

Pros:

  • Excellent query performance
  • Native graph relationships
  • Scalable to millions of nodes
  • Centralized for collaboration

Cons:

  • Not git-trackable
  • Requires running database server
  • Can't edit with text editor
  • Network dependency
  • Infrastructure cost

Scalability: Excellent, but loses developer workflow benefits

Option 3: Hybrid (Filesystem + SurrealDB)

Approach: Filesystem for local project knowledge, SurrealDB for shared organizational knowledge

Pros:

  • Git-friendly for project knowledge
  • Text editor friendly
  • Scalable for shared knowledge
  • Works offline (local graph)
  • Collaborative (shared graph)
  • Cost-effective (DB only for shared)

Cons:

  • More complex implementation
  • Sync mechanism needed
  • Two storage systems to manage

Scalability: Excellent - best of both worlds


Decision

We will use a hybrid storage strategy: Filesystem (local) + SurrealDB (shared).

Architecture:

┌─────────────────────────────────────────────────────────────┐
│                    Project A (.kogral/)                     │
│  Storage: Filesystem (git-tracked)                          │
│  Scope: Project-specific notes, decisions, patterns         │
│  Access: Local only                                         │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ [inherits]
                   ↓
┌─────────────────────────────────────────────────────────────┐
│           Shared KB (SurrealDB or synced filesystem)         │
│  Storage: SurrealDB (scalable) or filesystem (synced)       │
│  Scope: Organization-wide guidelines, patterns              │
│  Access: All projects                                       │
└─────────────────────────────────────────────────────────────┘

Implementation:

# Project config
{
  storage = {
    primary = 'filesystem,  # Local project knowledge
    secondary = {
      enabled = true,
      type = 'surrealdb,   # Shared knowledge
      url = "ws://kb-central.company.com:8000",
      namespace = "organization",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://organization/shared-kb",  # Inherit from shared
    priority = 100,  # Project overrides shared
  },
}

Sync Strategy:

.kogral/ (Filesystem)
    ↓ [on save]
Watch for changes
    ↓ [debounced]
Sync to SurrealDB
    ↓
Shared graph updated
    ↓ [on query]
Merge local + shared results
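The final merge step can be sketched as follows. This is a minimal illustration, not KOGRAL's actual implementation: the `Node` type and the `merge_results` function are hypothetical, and the rule that a local node replaces a shared node with the same id is an assumption drawn from the "Project overrides shared" comment in the config above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal node type; the real KOGRAL node likely has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: String,
    pub content: String,
}

/// Merge shared and local nodes by id. When both sides define the same id,
/// the local (project) node wins (assumed "project overrides shared" rule).
pub fn merge_results(local: Vec<Node>, shared: Vec<Node>) -> Vec<Node> {
    let mut merged: BTreeMap<String, Node> = BTreeMap::new();
    // Insert shared nodes first so that local nodes can overwrite them.
    for node in shared {
        merged.insert(node.id.clone(), node);
    }
    for node in local {
        merged.insert(node.id.clone(), node);
    }
    // BTreeMap iterates in key order, so the result is sorted by id.
    merged.into_values().collect()
}
```

Inserting the shared set first keeps the override rule in one place: whichever source is written last takes precedence.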

Consequences

Positive

Developer Workflow Preserved:

# Local knowledge workflow (unchanged)
vim .kogral/notes/my-note.md
git add .kogral/notes/my-note.md
git commit -m "Add implementation note"
git push

Git Integration:

  • Project knowledge versioned with code
  • Branches include relevant knowledge
  • Merges resolve knowledge conflicts
  • PR reviews include knowledge changes

Offline Development:

  • Full functionality without network
  • Shared guidelines cached locally
  • Sync when reconnected

Scalability:

  • Projects: filesystem (100s of nodes, fine performance)
  • Organization: SurrealDB (10,000+ nodes, excellent performance)

Collaboration:

  • Shared guidelines accessible to all projects
  • Updates to shared knowledge propagate automatically
  • Consistent practices across organization

Cost-Effective:

  • Small projects: free (filesystem only)
  • Organizations: SurrealDB for shared only (not all project knowledge)

Gradual Adoption:

  • Start with filesystem only
  • Add SurrealDB when needed
  • Feature-gated (--features surrealdb)

Negative

Complexity:

  • Two storage implementations
  • Sync mechanism required
  • Conflict resolution needed

Mitigation:

  • Storage trait abstracts differences
  • Sync is optional (can disable)
  • Conflicts rare (guidelines change infrequently)

Sync Latency:

  • Changes to the shared KB are not instantly visible in all projects

Mitigation:

  • Acceptable latency (guidelines don't change rapidly)
  • Manual sync command available (kogral sync)
  • Auto-sync on query (fetch latest)

Infrastructure Requirement:

  • A SurrealDB server is needed for the shared KB

Mitigation:

  • Optional (can use synced filesystem instead)
  • Docker Compose for easy setup
  • Managed SurrealDB Cloud option

Neutral

Storage Trait Implementation:

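use async_trait::async_trait;

/// Backend-agnostic persistence interface for knowledge graphs.
#[async_trait]
pub trait Storage {
    /// Persist the given graph.
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    /// Load a graph by name.
    async fn load_graph(&self, name: &str) -> Result<Graph>;
    /// List the names of all stored graphs.
    async fn list_graphs(&self) -> Result<Vec<String>>;
}

// Three implementations; each impl block also carries #[async_trait]
#[async_trait]
impl Storage for FilesystemStorage { /* ... */ }
#[async_trait]
impl Storage for SurrealDbStorage { /* ... */ }
#[async_trait]
impl Storage for MemoryStorage { /* ... */ }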

The trait abstraction keeps the multi-backend design manageable.
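To illustrate how a backend fits behind the trait, here is a sketch of an in-memory backend. It makes several assumptions: the trait is simplified to a synchronous `SyncStorage` (a hypothetical name) so the example needs only the standard library, `Graph` is a stand-in type, and errors are plain strings. The real `MemoryStorage` may look quite different.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the real graph type.
#[derive(Debug, Clone, PartialEq)]
pub struct Graph {
    pub name: String,
    pub nodes: Vec<String>,
}

/// Synchronous simplification of the `Storage` trait (assumption: the real
/// trait is async; this version avoids needing an async runtime).
pub trait SyncStorage {
    fn save_graph(&self, graph: &Graph) -> Result<(), String>;
    fn load_graph(&self, name: &str) -> Result<Graph, String>;
    fn list_graphs(&self) -> Result<Vec<String>, String>;
}

/// In-memory backend; all data is lost when the value is dropped.
#[derive(Default)]
pub struct MemoryStorage {
    graphs: Mutex<HashMap<String, Graph>>,
}

impl SyncStorage for MemoryStorage {
    fn save_graph(&self, graph: &Graph) -> Result<(), String> {
        let mut graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        // Saving under an existing name replaces the previous graph.
        graphs.insert(graph.name.clone(), graph.clone());
        Ok(())
    }

    fn load_graph(&self, name: &str) -> Result<Graph, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        let found = graphs.get(name).cloned();
        found.ok_or_else(|| format!("graph not found: {name}"))
    }

    fn list_graphs(&self) -> Result<Vec<String>, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        let mut names: Vec<String> = graphs.keys().cloned().collect();
        names.sort(); // HashMap order is arbitrary; sort for stable output
        Ok(names)
    }
}
```

Because callers depend only on the trait, a backend like this can stand in for the filesystem or SurrealDB backend in tests.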


Use Cases

Small Project (Solo Developer)

Config:

{ storage = { primary = 'filesystem } }

Behavior:

  • All knowledge in .kogral/ directory
  • Git-tracked with code
  • No database required
  • Works offline

Medium Project (Team)

Config:

{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://team-kb.local:8000",
    },
  },
}

Behavior:

  • Project knowledge in .kogral/ (git-tracked)
  • Shared team patterns in SurrealDB
  • Automatic sync
  • Offline fallback to cached shared knowledge

Large Organization

Config:

{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://kb.company.com:8000",
      namespace = "engineering",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://engineering/shared-kb",
    guidelines = [
      "surrealdb://engineering/rust-guidelines",
      "surrealdb://engineering/security-policies",
    ],
  },
}

Behavior:

  • Project-specific knowledge in .kogral/
  • Organization guidelines in SurrealDB
  • Security policies enforced
  • Automatic guideline updates

Sync Mechanism

Filesystem → SurrealDB

Trigger: File changes detected (via notify crate)

Process:

  1. Parse changed markdown file
  2. Convert to Node struct
  3. Upsert to SurrealDB
  4. Update relationships

Debouncing: 500ms (configurable)
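The debouncing step could look roughly like the sketch below. The `Debouncer` type and its methods are hypothetical and independent of the notify crate: the watcher would call `record` for every change event, and the sync loop would call `take_ready` periodically. Timestamps are passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Collects file-change events and releases a path only after it has been
/// quiet for `delay` (500 ms by default in the text above).
pub struct Debouncer {
    delay: Duration,
    last_event: HashMap<PathBuf, Instant>,
}

impl Debouncer {
    pub fn new(delay: Duration) -> Self {
        Self { delay, last_event: HashMap::new() }
    }

    /// Record a change; a repeated change to the same path resets its quiet period.
    pub fn record(&mut self, path: PathBuf, at: Instant) {
        self.last_event.insert(path, at);
    }

    /// Remove and return every path whose last change is at least `delay` old.
    pub fn take_ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let delay = self.delay;
        let mut ready = Vec::new();
        self.last_event.retain(|path, at| {
            if now.saturating_duration_since(*at) >= delay {
                ready.push(path.clone());
                false // drop from the pending set
            } else {
                true // still inside the quiet period
            }
        });
        ready.sort(); // deterministic order for callers and tests
        ready
    }
}
```

Keeping only the latest timestamp per path means a burst of saves to one file results in a single upsert once the burst has ended.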

SurrealDB → Filesystem

Trigger: Query for shared knowledge

Process:

  1. Query SurrealDB for shared nodes
  2. Cache locally (in-memory or filesystem)
  3. Merge with local results
  4. Return combined

Caching: TTL-based (5 minutes default)
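A TTL-based cache of this kind might be sketched as below. `TtlCache` is a hypothetical type, not part of KOGRAL's documented API. An entry counts as fresh while its age is strictly less than the TTL, and stale entries are evicted on read.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Cache whose entries expire `ttl` after insertion
/// (5 minutes by default in the text above).
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a value together with the time it was fetched.
    pub fn insert(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (value, now));
    }

    /// Return the value if it is present and younger than `ttl`;
    /// a stale entry is evicted and reported as a miss.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<V> {
        let fresh = match self.entries.get(key) {
            Some((_, stored_at)) => now.saturating_duration_since(*stored_at) < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(key).map(|(value, _)| value.clone())
        } else {
            self.entries.remove(key);
            None
        }
    }
}
```

On a miss, the caller would query SurrealDB again and re-insert the result, which bounds how stale shared guidelines can become.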

Conflict Resolution

Strategy: Last-write-wins with version tracking

Example:

Project A: Updates shared guideline (v1 → v2)
Project B: Has cached v1

On Project B query:
  - Detects v2 available
  - Fetches v2
  - Updates cache
  - Uses v2 going forward
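The cache refresh in this example can be sketched as a version comparison. The `Versioned` type and `refresh` function are hypothetical, and representing versions as a monotonically increasing integer is an assumption; the ADR does not specify how versions are encoded.

```rust
use std::collections::HashMap;

/// Hypothetical cached guideline with a monotonically increasing version.
#[derive(Debug, Clone, PartialEq)]
pub struct Versioned {
    pub version: u64,
    pub content: String,
}

/// Replace the cached entry when the remote copy is newer (or nothing is
/// cached yet). Returns true if the cache was updated.
pub fn refresh(cache: &mut HashMap<String, Versioned>, id: &str, remote: &Versioned) -> bool {
    let stale = match cache.get(id) {
        Some(cached) => cached.version < remote.version,
        None => true,
    };
    if stale {
        cache.insert(id.to_string(), remote.clone());
    }
    stale
}
```

An older remote version never overwrites a newer cached one, so a delayed or replayed update cannot roll the cache back.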

Alternatives Considered

Git Submodules for Shared Knowledge

Rejected: Cumbersome workflow

  • Requires manual submodule update
  • Merge conflicts in shared submodule
  • Not discoverable (need to know submodule exists)

Syncthing for Filesystem Sync

Rejected: Not designed for this use case

  • No query optimization
  • No relationship indexes
  • Sync conflicts difficult to resolve

PostgreSQL with JSON

Rejected: Not a graph database

  • Poor graph query performance
  • Relationship traversal requires complex SQL joins
  • No native graph features

Migration Path

Phase 1: Filesystem Only (Current)

  • All storage via filesystem
  • Git-tracked
  • No database required

Phase 2: Optional SurrealDB

  • Add SurrealDB support (feature-gated)
  • Manual sync command
  • Shared KB opt-in

Phase 3: Automatic Sync

  • File watching
  • Auto-sync on changes
  • Background sync

Phase 4: Multi-Tenant SurrealDB

  • Organization namespaces
  • Access control
  • Audit logs

Monitoring

Success Criteria:

  • Developers don't notice hybrid complexity
  • Sync completes in < 1 second for typical changes
  • Shared guidelines accessible in < 100ms
  • Zero data loss in sync

Metrics:

  • Sync latency (P50, P95, P99)
  • Cache hit rate (shared knowledge)
  • Conflict rate (expect < 0.1%)
  • User satisfaction
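The latency percentiles could be computed from recorded samples with the nearest-rank method, as in the sketch below. The `percentile` function is hypothetical, and nearest-rank is only one of several common percentile definitions; the ADR does not say which one the project uses.

```rust
/// Nearest-rank percentile of latency samples (in milliseconds).
/// Returns None for empty input or for `p` outside (0, 100].
pub fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest rank r such that r / n >= p / 100.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With small sample counts, P95 and P99 both collapse to the maximum observed latency, so these metrics only become informative once enough sync operations have been recorded.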

Revision History

Date        Author             Change
2026-01-17  Architecture Team  Initial decision

Previous ADR: ADR-002: FastEmbed via AI Providers
Next ADR: ADR-004: Logseq Compatibility