# ADR-003: Hybrid Storage Strategy

**Status**: Accepted

**Date**: 2026-01-17

**Deciders**: Architecture Team

**Context**: Storage Backend Strategy for Knowledge Base

---

## Context

KOGRAL needs to store knowledge graphs with these requirements:

1. **Git-Friendly**: Knowledge should version alongside code
2. **Scalable**: Support small projects (10s of nodes) to large organizations (10,000+ nodes)
3. **Queryable**: Efficient graph queries and relationship traversal
4. **Offline-Capable**: Work without network access
5. **Collaborative**: Support shared organizational knowledge
6. **Cost-Effective**: Free for small projects, reasonable cost at scale

**Constraints**:

- Developers want to edit knowledge in text editors
- Organizations want centralized guideline management
- Git workflows are essential for code-adjacent knowledge
- Large graphs need database performance

### Option 1: Filesystem Only

**Approach**: Store everything as markdown files

**Pros**:

- ✅ Git-native (perfect for versioning)
- ✅ Text editor friendly
- ✅ No dependencies
- ✅ Works offline
- ✅ Free

**Cons**:

- ❌ Poor performance for large graphs (1000+ nodes)
- ❌ No efficient graph queries
- ❌ Difficult to share across projects
- ❌ Manual sync for collaboration

**Scalability**: Good for < 100 nodes, poor beyond

### Option 2: Database Only (SurrealDB)

**Approach**: Store all knowledge in SurrealDB graph database

**Pros**:

- ✅ Excellent query performance
- ✅ Native graph relationships
- ✅ Scalable to millions of nodes
- ✅ Centralized for collaboration

**Cons**:

- ❌ Not git-trackable
- ❌ Requires running database server
- ❌ Can't edit with text editor
- ❌ Network dependency
- ❌ Infrastructure cost

**Scalability**: Excellent, but loses developer workflow benefits

### Option 3: Hybrid (Filesystem + SurrealDB)

**Approach**: Filesystem for local project knowledge, SurrealDB for shared organizational knowledge

**Pros**:

- ✅ Git-friendly for project knowledge
- ✅ Text editor friendly
- ✅ Scalable for shared knowledge
- ✅ Works offline (local graph)
- ✅ Collaborative (shared graph)
- ✅ Cost-effective (DB only for shared)

**Cons**:

- ❌ More complex implementation
- ❌ Sync mechanism needed
- ❌ Two storage systems to manage

**Scalability**: Excellent - best of both worlds

---

## Decision

**We will use a hybrid storage strategy: Filesystem (local) + SurrealDB (shared).**

**Architecture**:

```text
┌─────────────────────────────────────────────────────────────┐
│ Project A (.kogral/)                                        │
│ Storage: Filesystem (git-tracked)                           │
│ Scope: Project-specific notes, decisions, patterns          │
│ Access: Local only                                          │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ [inherits]
                   ↓
┌─────────────────────────────────────────────────────────────┐
│ Shared KB (SurrealDB or synced filesystem)                  │
│ Storage: SurrealDB (scalable) or filesystem (synced)        │
│ Scope: Organization-wide guidelines, patterns               │
│ Access: All projects                                        │
└─────────────────────────────────────────────────────────────┘
```

**Implementation**:

```nickel
# Project config
{
  storage = {
    primary = 'filesystem, # Local project knowledge
    secondary = {
      enabled = true,
      type = 'surrealdb, # Shared knowledge
      url = "ws://kb-central.company.com:8000",
      namespace = "organization",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://organization/shared-kb", # Inherit from shared
    priority = 100, # Project overrides shared
  },
}
```

**Sync Strategy**:

```text
.kogral/ (Filesystem)
    ↓ [on save]
Watch for changes
    ↓ [debounced]
Sync to SurrealDB
    ↓
Shared graph updated
    ↓ [on query]
Merge local + shared results
```

---

## Consequences

### Positive

✅ **Developer Workflow Preserved**:

```bash
# Local knowledge workflow (unchanged)
vim .kogral/notes/my-note.md
git add .kogral/notes/my-note.md
git commit -m "Add implementation note"
git push
```

✅ **Git Integration**:

- Project knowledge versioned with code
- Branches include relevant knowledge
- Merges resolve knowledge conflicts
- PR reviews include knowledge changes

✅ **Offline Development**:

- Full functionality without network
- Shared guidelines cached locally
- Sync when reconnected

✅ **Scalability**:

- Projects: filesystem (100s of nodes, fine performance)
- Organization: SurrealDB (10,000+ nodes, excellent performance)

✅ **Collaboration**:

- Shared guidelines accessible to all projects
- Updates to shared knowledge propagate automatically
- Consistent practices across organization

✅ **Cost-Effective**:

- Small projects: free (filesystem only)
- Organizations: SurrealDB for shared only (not all project knowledge)

✅ **Gradual Adoption**:

- Start with filesystem only
- Add SurrealDB when needed
- Feature-gated (`--features surrealdb`)

### Negative

❌ **Complexity**:

- Two storage implementations
- Sync mechanism required
- Conflict resolution needed

**Mitigation**:

- Storage trait abstracts differences
- Sync is optional (can disable)
- Conflicts rare (guidelines change infrequently)

❌ **Sync Latency**:

- Changes to the shared KOGRAL knowledge base are not instantly visible in all projects

**Mitigation**:

- Acceptable latency (guidelines don't change rapidly)
- Manual sync command available (`kogral sync`)
- Auto-sync on query (fetch latest)

❌ **Infrastructure Requirement**:

- SurrealDB server needed for the shared KOGRAL knowledge base

**Mitigation**:

- Optional (can use synced filesystem instead)
- Docker Compose for easy setup
- Managed SurrealDB Cloud option

### Neutral

⚪ **Storage Trait Implementation**:

```rust
use async_trait::async_trait;

#[async_trait]
pub trait Storage {
    /// Persists a whole graph to the backend.
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    /// Loads a graph by name.
    async fn load_graph(&self, name: &str) -> Result<Graph>;
    /// Lists the names of all stored graphs.
    async fn list_graphs(&self) -> Result<Vec<String>>;
}

// Three implementations
impl Storage for FilesystemStorage { /* ... */ }
impl Storage for SurrealDbStorage { /* ... */ }
impl Storage for MemoryStorage { /* ... */ }
```

The trait abstraction keeps multiple backends manageable.
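To make the "project overrides shared" merge concrete, the following is a minimal sketch, not the actual `kogral-core` code. It assumes a synchronous stand-in for the `Storage` trait (so it needs no async runtime), a hypothetical `Graph` that is just a name plus a map of node ids to text, a `String` error type, and a hypothetical helper `merged_view`. None of these shapes are confirmed by this ADR.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical minimal graph: a name plus node text keyed by node id.
#[derive(Clone, Debug, PartialEq)]
pub struct Graph {
    pub name: String,
    pub nodes: HashMap<String, String>,
}

/// Simplified error type for this sketch (assumption: the real one is richer).
pub type Result<T> = std::result::Result<T, String>;

/// Synchronous stand-in for the async `Storage` trait.
pub trait Storage {
    fn save_graph(&self, graph: &Graph) -> Result<()>;
    fn load_graph(&self, name: &str) -> Result<Graph>;
    fn list_graphs(&self) -> Result<Vec<String>>;
}

/// In-memory backend, used here to play both the local and the shared store.
#[derive(Default)]
pub struct MemoryStorage {
    graphs: Mutex<HashMap<String, Graph>>,
}

impl Storage for MemoryStorage {
    fn save_graph(&self, graph: &Graph) -> Result<()> {
        let mut graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        graphs.insert(graph.name.clone(), graph.clone());
        Ok(())
    }

    fn load_graph(&self, name: &str) -> Result<Graph> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        graphs
            .get(name)
            .cloned()
            .ok_or_else(|| format!("graph not found: {name}"))
    }

    fn list_graphs(&self) -> Result<Vec<String>> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        let mut names: Vec<String> = graphs.keys().cloned().collect();
        names.sort(); // deterministic order for callers
        Ok(names)
    }
}

/// Loads the same graph from both backends and merges them.
/// Local nodes replace shared nodes that have the same id.
/// Returns an error if either backend lacks the graph.
pub fn merged_view(local: &dyn Storage, shared: &dyn Storage, name: &str) -> Result<Graph> {
    let mut merged = shared.load_graph(name)?;
    let local_graph = local.load_graph(name)?;
    merged.nodes.extend(local_graph.nodes);
    Ok(merged)
}
```

Because `merged_view` only sees `&dyn Storage`, the same function would work unchanged with a filesystem-backed local store and a SurrealDB-backed shared store. Calling `merged_view(&local, &shared, "kb")` with two `MemoryStorage` values returns the shared nodes, with any node id that also exists locally taking the local text.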

---

## Use Cases

### Small Project (Solo Developer)

**Config**:

```nickel
{ storage = { primary = 'filesystem } }
```

**Behavior**:

- All knowledge in `.kogral/` directory
- Git-tracked with code
- No database required
- Works offline

### Medium Project (Team)

**Config**:

```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://team-kb.local:8000",
    },
  },
}
```

**Behavior**:

- Project knowledge in `.kogral/` (git-tracked)
- Shared team patterns in SurrealDB
- Automatic sync
- Offline fallback to cached

### Large Organization

**Config**:

```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://kb.company.com:8000",
      namespace = "engineering",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://engineering/shared-kb",
    guidelines = [
      "surrealdb://engineering/rust-guidelines",
      "surrealdb://engineering/security-policies",
    ],
  },
}
```

**Behavior**:

- Project-specific in `.kogral/`
- Organization guidelines in SurrealDB
- Security policies enforced
- Automatic guideline updates

---

## Sync Mechanism

### Filesystem → SurrealDB

**Trigger**: File changes detected (via `notify` crate)

**Process**:

1. Parse changed markdown file
2. Convert to Node struct
3. Upsert to SurrealDB
4. Update relationships

**Debouncing**: 500ms (configurable)

### SurrealDB → Filesystem

**Trigger**: Query for shared knowledge

**Process**:

1. Query SurrealDB for shared nodes
2. Cache locally (in-memory or filesystem)
3. Merge with local results
4. Return combined

**Caching**: TTL-based (5 minutes default)

### Conflict Resolution

**Strategy**: Last-write-wins with version tracking

**Example**:

```text
Project A: Updates shared guideline (v1 → v2)
Project B: Has cached v1

On Project B query:
  - Detects v2 available
  - Fetches v2
  - Updates cache
  - Uses v2 going forward
```

---

## Alternatives Considered

### Git Submodules for Shared Knowledge

**Rejected**: Cumbersome workflow

- Requires manual submodule update
- Merge conflicts in shared submodule
- Not discoverable (need to know submodule exists)

### Syncthing for Filesystem Sync

**Rejected**: Not designed for this use case

- No query optimization
- No relationship indexes
- Sync conflicts difficult to resolve

### PostgreSQL with JSON

**Rejected**: Not a graph database

- Poor graph query performance
- Relationship traversal requires complex SQL joins
- No native graph features

---

## Migration Path

### Phase 1: Filesystem Only (Current)

- All storage via filesystem
- Git-tracked
- No database required

### Phase 2: Optional SurrealDB

- Add SurrealDB support (feature-gated)
- Manual sync command
- Shared KB opt-in

### Phase 3: Automatic Sync

- File watching
- Auto-sync on changes
- Background sync

### Phase 4: Multi-Tenant SurrealDB

- Organization namespaces
- Access control
- Audit logs

---

## Monitoring

**Success Criteria**:

- Developers don't notice hybrid complexity
- Sync completes < 1 second for typical changes
- Shared guidelines accessible in < 100ms
- Zero data loss in sync

**Metrics**:

- Sync latency (P50, P95, P99)
- Cache hit rate (shared knowledge)
- Conflict rate (expect < 0.1%)
- User satisfaction

---

## References

- [SurrealDB Documentation](https://surrealdb.com/docs)
- [Storage Trait Implementation](../../crates/kogral-core/src/storage/mod.rs)
- [FilesystemStorage](../../crates/kogral-core/src/storage/filesystem.rs)
- [SurrealDbStorage](../../crates/kogral-core/src/storage/surrealdb.rs)
- [Sync Mechanism](../../scripts/kogral-sync.nu)

---

## Revision History

| Date       | Author            | Change           |
| ---------- | ----------------- | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |

---

**Previous ADR**: [ADR-002: FastEmbed via AI Providers](002-fastembed-ai-providers.md)

**Next ADR**: [ADR-004: Logseq Compatibility](004-logseq-compatibility.md)