ADR-003: Hybrid Storage Strategy
Status: Accepted
Date: 2026-01-17
Deciders: Architecture Team
Context: Storage Backend Strategy for Knowledge Base
Context
KOGRAL needs to store knowledge graphs that meet these requirements:
- Git-Friendly: Knowledge should version alongside code
- Scalable: Support small projects (10s of nodes) to large organizations (10,000+ nodes)
- Queryable: Efficient graph queries and relationship traversal
- Offline-Capable: Work without network access
- Collaborative: Support shared organizational knowledge
- Cost-Effective: Free for small projects, reasonable cost at scale
Constraints:
- Developers want to edit knowledge in text editors
- Organizations want centralized guideline management
- Git workflows essential for code-adjacent knowledge
- Large graphs need database performance
Option 1: Filesystem Only
Approach: Store everything as markdown files
Pros:
- ✅ Git-native (perfect for versioning)
- ✅ Text editor friendly
- ✅ No dependencies
- ✅ Works offline
- ✅ Free
Cons:
- ❌ Poor performance for large graphs (1000+ nodes)
- ❌ No efficient graph queries
- ❌ Difficult to share across projects
- ❌ Manual sync for collaboration
Scalability: Good for < 100 nodes, poor beyond
Option 2: Database Only (SurrealDB)
Approach: Store all knowledge in SurrealDB graph database
Pros:
- ✅ Excellent query performance
- ✅ Native graph relationships
- ✅ Scalable to millions of nodes
- ✅ Centralized for collaboration
Cons:
- ❌ Not git-trackable
- ❌ Requires running database server
- ❌ Can't edit with text editor
- ❌ Network dependency
- ❌ Infrastructure cost
Scalability: Excellent, but loses developer workflow benefits
Option 3: Hybrid (Filesystem + SurrealDB)
Approach: Filesystem for local project knowledge, SurrealDB for shared organizational knowledge
Pros:
- ✅ Git-friendly for project knowledge
- ✅ Text editor friendly
- ✅ Scalable for shared knowledge
- ✅ Works offline (local graph)
- ✅ Collaborative (shared graph)
- ✅ Cost-effective (DB only for shared)
Cons:
- ❌ More complex implementation
- ❌ Sync mechanism needed
- ❌ Two storage systems to manage
Scalability: Excellent - best of both worlds
Decision
We will use a hybrid storage strategy: Filesystem (local) + SurrealDB (shared).
Architecture:
┌─────────────────────────────────────────────────────────────┐
│ Project A (.kogral/) │
│ Storage: Filesystem (git-tracked) │
│ Scope: Project-specific notes, decisions, patterns │
│ Access: Local only │
└──────────────────┬──────────────────────────────────────────┘
│
│ [inherits]
↓
┌─────────────────────────────────────────────────────────────┐
│ Shared KB (SurrealDB or synced filesystem) │
│ Storage: SurrealDB (scalable) or filesystem (synced) │
│ Scope: Organization-wide guidelines, patterns │
│ Access: All projects │
└─────────────────────────────────────────────────────────────┘
Implementation:
# Project config
{
  storage = {
    primary = 'filesystem, # Local project knowledge
    secondary = {
      enabled = true,
      type = 'surrealdb, # Shared knowledge
      url = "ws://kb-central.company.com:8000",
      namespace = "organization",
      database = "shared-kb",
    },
  },
  inheritance = {
    base = "surrealdb://organization/shared-kb", # Inherit from shared
    priority = 100, # Project overrides shared
  },
}
Sync Strategy:
.kogral/ (Filesystem)
↓ [on save]
Watch for changes
↓ [debounced]
Sync to SurrealDB
↓
Shared graph updated
↓ [on query]
Merge local + shared results
Consequences
Positive
✅ Developer Workflow Preserved:
# Local knowledge workflow (unchanged)
vim .kogral/notes/my-note.md
git add .kogral/notes/my-note.md
git commit -m "Add implementation note"
git push
✅ Git Integration:
- Project knowledge versioned with code
- Branches include relevant knowledge
- Merges resolve knowledge conflicts
- PR reviews include knowledge changes
✅ Offline Development:
- Full functionality without network
- Shared guidelines cached locally
- Sync when reconnected
✅ Scalability:
- Projects: filesystem (100s of nodes, fine performance)
- Organization: SurrealDB (10,000+ nodes, excellent performance)
✅ Collaboration:
- Shared guidelines accessible to all projects
- Updates to shared knowledge propagate automatically
- Consistent practices across organization
✅ Cost-Effective:
- Small projects: free (filesystem only)
- Organizations: SurrealDB for shared only (not all project knowledge)
✅ Gradual Adoption:
- Start with filesystem only
- Add SurrealDB when needed
- Feature-gated (--features surrealdb)
Negative
❌ Complexity:
- Two storage implementations
- Sync mechanism required
- Conflict resolution needed
Mitigation:
- Storage trait abstracts differences
- Sync is optional (can disable)
- Conflicts rare (guidelines change infrequently)
❌ Sync Latency:
- Changes to the shared KOGRAL are not instantly visible in all projects
Mitigation:
- Acceptable latency (guidelines don't change rapidly)
- Manual sync command available (kogral sync)
- Auto-sync on query (fetch latest)
❌ Infrastructure Requirement:
- SurrealDB server needed for shared KOGRAL
Mitigation:
- Optional (can use synced filesystem instead)
- Docker Compose for easy setup
- Managed SurrealDB Cloud option
Neutral
⚪ Storage Trait Implementation:
use async_trait::async_trait;

#[async_trait]
pub trait Storage {
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    async fn load_graph(&self, name: &str) -> Result<Graph>;
    async fn list_graphs(&self) -> Result<Vec<String>>;
}

// Three implementations; with the async_trait crate, each impl block
// needs the #[async_trait] attribute as well.
#[async_trait]
impl Storage for FilesystemStorage { /* ... */ }
#[async_trait]
impl Storage for SurrealDbStorage { /* ... */ }
#[async_trait]
impl Storage for MemoryStorage { /* ... */ }
Abstraction makes multi-backend manageable.
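To illustrate how the trait keeps callers backend-agnostic, here is a minimal sketch of an in-memory backend plus a helper that works against any implementation. It is a synchronous simplification of the trait above so that it compiles with the standard library alone. The Graph fields, the String error type, and the copy_all helper are assumptions made for this example, not the project's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the project's Graph type.
#[derive(Debug, Clone, PartialEq)]
pub struct Graph {
    pub name: String,
    pub nodes: Vec<String>,
}

/// Synchronous simplification of the Storage trait (assumption: the real
/// trait is async and uses the project's own error type).
pub trait Storage {
    fn save_graph(&self, graph: &Graph) -> Result<(), String>;
    fn load_graph(&self, name: &str) -> Result<Graph, String>;
    fn list_graphs(&self) -> Result<Vec<String>, String>;
}

/// In-memory backend; a Mutex allows mutation through `&self`.
#[derive(Default)]
pub struct MemoryStorage {
    graphs: Mutex<HashMap<String, Graph>>,
}

impl Storage for MemoryStorage {
    fn save_graph(&self, graph: &Graph) -> Result<(), String> {
        let mut graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        graphs.insert(graph.name.clone(), graph.clone());
        Ok(())
    }

    fn load_graph(&self, name: &str) -> Result<Graph, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        graphs
            .get(name)
            .cloned()
            .ok_or_else(|| format!("graph not found: {}", name))
    }

    fn list_graphs(&self) -> Result<Vec<String>, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        let mut names: Vec<String> = graphs.keys().cloned().collect();
        names.sort(); // stable output order for callers
        Ok(names)
    }
}

/// Copies every graph from one backend to another without knowing which
/// concrete backends are involved. Returns the number of graphs copied.
pub fn copy_all(from: &dyn Storage, to: &dyn Storage) -> Result<usize, String> {
    let names = from.list_graphs()?;
    for name in &names {
        to.save_graph(&from.load_graph(name)?)?;
    }
    Ok(names.len())
}
```

Because copy_all only sees the trait, the same function could move graphs between a filesystem backend and a SurrealDB backend once both implement it.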
Use Cases
Small Project (Solo Developer)
Config:
{ storage = { primary = 'filesystem } }
Behavior:
- All knowledge in .kogral/ directory
- Git-tracked with code
- No database required
- Works offline
Medium Project (Team)
Config:
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://team-kb.local:8000",
    },
  },
}
Behavior:
- Project knowledge in .kogral/ (git-tracked)
- Shared team patterns in SurrealDB
- Automatic sync
- Offline fallback to cached
Large Organization
Config:
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://kb.company.com:8000",
      namespace = "engineering",
      database = "shared-kb",
    },
  },
  inheritance = {
    base = "surrealdb://engineering/shared-kb",
    guidelines = [
      "surrealdb://engineering/rust-guidelines",
      "surrealdb://engineering/security-policies",
    ],
  },
}
Behavior:
- Project-specific in .kogral/
- Organization guidelines in SurrealDB
- Security policies enforced
- Automatic guideline updates
Sync Mechanism
Filesystem → SurrealDB
Trigger: File changes detected (via notify crate)
Process:
- Parse changed markdown file
- Convert to Node struct
- Upsert to SurrealDB
- Update relationships
Debouncing: 500ms (configurable)
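The debouncing step could look roughly like the sketch below. It assumes file-change events arrive as paths on a standard-library channel (in practice a notify watcher callback would feed it) and groups them into one batch once the channel has been quiet for the debounce window. The function name next_debounced_batch is hypothetical.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Blocks until at least one change arrives, then keeps collecting until the
/// channel has been quiet for `window`. Returns the deduplicated batch, or
/// `None` once the sender has been dropped and nothing is pending.
pub fn next_debounced_batch(
    changes: &Receiver<PathBuf>,
    window: Duration,
) -> Option<BTreeSet<PathBuf>> {
    let mut batch = BTreeSet::new();
    // Wait without a timeout for the first change of the batch.
    batch.insert(changes.recv().ok()?);
    loop {
        match changes.recv_timeout(window) {
            // Another change inside the window: add it and keep waiting.
            Ok(path) => {
                batch.insert(path);
            }
            // Quiet period elapsed, or the sender went away: flush the batch.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Some(batch);
            }
        }
    }
}
```

With a 500ms window, an editor that writes the same file several times in quick succession yields a single batch containing that path once, so the parse and upsert steps run once per burst rather than once per write.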
SurrealDB → Filesystem
Trigger: Query for shared knowledge
Process:
- Query SurrealDB for shared nodes
- Cache locally (in-memory or filesystem)
- Merge with local results
- Return combined
Caching: TTL-based (5 minutes default)
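As a rough illustration of the cache-and-merge steps, the sketch below shows a TTL cache and a merge in which local entries take precedence over shared ones, matching the "project overrides shared" rule from the config. TtlCache, merge_results, and the use of plain strings for node content are assumptions for this example. The current time is passed in explicitly to keep the logic easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache for shared nodes, keyed by node id.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores `value`, recording `now` as the time it was fetched.
    pub fn insert_at(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Returns the cached value only while it is younger than the TTL.
    pub fn get_at(&self, key: &str, now: Instant) -> Option<V> {
        let (fetched_at, value) = self.entries.get(key)?;
        if now.duration_since(*fetched_at) < self.ttl {
            Some(value.clone())
        } else {
            None
        }
    }
}

/// Merges shared and local results; a local entry replaces a shared entry
/// with the same key.
pub fn merge_results(
    local: &HashMap<String, String>,
    shared: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = shared.clone();
    merged.extend(local.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```

A cache miss (expired or absent entry) is the point at which a real implementation would query SurrealDB again and call insert_at with the fresh value.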
Conflict Resolution
Strategy: Last-write-wins with version tracking
Example:
Project A: Updates shared guideline (v1 → v2)
Project B: Has cached v1
On Project B query:
- Detects v2 available
- Fetches v2
- Updates cache
- Uses v2 going forward
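The cache refresh in this example can be sketched as a version comparison. The Versioned struct and refresh_cache function are hypothetical names, and the sketch assumes each shared guideline carries a monotonically increasing version number.

```rust
/// Hypothetical versioned copy of a shared guideline.
#[derive(Debug, Clone, PartialEq)]
pub struct Versioned {
    pub version: u64,
    pub body: String,
}

/// Replaces the cached copy when the remote copy has a higher version.
/// Returns true if the cache was updated.
pub fn refresh_cache(cached: &mut Versioned, remote: &Versioned) -> bool {
    if remote.version > cached.version {
        *cached = remote.clone();
        true
    } else {
        // Same or older version: keep what is cached.
        false
    }
}
```

Comparing versions rather than timestamps avoids depending on synchronized clocks across projects. Handling of two writers producing the same version number is left out of this sketch.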
Alternatives Considered
Git Submodules for Shared Knowledge
Rejected: Cumbersome workflow
- Requires manual submodule update
- Merge conflicts in shared submodule
- Not discoverable (need to know submodule exists)
Syncthing for Filesystem Sync
Rejected: Not designed for this use case
- No query optimization
- No relationship indexes
- Sync conflicts difficult to resolve
PostgreSQL with JSON
Rejected: Not a graph database
- Poor graph query performance
- Relationship traversal requires complex SQL joins
- No native graph features
Migration Path
Phase 1: Filesystem Only (Current)
- All storage via filesystem
- Git-tracked
- No database required
Phase 2: Optional SurrealDB
- Add SurrealDB support (feature-gated)
- Manual sync command
- Shared KB opt-in
Phase 3: Automatic Sync
- File watching
- Auto-sync on changes
- Background sync
Phase 4: Multi-Tenant SurrealDB
- Organization namespaces
- Access control
- Audit logs
Monitoring
Success Criteria:
- Developers don't notice hybrid complexity
- Sync completes < 1 second for typical changes
- Shared guidelines accessible in < 100ms
- Zero data loss in sync
Metrics:
- Sync latency (P50, P95, P99)
- Cache hit rate (shared knowledge)
- Conflict rate (expect < 0.1%)
- User satisfaction
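For the latency metric, one simple way to derive P50, P95, and P99 from recorded sync durations is the nearest-rank method, sketched below. The percentile function is a hypothetical helper, and a production setup would more likely rely on a metrics library with histogram support.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it.
/// Returns `None` for an empty slice or a `p` outside 0..=100.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(p * n / 100), clamped to at least 1 so that p = 0 maps
    // to the smallest sample.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

Checking the P95 value against the one-second sync target above gives a direct pass/fail signal for that success criterion.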
References
- SurrealDB Documentation
- Storage Trait Implementation
- FilesystemStorage
- SurrealDbStorage
- Sync Mechanism
Revision History
| Date | Author | Change |
|---|---|---|
| 2026-01-17 | Architecture Team | Initial decision |
Previous ADR: ADR-002: FastEmbed via AI Providers
Next ADR: ADR-004: Logseq Compatibility