# ADR-003: Hybrid Storage Strategy

**Status**: Accepted

**Date**: 2026-01-17

**Deciders**: Architecture Team

**Context**: Storage Backend Strategy for Knowledge Base

---

## Context

KOGRAL needs to store knowledge graphs that meet these requirements:

1. **Git-Friendly**: Knowledge should version alongside code
2. **Scalable**: Support everything from small projects (tens of nodes) to large organizations (10,000+ nodes)
3. **Queryable**: Efficient graph queries and relationship traversal
4. **Offline-Capable**: Work without network access
5. **Collaborative**: Support shared organizational knowledge
6. **Cost-Effective**: Free for small projects, reasonable cost at scale

**Constraints**:

- Developers want to edit knowledge in text editors
- Organizations want centralized guideline management
- Git workflows are essential for code-adjacent knowledge
- Large graphs need database performance

### Option 1: Filesystem Only

**Approach**: Store everything as markdown files

**Pros**:

- ✅ Git-native (perfect for versioning)
- ✅ Text editor friendly
- ✅ No dependencies
- ✅ Works offline
- ✅ Free

**Cons**:

- ❌ Poor performance for large graphs (1000+ nodes)
- ❌ No efficient graph queries
- ❌ Difficult to share across projects
- ❌ Manual sync for collaboration

**Scalability**: Good for < 100 nodes, poor beyond

### Option 2: Database Only (SurrealDB)

**Approach**: Store all knowledge in SurrealDB graph database

**Pros**:

- ✅ Excellent query performance
- ✅ Native graph relationships
- ✅ Scalable to millions of nodes
- ✅ Centralized for collaboration

**Cons**:

- ❌ Not git-trackable
- ❌ Requires running database server
- ❌ Can't edit with text editor
- ❌ Network dependency
- ❌ Infrastructure cost

**Scalability**: Excellent, but loses developer workflow benefits

### Option 3: Hybrid (Filesystem + SurrealDB)

**Approach**: Filesystem for local project knowledge, SurrealDB for shared organizational knowledge

**Pros**:

- ✅ Git-friendly for project knowledge
- ✅ Text editor friendly
- ✅ Scalable for shared knowledge
- ✅ Works offline (local graph)
- ✅ Collaborative (shared graph)
- ✅ Cost-effective (DB only for shared)

**Cons**:

- ❌ More complex implementation
- ❌ Sync mechanism needed
- ❌ Two storage systems to manage

**Scalability**: Excellent - best of both worlds

---

## Decision

**We will use a hybrid storage strategy: Filesystem (local) + SurrealDB (shared).**

**Architecture**:

```text
┌─────────────────────────────────────────────────────────────┐
│ Project A (.kogral/)                                        │
│ Storage: Filesystem (git-tracked)                           │
│ Scope: Project-specific notes, decisions, patterns          │
│ Access: Local only                                          │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   │ [inherits]
                   ↓
┌─────────────────────────────────────────────────────────────┐
│ Shared KB (SurrealDB or synced filesystem)                  │
│ Storage: SurrealDB (scalable) or filesystem (synced)        │
│ Scope: Organization-wide guidelines, patterns               │
│ Access: All projects                                        │
└─────────────────────────────────────────────────────────────┘
```

**Implementation**:

```nickel
# Project config
{
  storage = {
    primary = 'filesystem, # Local project knowledge
    secondary = {
      enabled = true,
      type = 'surrealdb, # Shared knowledge
      url = "ws://kb-central.company.com:8000",
      namespace = "organization",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://organization/shared-kb", # Inherit from shared
    priority = 100, # Project overrides shared
  },
}
```

**Sync Strategy**:

```text
.kogral/ (Filesystem)
    ↓ [on save]
Watch for changes
    ↓ [debounced]
Sync to SurrealDB
    ↓
Shared graph updated
    ↓ [on query]
Merge local + shared results
```
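
The final step of the flow above, merging local and shared results so that project knowledge overrides shared knowledge, could look roughly like the sketch below. It is illustrative only: the `MergeNode` struct, its fields, and the `merge_results` function are assumptions made for this example, not the actual KOGRAL types, and matching nodes by `id` is an assumed rule for deciding when a project node overrides a shared one. Shared nodes are inserted first so that a later local insert with the same id replaces them.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node; the real node type is not shown in this ADR.
#[derive(Debug, Clone, PartialEq)]
pub struct MergeNode {
    pub id: String,
    pub body: String,
}

/// Merge shared and local query results, keyed by node id.
/// A local (project) node replaces a shared node that has the same id.
pub fn merge_results(local: Vec<MergeNode>, shared: Vec<MergeNode>) -> Vec<MergeNode> {
    let mut by_id: HashMap<String, MergeNode> = HashMap::new();

    // Shared knowledge goes in first, so it acts as the inherited base.
    for node in shared {
        by_id.insert(node.id.clone(), node);
    }
    // Local knowledge goes in second and overwrites on id collision.
    for node in local {
        by_id.insert(node.id.clone(), node);
    }

    // Sort by id so the result order is deterministic.
    let mut merged: Vec<MergeNode> = by_id.into_values().collect();
    merged.sort_by(|a, b| a.id.cmp(&b.id));
    merged
}
```
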
---

## Consequences

### Positive

✅ **Developer Workflow Preserved**:

```bash
# Local knowledge workflow (unchanged)
vim .kogral/notes/my-note.md
git add .kogral/notes/my-note.md
git commit -m "Add implementation note"
git push
```

✅ **Git Integration**:
- Project knowledge versioned with code
- Branches include relevant knowledge
- Merges resolve knowledge conflicts
- PR reviews include knowledge changes

✅ **Offline Development**:
- Full functionality without network
- Shared guidelines cached locally
- Sync when reconnected

✅ **Scalability**:
- Projects: filesystem (100s of nodes, fine performance)
- Organization: SurrealDB (10,000+ nodes, excellent performance)

✅ **Collaboration**:
- Shared guidelines accessible to all projects
- Updates to shared knowledge propagate automatically
- Consistent practices across organization

✅ **Cost-Effective**:
- Small projects: free (filesystem only)
- Organizations: SurrealDB for shared only (not all project knowledge)

✅ **Gradual Adoption**:
- Start with filesystem only
- Add SurrealDB when needed
- Feature-gated (`--features surrealdb`)

### Negative

❌ **Complexity**:
- Two storage implementations
- Sync mechanism required
- Conflict resolution needed

**Mitigation**:
- Storage trait abstracts differences
- Sync is optional (can disable)
- Conflicts rare (guidelines change infrequently)

❌ **Sync Latency**:
- Changes to the shared KB are not instantly visible in all projects

**Mitigation**:
- Acceptable latency (guidelines don't change rapidly)
- Manual sync command available (`kogral sync`)
- Auto-sync on query (fetch latest)

❌ **Infrastructure Requirement**:
- SurrealDB server needed for the shared KB

**Mitigation**:
- Optional (can use synced filesystem instead)
- Docker Compose for easy setup
- Managed SurrealDB Cloud option

### Neutral

⚪ **Storage Trait Implementation**:

```rust
use async_trait::async_trait;

// `Result`, `Graph`, and the three storage structs are assumed to be
// defined elsewhere in the crate.
#[async_trait]
pub trait Storage {
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    async fn load_graph(&self, name: &str) -> Result<Graph>;
    async fn list_graphs(&self) -> Result<Vec<String>>;
}

// Three implementations (each impl block also needs `#[async_trait]`)
#[async_trait]
impl Storage for FilesystemStorage { /* ... */ }
#[async_trait]
impl Storage for SurrealDbStorage { /* ... */ }
#[async_trait]
impl Storage for MemoryStorage { /* ... */ }
```
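
To illustrate how a backend plugs into a trait of this shape, here is a minimal in-memory sketch. It is a deliberate simplification rather than the crate's code: the trait is synchronous so the example needs only the standard library, and `SyncStorage`, `InMemoryStorage`, the fields of `SketchGraph`, and the `String` error type are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the real graph type.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchGraph {
    pub name: String,
    pub nodes: Vec<String>,
}

/// Synchronous simplification of the storage trait (assumption: the real
/// trait is async and uses the crate's own `Result` alias).
pub trait SyncStorage {
    fn save_graph(&self, graph: &SketchGraph) -> Result<(), String>;
    fn load_graph(&self, name: &str) -> Result<SketchGraph, String>;
    fn list_graphs(&self) -> Result<Vec<String>, String>;
}

/// Keeps graphs in a mutex-guarded map; nothing is persisted.
#[derive(Default)]
pub struct InMemoryStorage {
    graphs: Mutex<HashMap<String, SketchGraph>>,
}

impl SyncStorage for InMemoryStorage {
    fn save_graph(&self, graph: &SketchGraph) -> Result<(), String> {
        let mut graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        // Saving under an existing name overwrites the previous graph.
        graphs.insert(graph.name.clone(), graph.clone());
        Ok(())
    }

    fn load_graph(&self, name: &str) -> Result<SketchGraph, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        graphs
            .get(name)
            .cloned()
            .ok_or_else(|| format!("graph not found: {}", name))
    }

    fn list_graphs(&self) -> Result<Vec<String>, String> {
        let graphs = self.graphs.lock().map_err(|e| e.to_string())?;
        let mut names: Vec<String> = graphs.keys().cloned().collect();
        names.sort(); // deterministic order for callers and tests
        Ok(names)
    }
}
```
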

The trait abstraction keeps multiple backends manageable.

---

## Use Cases

### Small Project (Solo Developer)

**Config**:

```nickel
{ storage = { primary = 'filesystem } }
```

**Behavior**:
- All knowledge in `.kogral/` directory
- Git-tracked with code
- No database required
- Works offline

### Medium Project (Team)

**Config**:

```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://team-kb.local:8000",
    },
  },
}
```

**Behavior**:
- Project knowledge in `.kogral/` (git-tracked)
- Shared team patterns in SurrealDB
- Automatic sync
- Offline fallback to cached shared knowledge

### Large Organization

**Config**:

```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://kb.company.com:8000",
      namespace = "engineering",
      database = "shared-kb",
    },
  },

  inheritance = {
    base = "surrealdb://engineering/shared-kb",
    guidelines = [
      "surrealdb://engineering/rust-guidelines",
      "surrealdb://engineering/security-policies",
    ],
  },
}
```

**Behavior**:
- Project-specific in `.kogral/`
- Organization guidelines in SurrealDB
- Security policies enforced
- Automatic guideline updates

---

## Sync Mechanism

### Filesystem → SurrealDB

**Trigger**: File changes detected (via `notify` crate)

**Process**:
1. Parse changed markdown file
2. Convert to Node struct
3. Upsert to SurrealDB
4. Update relationships

**Debouncing**: 500ms (configurable)
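
A debounce of this kind can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the KOGRAL implementation: the `Debouncer` type and its methods are hypothetical, and timestamps are passed in explicitly so the logic is easy to test. A path becomes ready to sync only after no new event has arrived for the whole window, so a burst of editor saves results in a single upsert.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical debouncer: tracks the most recent change event per path.
pub struct Debouncer {
    window: Duration,
    last_event: HashMap<String, Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_event: HashMap::new() }
    }

    /// Record a change event for `path` observed at `now`.
    /// A later event for the same path restarts its quiet period.
    pub fn record(&mut self, path: &str, now: Instant) {
        self.last_event.insert(path.to_string(), now);
    }

    /// Remove and return the paths that have been quiet for at least `window`.
    pub fn take_ready(&mut self, now: Instant) -> Vec<String> {
        let window = self.window;
        let mut ready: Vec<String> = self
            .last_event
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) >= window)
            .map(|(path, _)| path.clone())
            .collect();
        for path in &ready {
            self.last_event.remove(path);
        }
        ready.sort(); // deterministic order
        ready
    }
}
```

In a real watcher loop, `record` would be called from the file-event callback and `take_ready` from a periodic tick; each returned path would then go through the parse, convert, and upsert steps listed above.
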

### SurrealDB → Filesystem

**Trigger**: Query for shared knowledge

**Process**:
1. Query SurrealDB for shared nodes
2. Cache locally (in-memory or filesystem)
3. Merge with local results
4. Return combined

**Caching**: TTL-based (5 minutes default)
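
As a rough illustration of TTL-based caching, the sketch below stores each value with the time it was cached and treats it as stale once the TTL has elapsed. The `TtlCache` type and its method names are hypothetical, and the clock is injected as a parameter for testability; the actual cache may be structured differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache for shared-knowledge query results.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (V, Instant)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store `value` under `key`, stamped with the time it was cached.
    pub fn put(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (value, now));
    }

    /// Return the cached value if it is still fresh at `now`.
    /// Stale entries are evicted, so the caller knows to re-query the database.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<V> {
        let fresh = match self.entries.get(key) {
            Some((_, stored)) => now.saturating_duration_since(*stored) < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(key).map(|(value, _)| value.clone())
        } else {
            self.entries.remove(key);
            None
        }
    }
}
```

With the 5 minute default this would be constructed as `TtlCache::new(Duration::from_secs(300))`; a `None` from `get` is the signal to query SurrealDB again and `put` the fresh result.
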

### Conflict Resolution

**Strategy**: Last-write-wins with version tracking

**Example**:

```text
Project A: Updates shared guideline (v1 → v2)
Project B: Has cached v1

On Project B query:
  - Detects v2 available
  - Fetches v2
  - Updates cache
  - Uses v2 going forward
```
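
The example above can be expressed as a small comparison on version numbers. This is a sketch under stated assumptions: the `Versioned` struct and the `resolve` function are hypothetical, and it assumes each write to a shared node increments an integer `version`, so the highest version is the last write. On a tie the cached copy is kept, which avoids a needless cache update.

```rust
/// Hypothetical versioned record for a shared guideline.
#[derive(Debug, Clone, PartialEq)]
pub struct Versioned {
    pub id: String,
    pub version: u64,
    pub body: String,
}

/// Last-write-wins: keep the copy with the higher version.
/// Equal versions keep the cached copy.
pub fn resolve(cached: Versioned, remote: Versioned) -> Versioned {
    if remote.version > cached.version {
        remote
    } else {
        cached
    }
}
```
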

---

## Alternatives Considered

### Git Submodules for Shared Knowledge

**Rejected**: Cumbersome workflow
- Requires manual submodule update
- Merge conflicts in shared submodule
- Not discoverable (need to know submodule exists)

### Syncthing for Filesystem Sync

**Rejected**: Not designed for this use case
- No query optimization
- No relationship indexes
- Sync conflicts difficult to resolve

### PostgreSQL with JSON

**Rejected**: Not a graph database
- Poor graph query performance
- Relationship traversal requires complex SQL joins
- No native graph features

---

## Migration Path

### Phase 1: Filesystem Only (Current)
- All storage via filesystem
- Git-tracked
- No database required

### Phase 2: Optional SurrealDB
- Add SurrealDB support (feature-gated)
- Manual sync command
- Shared KB opt-in

### Phase 3: Automatic Sync
- File watching
- Auto-sync on changes
- Background sync

### Phase 4: Multi-Tenant SurrealDB
- Organization namespaces
- Access control
- Audit logs

---

## Monitoring

**Success Criteria**:
- Developers don't notice hybrid complexity
- Sync completes < 1 second for typical changes
- Shared guidelines accessible in < 100ms
- Zero data loss in sync

**Metrics**:
- Sync latency (P50, P95, P99)
- Cache hit rate (shared knowledge)
- Conflict rate (expect < 0.1%)
- User satisfaction

---

## References

- [SurrealDB Documentation](https://surrealdb.com/docs)
- [Storage Trait Implementation](../../crates/kogral-core/src/storage/mod.rs)
- [FilesystemStorage](../../crates/kogral-core/src/storage/filesystem.rs)
- [SurrealDbStorage](../../crates/kogral-core/src/storage/surrealdb.rs)
- [Sync Mechanism](../../scripts/kogral-sync.nu)

---

## Revision History

| Date       | Author            | Change           |
| ---------- | ----------------- | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |

---

**Previous ADR**: [ADR-002: FastEmbed via AI Providers](002-fastembed-ai-providers.md)
**Next ADR**: [ADR-004: Logseq Compatibility](004-logseq-compatibility.md)