kogral/docs/architecture/adrs/003-hybrid-storage.md

# ADR-003: Hybrid Storage Strategy
**Status**: Accepted
**Date**: 2026-01-17
**Deciders**: Architecture Team
**Context**: Storage Backend Strategy for Knowledge Base
---
## Context
KOGRAL needs to store knowledge graphs that meet these requirements:
1. **Git-Friendly**: Knowledge should version alongside code
2. **Scalable**: Support small projects (10s of nodes) to large organizations (10,000+ nodes)
3. **Queryable**: Efficient graph queries and relationship traversal
4. **Offline-Capable**: Work without network access
5. **Collaborative**: Support shared organizational knowledge
6. **Cost-Effective**: Free for small projects, reasonable cost at scale
**Constraints**:
- Developers want to edit knowledge in text editors
- Organizations want centralized guideline management
- Git workflows essential for code-adjacent knowledge
- Large graphs need database performance
### Option 1: Filesystem Only
**Approach**: Store everything as markdown files
**Pros**:
- ✅ Git-native (perfect for versioning)
- ✅ Text editor friendly
- ✅ No dependencies
- ✅ Works offline
- ✅ Free
**Cons**:
- ❌ Poor performance for large graphs (1000+ nodes)
- ❌ No efficient graph queries
- ❌ Difficult to share across projects
- ❌ Manual sync for collaboration
**Scalability**: Good for < 100 nodes, poor beyond
### Option 2: Database Only (SurrealDB)
**Approach**: Store all knowledge in SurrealDB graph database
**Pros**:
- ✅ Excellent query performance
- ✅ Native graph relationships
- ✅ Scalable to millions of nodes
- ✅ Centralized for collaboration
**Cons**:
- ❌ Not git-trackable
- ❌ Requires running database server
- ❌ Can't edit with text editor
- ❌ Network dependency
- ❌ Infrastructure cost
**Scalability**: Excellent, but loses developer workflow benefits
### Option 3: Hybrid (Filesystem + SurrealDB)
**Approach**: Filesystem for local project knowledge, SurrealDB for shared organizational knowledge
**Pros**:
- ✅ Git-friendly for project knowledge
- ✅ Text editor friendly
- ✅ Scalable for shared knowledge
- ✅ Works offline (local graph)
- ✅ Collaborative (shared graph)
- ✅ Cost-effective (DB only for shared)
**Cons**:
- ❌ More complex implementation
- ❌ Sync mechanism needed
- ❌ Two storage systems to manage
**Scalability**: Excellent. The filesystem covers project-sized graphs and SurrealDB covers organization-sized graphs.
---
## Decision
**We will use a hybrid storage strategy: Filesystem (local) + SurrealDB (shared).**
**Architecture**:
```text
┌─────────────────────────────────────────────────────────────┐
│ Project A (.kogral/)                                        │
│ Storage: Filesystem (git-tracked)                           │
│ Scope: Project-specific notes, decisions, patterns          │
│ Access: Local only                                          │
└──────────────────┬──────────────────────────────────────────┘
                   │ [inherits]
┌─────────────────────────────────────────────────────────────┐
│ Shared KB (SurrealDB or synced filesystem)                  │
│ Storage: SurrealDB (scalable) or filesystem (synced)        │
│ Scope: Organization-wide guidelines, patterns               │
│ Access: All projects                                        │
└─────────────────────────────────────────────────────────────┘
```
**Implementation**:
```nickel
# Project config
{
  storage = {
    primary = 'filesystem, # Local project knowledge
    secondary = {
      enabled = true,
      type = 'surrealdb, # Shared knowledge
      url = "ws://kb-central.company.com:8000",
      namespace = "organization",
      database = "shared-kb",
    },
  },
  inheritance = {
    base = "surrealdb://organization/shared-kb", # Inherit from shared
    priority = 100, # Project overrides shared
  },
}
```
**Sync Strategy**:
```text
.kogral/ (Filesystem)
  ↓ [on save]
Watch for changes
  ↓ [debounced]
Sync to SurrealDB
  ↓
Shared graph updated
  ↓ [on query]
Merge local + shared results
```
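The final merge step can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Node` struct and `merge_results` function are hypothetical names, and the sketch assumes nodes are matched by a string id, with the local copy winning on a collision, in line with the "project overrides shared" rule in the config above.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal node: an id plus its markdown body.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub id: String,
    pub body: String,
}

/// Merge shared and local query results.
/// On an id collision the local node wins (assumed override rule).
pub fn merge_results(local: Vec<Node>, shared: Vec<Node>) -> Vec<Node> {
    let mut merged: BTreeMap<String, Node> = BTreeMap::new();
    // Shared nodes go in first so that local nodes can overwrite them.
    for node in shared.into_iter().chain(local) {
        merged.insert(node.id.clone(), node);
    }
    // BTreeMap iteration yields a deterministic, id-sorted result.
    merged.into_values().collect()
}
```

Sorting by id is a choice made here only to keep the output deterministic; a real query path would more likely rank by relevance.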
---
## Consequences
### Positive
**Developer Workflow Preserved**:
```bash
# Local knowledge workflow (unchanged)
vim .kogral/notes/my-note.md
git add .kogral/notes/my-note.md
git commit -m "Add implementation note"
git push
```
**Git Integration**:
- Project knowledge versioned with code
- Branches include relevant knowledge
- Knowledge conflicts are resolved through ordinary git merges
- PR reviews include knowledge changes
**Offline Development**:
- Full functionality without network
- Shared guidelines cached locally
- Sync when reconnected
**Scalability**:
- Projects: filesystem (100s of nodes, fine performance)
- Organization: SurrealDB (10,000+ nodes, excellent performance)
**Collaboration**:
- Shared guidelines accessible to all projects
- Updates to shared knowledge propagate automatically
- Consistent practices across organization
**Cost-Effective**:
- Small projects: free (filesystem only)
- Organizations: SurrealDB for shared only (not all project knowledge)
**Gradual Adoption**:
- Start with filesystem only
- Add SurrealDB when needed
- Feature-gated (`--features surrealdb`)
### Negative
**Complexity**:
- Two storage implementations
- Sync mechanism required
- Conflict resolution needed
**Mitigation**:
- Storage trait abstracts differences
- Sync is optional (can disable)
- Conflicts are expected to be rare (guidelines change infrequently)
**Sync Latency**:
- Changes to the shared KB are not instantly visible in all projects
**Mitigation**:
- Acceptable latency (guidelines don't change rapidly)
- Manual sync command available (`kogral sync`)
- Auto-sync on query (fetch latest)
**Infrastructure Requirement**:
- A SurrealDB server is needed for the shared KB
**Mitigation**:
- Optional (can use synced filesystem instead)
- Docker Compose for easy setup
- Managed SurrealDB Cloud option
### Neutral
**Storage Trait Implementation**:
```rust
use async_trait::async_trait;

#[async_trait]
pub trait Storage {
    async fn save_graph(&self, graph: &Graph) -> Result<()>;
    async fn load_graph(&self, name: &str) -> Result<Graph>;
    async fn list_graphs(&self) -> Result<Vec<String>>;
}

// Three implementations (each impl block also needs #[async_trait])
#[async_trait]
impl Storage for FilesystemStorage { /* ... */ }
#[async_trait]
impl Storage for SurrealDbStorage { /* ... */ }
#[async_trait]
impl Storage for MemoryStorage { /* ... */ }
Abstraction makes multi-backend manageable.
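To illustrate why the trait keeps callers backend-agnostic, here is a simplified, synchronous sketch that uses only the standard library. It is not the real `kogral-core` code. The `Graph` fields, the `String` error type, the `&mut self` receiver on `save_graph`, and the `copy_graph` helper are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real graph type.
#[derive(Debug, Clone, PartialEq)]
pub struct Graph {
    pub name: String,
    pub node_ids: Vec<String>,
}

/// Synchronous simplification of the storage trait (the real one is async).
pub trait Storage {
    fn save_graph(&mut self, graph: &Graph) -> Result<(), String>;
    fn load_graph(&self, name: &str) -> Result<Graph, String>;
    fn list_graphs(&self) -> Result<Vec<String>, String>;
}

/// In-memory backend, useful for tests.
#[derive(Default)]
pub struct MemoryStorage {
    graphs: HashMap<String, Graph>,
}

impl Storage for MemoryStorage {
    fn save_graph(&mut self, graph: &Graph) -> Result<(), String> {
        self.graphs.insert(graph.name.clone(), graph.clone());
        Ok(())
    }

    fn load_graph(&self, name: &str) -> Result<Graph, String> {
        self.graphs
            .get(name)
            .cloned()
            .ok_or_else(|| format!("graph not found: {name}"))
    }

    fn list_graphs(&self) -> Result<Vec<String>, String> {
        let mut names: Vec<String> = self.graphs.keys().cloned().collect();
        names.sort();
        Ok(names)
    }
}

/// Caller code depends only on the trait, never on a concrete backend.
pub fn copy_graph(from: &dyn Storage, to: &mut dyn Storage, name: &str) -> Result<(), String> {
    let graph = from.load_graph(name)?;
    to.save_graph(&graph)
}
```

Because `copy_graph` takes trait objects, the same function would serve a filesystem-to-database sync once those backends implement the trait.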
---
## Use Cases
### Small Project (Solo Developer)
**Config**:
```nickel
{ storage = { primary = 'filesystem } }
```
**Behavior**:
- All knowledge in `.kogral/` directory
- Git-tracked with code
- No database required
- Works offline
### Medium Project (Team)
**Config**:
```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://team-kb.local:8000",
    },
  },
}
```
**Behavior**:
- Project knowledge in `.kogral/` (git-tracked)
- Shared team patterns in SurrealDB
- Automatic sync
- Offline fallback to cached shared knowledge
### Large Organization
**Config**:
```nickel
{
  storage = {
    primary = 'filesystem,
    secondary = {
      enabled = true,
      type = 'surrealdb,
      url = "ws://kb.company.com:8000",
      namespace = "engineering",
      database = "shared-kb",
    },
  },
  inheritance = {
    base = "surrealdb://engineering/shared-kb",
    guidelines = [
      "surrealdb://engineering/rust-guidelines",
      "surrealdb://engineering/security-policies",
    ],
  },
}
```
**Behavior**:
- Project-specific in `.kogral/`
- Organization guidelines in SurrealDB
- Security policies enforced
- Automatic guideline updates
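
The `surrealdb://` references in the config above have to be split into their parts before anything can be fetched. The sketch below shows one way to do that. It assumes the form `surrealdb://<namespace>/<name>`, which matches the examples in this ADR but is not confirmed as the full grammar. `SharedRef` and `parse_shared_ref` are hypothetical names.

```rust
/// Hypothetical parsed form of a `surrealdb://<namespace>/<name>` reference.
#[derive(Debug, PartialEq)]
pub struct SharedRef {
    pub namespace: String,
    pub name: String,
}

/// Parse a shared-knowledge reference; returns None for any other shape.
pub fn parse_shared_ref(uri: &str) -> Option<SharedRef> {
    let rest = uri.strip_prefix("surrealdb://")?;
    let (namespace, name) = rest.split_once('/')?;
    // Reject empty segments and extra path components (assumed invalid).
    if namespace.is_empty() || name.is_empty() || name.contains('/') {
        return None;
    }
    Some(SharedRef {
        namespace: namespace.to_string(),
        name: name.to_string(),
    })
}
```

Under these assumptions, `parse_shared_ref("surrealdb://engineering/rust-guidelines")` yields namespace `engineering` and name `rust-guidelines`, while a `ws://` URL is rejected.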
---
## Sync Mechanism
### Filesystem → SurrealDB
**Trigger**: File changes detected (via `notify` crate)
**Process**:
1. Parse changed markdown file
2. Convert to Node struct
3. Upsert to SurrealDB
4. Update relationships
**Debouncing**: 500ms (configurable)
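
The debouncing step can be sketched without a file watcher by passing timestamps in explicitly. This is an illustrative sketch only. The `Debouncer` type and its methods are hypothetical, the `notify` crate is deliberately left out, and the sketch assumes a per-path quiet window, meaning a file is synced only after it has gone unchanged for the full window.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Per-path debouncer: a path becomes ready once it has been quiet
/// for at least `window_ms` milliseconds.
pub struct Debouncer {
    window_ms: u64,
    last_change_ms: HashMap<PathBuf, u64>,
}

impl Debouncer {
    pub fn new(window_ms: u64) -> Self {
        Self { window_ms, last_change_ms: HashMap::new() }
    }

    /// Record a change event; repeated events for one path only
    /// move its timestamp forward.
    pub fn record(&mut self, path: PathBuf, now_ms: u64) {
        self.last_change_ms.insert(path, now_ms);
    }

    /// Remove and return (sorted) the paths that are ready to sync.
    pub fn drain_ready(&mut self, now_ms: u64) -> Vec<PathBuf> {
        let window = self.window_ms;
        let mut ready: Vec<PathBuf> = self
            .last_change_ms
            .iter()
            .filter(|(_, t)| now_ms.saturating_sub(**t) >= window)
            .map(|(p, _)| p.clone())
            .collect();
        ready.sort();
        for path in &ready {
            self.last_change_ms.remove(path);
        }
        ready
    }
}
```

With a 500 ms window, a file saved at 0 ms and again at 300 ms is synced once, no earlier than 800 ms, so a burst of editor saves does not trigger a burst of upserts.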
### SurrealDB → Filesystem
**Trigger**: Query for shared knowledge
**Process**:
1. Query SurrealDB for shared nodes
2. Cache locally (in-memory or filesystem)
3. Merge with local results
4. Return combined
**Caching**: TTL-based (5 minutes default)
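
A TTL-based cache of this kind can be sketched as below. The `TtlCache` type is a hypothetical in-memory illustration, not the project's cache. The current time is passed in as a parameter so the expiry logic stays easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory cache whose entries expire after a fixed time-to-live.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a value together with the time it was fetched.
    pub fn put(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Return the value only while it is younger than the TTL;
    /// an expired or missing entry means the caller should re-query.
    pub fn get(&self, key: &str, now: Instant) -> Option<V> {
        let (stored_at, value) = self.entries.get(key)?;
        if now.saturating_duration_since(*stored_at) < self.ttl {
            Some(value.clone())
        } else {
            None
        }
    }
}
```

With the 5 minute default, `TtlCache::new(Duration::from_secs(300))` serves a guideline from memory for up to 300 seconds and then returns `None`, at which point the caller would query SurrealDB again.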
### Conflict Resolution
**Strategy**: Last-write-wins with version tracking
**Example**:
```text
Project A: Updates shared guideline (v1 → v2)
Project B: Has cached v1
On Project B query:
- Detects v2 available
- Fetches v2
- Updates cache
- Uses v2 going forward
```
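The version comparison behind this example can be sketched as follows. The `Versioned` struct and `resolve` function are hypothetical. The sketch assumes each shared node carries a monotonically increasing version and a last-write timestamp, with the timestamp used only to break ties between equal versions.

```rust
/// Hypothetical versioned copy of a shared guideline.
#[derive(Debug, Clone, PartialEq)]
pub struct Versioned {
    pub version: u64,
    pub updated_at_ms: u64,
    pub body: String,
}

/// Keep whichever copy has the higher version; on equal versions
/// the later write wins (last-write-wins).
pub fn resolve(cached: Versioned, remote: Versioned) -> Versioned {
    // Tuples compare lexicographically: version first, then timestamp.
    if (remote.version, remote.updated_at_ms) > (cached.version, cached.updated_at_ms) {
        remote
    } else {
        cached
    }
}
```

In the scenario above, Project B would call `resolve` with its cached v1 and the fetched v2 and keep v2.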
---
## Alternatives Considered
### Git Submodules for Shared Knowledge
**Rejected**: Cumbersome workflow
- Requires manual submodule update
- Merge conflicts in shared submodule
- Not discoverable (need to know submodule exists)
### Syncthing for Filesystem Sync
**Rejected**: Not designed for this use case
- No query optimization
- No relationship indexes
- Sync conflicts difficult to resolve
### PostgreSQL with JSON
**Rejected**: Not a graph database
- Poor graph query performance
- Relationship traversal requires complex SQL joins
- No native graph features
---
## Migration Path
### Phase 1: Filesystem Only (Current)
- All storage via filesystem
- Git-tracked
- No database required
### Phase 2: Optional SurrealDB
- Add SurrealDB support (feature-gated)
- Manual sync command
- Shared KB opt-in
### Phase 3: Automatic Sync
- File watching
- Auto-sync on changes
- Background sync
### Phase 4: Multi-Tenant SurrealDB
- Organization namespaces
- Access control
- Audit logs
---
## Monitoring
**Success Criteria**:
- Developers don't notice hybrid complexity
- Sync completes in < 1 second for typical changes
- Shared guidelines are accessible in < 100ms
- Zero data loss in sync
**Metrics**:
- Sync latency (P50, P95, P99)
- Cache hit rate (shared knowledge)
- Conflict rate (expect < 0.1%)
- User satisfaction
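
The latency percentiles can be computed from raw samples with the nearest-rank method, as sketched below. The `percentile` function is a hypothetical helper, and a production setup would more likely rely on a metrics library's histogram. Integer arithmetic is used so that rank computation has no floating-point rounding surprises.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// `p` is a whole-number percentile in 0..=100.
pub fn percentile(samples: &[u64], p: u32) -> Option<u64> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p * n / 100), clamped so that p = 0 maps to the minimum.
    let rank = (p as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}
```

A P95 computed this way can be compared directly against the 1 second sync target above.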
---
## References
- [SurrealDB Documentation](https://surrealdb.com/docs)
- [Storage Trait Implementation](../../crates/kogral-core/src/storage/mod.rs)
- [FilesystemStorage](../../crates/kogral-core/src/storage/filesystem.rs)
- [SurrealDbStorage](../../crates/kogral-core/src/storage/surrealdb.rs)
- [Sync Mechanism](../../scripts/kogral-sync.nu)
---
## Revision History
| Date | Author | Change |
| ---------- | ------------------ | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |
---
**Previous ADR**: [ADR-002: FastEmbed via AI Providers](002-fastembed-ai-providers.md)
**Next ADR**: [ADR-004: Logseq Compatibility](004-logseq-compatibility.md)