kogral/docs/architecture/logseq-blocks-design.md


2026-01-23 16:11:07 +00:00
# Logseq Blocks Support - Architecture Design
## Problem Statement
Logseq uses **content blocks**, not full documents, as the fundamental unit of information. Each block can have:

- **Properties**: markers such as `#card`, `TODO`, `DONE`, plus custom `key:: value` properties
- **Tags**: Inline tags like `#flashcard`, `#important`
- **References**: Block references `((block-id))`, page references `[[page]]`
- **Nesting**: Outliner-style hierarchy (parent-child blocks)
- **Metadata**: Block-level properties (unlike page-level frontmatter)

**Current KB limitation**: Nodes only have `content: String` (flat markdown), so importing from Logseq loses block structure and properties.

**Requirement**: Support round-trip import/export with full block fidelity:
```text
Logseq Graph → KOGRAL Import → KOGRAL Storage → KOGRAL Export → Logseq Graph
(blocks preserved)                                              (blocks preserved)
```
## Use Cases
### 1. Flashcards (`#card`)
**Logseq**:
```markdown
- What is Rust's ownership model? #card
  - Rust uses ownership, borrowing, and lifetimes
  - Three rules: one owner, many borrows XOR one mutable
```
**KB needs to preserve**:
- Block with `#card` property
- Nested answer blocks
- Ability to query all cards
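
The flashcard case can be sketched as a tree walk. This is a minimal, std-only illustration that assumes the block text is the question and its direct children are the answers; `MiniBlock`, `Flashcard`, and `collect_cards` are hypothetical names used only here, not types from the KOGRAL codebase.

```rust
/// Simplified block for illustration (hypothetical, not the real `Block`).
struct MiniBlock {
    content: String,
    tags: Vec<String>,
    children: Vec<MiniBlock>,
}

#[derive(Debug, PartialEq)]
struct Flashcard {
    question: String,
    answers: Vec<String>,
}

/// Walk the tree and turn every block tagged `card` into a flashcard:
/// the block text is the question, its direct children are the answers.
fn collect_cards(blocks: &[MiniBlock], out: &mut Vec<Flashcard>) {
    for block in blocks {
        if block.tags.iter().any(|t| t == "card") {
            out.push(Flashcard {
                question: block.content.clone(),
                answers: block.children.iter().map(|c| c.content.clone()).collect(),
            });
        }
        // Cards may also sit below ordinary blocks, so keep descending
        collect_cards(&block.children, out);
    }
}
```

Whether deeper descendants should also count as part of the answer is a design choice this sketch leaves open.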
### 2. Task Tracking (`TODO`/`DONE`)
**Logseq**:
```markdown
- TODO Implement block parser #rust
  - DONE Research block structure
  - TODO Write parser tests
```
**KB needs to preserve**:
- Task status per block
- Hierarchical task breakdown
- Tags on tasks
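
One thing a preserved task hierarchy makes possible is a progress roll-up. The sketch below is an assumption about how that could look, using only the standard library; `Marker`, `TaskBlock`, and `progress` are hypothetical names, and only two of the task states are modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Marker {
    Todo,
    Done,
}

/// Hypothetical simplified task block: a marker plus nested sub-tasks.
struct TaskBlock {
    marker: Option<Marker>,
    children: Vec<TaskBlock>,
}

/// Returns (done, total) counted over a block and all of its descendants.
/// Blocks without a task marker are not counted.
fn progress(block: &TaskBlock) -> (usize, usize) {
    let (mut done, mut total) = match block.marker {
        Some(Marker::Done) => (1, 1),
        Some(Marker::Todo) => (0, 1),
        None => (0, 0),
    };
    for child in &block.children {
        let (d, t) = progress(child);
        done += d;
        total += t;
    }
    (done, total)
}
```

For the example above (a `TODO` parent with one `DONE` and one `TODO` child), this yields one task done out of three.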
### 3. Block References
**Logseq**:
```markdown
- Core concept: ((block-uuid-123))
- See also: [[Related Page]]
```
**KB needs to preserve**:
- Block-to-block links (not just page-to-page)
- UUID references
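
Both reference kinds are delimiter pairs, so they can be collected with plain string scanning. The following is a minimal sketch without the `regex` crate; `extract_delimited`, `block_refs`, and `page_refs` are hypothetical helper names, and nested or escaped delimiters are not handled.

```rust
/// Collect the text found between each `open`/`close` delimiter pair.
/// Hypothetical helper for illustration; not part of the KOGRAL codebase.
fn extract_delimited(text: &str, open: &str, close: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(open) {
        let after_open = &rest[start + open.len()..];
        match after_open.find(close) {
            Some(end) => {
                found.push(after_open[..end].to_string());
                rest = &after_open[end + close.len()..];
            }
            // Unterminated reference: stop scanning
            None => break,
        }
    }
    found
}

/// Block references look like `((block-id))`.
fn block_refs(text: &str) -> Vec<String> {
    extract_delimited(text, "((", "))")
}

/// Page references look like `[[Page Name]]`.
fn page_refs(text: &str) -> Vec<String> {
    extract_delimited(text, "[[", "]]")
}
```

Because the scan only reads the text, the references stay in the block content and their position in the sentence is preserved.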
### 4. Block Properties
**Logseq**:
```markdown
- This is a block with properties
  property1:: value1
  property2:: value2
```
**KB needs to preserve**:
- Custom key-value properties per block
- Property inheritance/override
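
Since properties sit on the lines after a block's first line, they can be separated from the text line by line. This is a rough sketch under the assumption that a property key is a single word followed by `:: `; `split_properties` is a hypothetical name, and inheritance/override is not modelled.

```rust
use std::collections::BTreeMap;

/// Split a block's raw lines into its text and its `key:: value` properties.
fn split_properties(raw: &str) -> (String, BTreeMap<String, String>) {
    let mut text_lines = Vec::new();
    let mut props = BTreeMap::new();
    for line in raw.lines() {
        let trimmed = line.trim();
        match trimmed.split_once(":: ") {
            // Treat the line as a property only when the key is a single word
            Some((key, value)) if !key.is_empty() && !key.contains(char::is_whitespace) => {
                props.insert(key.to_string(), value.trim().to_string());
            }
            _ => text_lines.push(trimmed),
        }
    }
    (text_lines.join("\n"), props)
}
```

A `BTreeMap` is used here so that properties come back in a stable (sorted) order, which keeps serialized output deterministic.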
## Design Options
### Option A: Blocks as First-Class Data Structure
**Add `blocks` field to Node**:
```rust
pub struct Node {
    // ... existing fields ...
    pub content: String,            // Backward compat: flat markdown
    pub blocks: Option<Vec<Block>>, // NEW: Structured blocks
}

pub struct Block {
    pub id: String,                  // UUID or auto-generated
    pub content: String,             // Block text
    pub properties: BlockProperties, // Tags, status, custom props
    pub children: Vec<Block>,        // Nested blocks
    pub created: DateTime<Utc>,
    pub modified: DateTime<Utc>,
}

pub struct BlockProperties {
    pub tags: Vec<String>,               // #card, #important
    pub status: Option<TaskStatus>,      // TODO, DONE, WAITING
    pub custom: HashMap<String, String>, // property:: value
}

pub enum TaskStatus {
    Todo,
    Doing,
    Done,
    Waiting,
    Cancelled,
}
```
**Pros**:
- ✅ Type-safe, explicit structure
- ✅ Queryable (find all #card blocks)
- ✅ Preserves hierarchy
- ✅ Supports block-level operations
**Cons**:
- ❌ Adds complexity to Node
- ❌ Dual representation (content + blocks)
- ❌ Requires migration of existing data
### Option B: Parser-Only Approach
**Keep `content: String`, parse blocks on-demand**:
```rust
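pub struct BlockParser;

impl BlockParser {
    // Parse markdown content into block structure
    fn parse(content: &str) -> Vec<Block> {
        todo!() // body omitted in this option sketch
    }

    // Serialize blocks back to markdown
    fn serialize(blocks: &[Block]) -> String {
        todo!() // body omitted in this option sketch
    }
}

// Usage: tags are stored without the leading '#'
let blocks = BlockParser::parse(&node.content);
let cards: Vec<&Block> = blocks
    .iter()
    .filter(|b| b.properties.tags.iter().any(|t| t == "card"))
    .collect();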
```
**Pros**:
- ✅ No schema changes
- ✅ Backward compatible
- ✅ Simple storage (still just String)
**Cons**:
- ❌ Parse overhead on every access
- ❌ Can't query blocks in database (SurrealDB)
- ❌ Harder to index/search blocks
### Option C: Hybrid Approach (RECOMMENDED)
**Combine both: structured storage + lazy parsing**:
```rust
pub struct Node {
    // ... existing fields ...
    pub content: String, // Source of truth (markdown)

    #[serde(skip_serializing_if = "Option::is_none")]
    pub blocks: Option<Vec<Block>>, // Cached structure (parsed)
}

impl Node {
    // Parse blocks from content if not already cached
    pub fn get_blocks(&mut self) -> &Vec<Block> {
        if self.blocks.is_none() {
            self.blocks = Some(BlockParser::parse(&self.content));
        }
        self.blocks.as_ref().unwrap()
    }

    // Update content from blocks (when blocks modified)
    pub fn sync_blocks_to_content(&mut self) {
        if let Some(ref blocks) = self.blocks {
            self.content = BlockParser::serialize(blocks);
        }
    }
}
```
**Storage Strategy**:

1. **Filesystem** - Store as markdown (Logseq compatible):

   ```markdown
   - Block 1 #card
     - Nested block
   - TODO Block 2
   ```

2. **SurrealDB** - Store both the node's markdown `content` and the parsed blocks, one `block` record per block:

   ```sql
   DEFINE TABLE block SCHEMAFULL;
   DEFINE FIELD node_id ON block TYPE record<node>;
   DEFINE FIELD block_id ON block TYPE string;
   DEFINE FIELD content ON block TYPE string;
   -- FLEXIBLE keeps nested keys (tags, status, custom) that the schema does not list
   DEFINE FIELD properties ON block FLEXIBLE TYPE object;
   DEFINE FIELD parent_id ON block TYPE option<string>;
   -- Index for queries
   DEFINE INDEX block_tags ON block COLUMNS properties.tags;
   DEFINE INDEX block_status ON block COLUMNS properties.status;
   ```

**Pros**:
- ✅ Best of both worlds
- ✅ Filesystem stays Logseq-compatible
- ✅ SurrealDB can query blocks
- ✅ Lazy parsing (only when needed)
- ✅ Backward compatible
**Cons**:
- ⚠️ Need to keep content/blocks in sync
- ⚠️ More complex implementation
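
The sync concern can be contained by routing every change through setters that either drop or refresh the cache. The sketch below is one possible shape, not the proposed implementation: `CachedNode` is a hypothetical stand-in for `Node`, blocks are plain strings, and a trivial top-level `- ` scan stands in for the real parser.

```rust
/// Simplified stand-in for `Node`: blocks are plain strings here.
pub struct CachedNode {
    content: String,
    blocks: Option<Vec<String>>, // None = cache not built or invalidated
}

impl CachedNode {
    pub fn new(content: &str) -> Self {
        Self { content: content.to_string(), blocks: None }
    }

    /// Replacing the markdown drops the cache so it cannot go stale.
    pub fn set_content(&mut self, content: &str) {
        self.content = content.to_string();
        self.blocks = None;
    }

    /// Parse lazily; top-level "- " lines stand in for the real parser.
    pub fn blocks(&mut self) -> &[String] {
        let content = &self.content;
        self.blocks.get_or_insert_with(|| {
            content
                .lines()
                .filter_map(|l| l.strip_prefix("- "))
                .map(str::to_string)
                .collect()
        })
    }

    /// Editing blocks writes the markdown back immediately.
    pub fn set_blocks(&mut self, blocks: Vec<String>) {
        self.content = blocks.iter().map(|b| format!("- {b}\n")).collect();
        self.blocks = Some(blocks);
    }

    pub fn content(&self) -> &str {
        &self.content
    }
}
```

Keeping both fields private is what makes this work: with public fields, as in the `Node` sketch above, callers can change `content` without invalidating `blocks`.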
## Recommended Implementation
**Phase 1: Data Model**
```rust
// crates/kogral-core/src/models/block.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

/// A content block (Logseq-style)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Block {
    /// Unique block identifier (UUID)
    pub id: String,
    /// Block content (markdown text, excluding nested blocks)
    pub content: String,
    /// Block properties (tags, status, custom)
    pub properties: BlockProperties,
    /// Child blocks (nested hierarchy)
    #[serde(default)]
    pub children: Vec<Block>,
    /// Creation timestamp
    pub created: DateTime<Utc>,
    /// Last modification timestamp
    pub modified: DateTime<Utc>,
    /// Parent block ID (if nested)
    #[serde(skip_serializing_if = "Option::is_none")]
    pub parent_id: Option<String>,
}

/// Block-level properties
///
/// `PartialEq` is derived so that tests can compare properties directly.
#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize)]
pub struct BlockProperties {
    /// Tags (e.g., #card, #important)
    #[serde(default)]
    pub tags: Vec<String>,
    /// Task status (TODO, DONE, etc.)
    #[serde(skip_serializing_if = "Option::is_none")]
    pub status: Option<TaskStatus>,
    /// Custom properties (property:: value)
    #[serde(default)]
    pub custom: HashMap<String, String>,
    /// Block references ((uuid))
    #[serde(default)]
    pub block_refs: Vec<String>,
    /// Page references ([[page]])
    #[serde(default)]
    pub page_refs: Vec<String>,
}

/// Task status for TODO blocks
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
pub enum TaskStatus {
    Todo,
    Doing,
    Done,
    Later,
    Now,
    Waiting,
    Cancelled,
}

impl Block {
    /// Create a new block with content
    pub fn new(content: String) -> Self {
        use uuid::Uuid;
        // Take one timestamp so `created` and `modified` start out equal
        let now = Utc::now();
        Self {
            id: Uuid::new_v4().to_string(),
            content,
            properties: BlockProperties::default(),
            children: Vec::new(),
            created: now,
            modified: now,
            parent_id: None,
        }
    }

    /// Add a child block
    pub fn add_child(&mut self, mut child: Block) {
        child.parent_id = Some(self.id.clone());
        self.children.push(child);
        self.modified = Utc::now();
    }

    /// Add a tag to this block
    pub fn add_tag(&mut self, tag: String) {
        if !self.properties.tags.contains(&tag) {
            self.properties.tags.push(tag);
            self.modified = Utc::now();
        }
    }

    /// Set task status
    pub fn set_status(&mut self, status: TaskStatus) {
        self.properties.status = Some(status);
        self.modified = Utc::now();
    }

    /// Get all blocks (self + descendants) as flat list
    pub fn flatten(&self) -> Vec<&Block> {
        let mut result = vec![self];
        for child in &self.children {
            result.extend(child.flatten());
        }
        result
    }

    /// Find block by ID in tree
    pub fn find(&self, id: &str) -> Option<&Block> {
        if self.id == id {
            return Some(self);
        }
        self.children.iter().find_map(|child| child.find(id))
    }
}
```
**Phase 2: Update Node Model**
```rust
// crates/kogral-core/src/models.rs (modifications)
// `Result` is assumed to be the crate's own result alias, already in scope.
use crate::models::block::{Block, TaskStatus};

pub struct Node {
    // ... existing fields ...
    pub content: String,
    /// Structured blocks (optional, parsed from content)
    #[serde(skip_serializing_if = "Option::is_none")]
    pub blocks: Option<Vec<Block>>,
}

impl Node {
    /// Get blocks, parsing from content if needed
    pub fn get_blocks(&mut self) -> Result<&Vec<Block>> {
        if self.blocks.is_none() {
            self.blocks = Some(crate::parser::BlockParser::parse(&self.content)?);
        }
        Ok(self.blocks.as_ref().unwrap())
    }

    /// Update content from blocks
    pub fn sync_blocks_to_content(&mut self) {
        if let Some(ref blocks) = self.blocks {
            self.content = crate::parser::BlockParser::serialize(blocks);
        }
    }

    /// Find all blocks with a specific tag
    pub fn find_blocks_by_tag(&mut self, tag: &str) -> Result<Vec<&Block>> {
        let blocks = self.get_blocks()?;
        Ok(blocks
            .iter()
            .flat_map(|block| block.flatten())
            .filter(|b| b.properties.tags.iter().any(|t| t == tag))
            .collect())
    }

    /// Find all TODO blocks
    pub fn find_todos(&mut self) -> Result<Vec<&Block>> {
        let blocks = self.get_blocks()?;
        Ok(blocks
            .iter()
            .flat_map(|block| block.flatten())
            .filter(|b| matches!(b.properties.status, Some(TaskStatus::Todo)))
            .collect())
    }
}
```
**Phase 3: Block Parser**
```rust
// crates/kogral-core/src/parser/block_parser.rs
use crate::models::block::{Block, BlockProperties, TaskStatus};
use regex::Regex;
use std::collections::HashMap;

pub struct BlockParser;

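impl BlockParser {
    /// Parse markdown content into block structure
    ///
    /// Handles:
    /// - Outliner format (- prefix with indentation)
    /// - Tags (#card, #important)
    /// - Task status (TODO, DONE)
    /// - Properties (property:: value)
    /// - Block references (((uuid)))
    /// - Page references ([[page]])
    pub fn parse(content: &str) -> Result<Vec<Block>> {
        let mut roots = Vec::new();
        // Open blocks from the root down to the current depth: (indent_level, block)
        let mut stack: Vec<(usize, Block)> = Vec::new();

        for line in content.lines() {
            // Detect indentation level
            let indent = count_indent(line);
            let trimmed = line.trim_start();

            // Skip empty lines
            if trimmed.is_empty() {
                continue;
            }

            if let Some(block_content) = trimmed.strip_prefix("- ") {
                let block = Self::parse_block_line(block_content)?;

                // Close every open block at the same or a deeper level;
                // whatever remains on top of the stack is the new block's parent
                while stack.last().map_or(false, |(level, _)| *level >= indent) {
                    Self::close_block(&mut stack, &mut roots);
                }
                stack.push((indent, block));
            } else if let Some((_, open)) = stack.last_mut() {
                // Continuation line: Logseq writes `key:: value` properties
                // on the lines that follow a block's first line
                if let Some((key, value)) = trimmed.split_once(":: ") {
                    open.properties
                        .custom
                        .insert(key.trim().to_string(), value.trim().to_string());
                }
            }
        }

        // Attach the blocks that are still open at end of input
        while !stack.is_empty() {
            Self::close_block(&mut stack, &mut roots);
        }
        Ok(roots)
    }

    /// Pop the innermost open block and attach it to its parent, or to the
    /// root list when it has none. Attaching on close (instead of cloning
    /// on open) keeps the children that were added while the block was open.
    fn close_block(stack: &mut Vec<(usize, Block)>, roots: &mut Vec<Block>) {
        if let Some((_, done)) = stack.pop() {
            match stack.last_mut() {
                Some((_, parent)) => parent.add_child(done),
                None => roots.push(done),
            }
        }
    }
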
    /// Parse a single block line (after "- " prefix)
    fn parse_block_line(line: &str) -> Result<Block> {
        // Extract task status (TODO, DONE, etc.)
        let (status, remaining) = Self::extract_task_status(line);
        // Extract tags (#card, #important); from here on each helper returns
        // an owned String, so the next helper borrows it
        let (tags, remaining) = Self::extract_tags(remaining);
        // Extract properties (property:: value)
        let (custom, remaining) = Self::extract_properties(&remaining);
        // Extract block references (((uuid)))
        let (block_refs, remaining) = Self::extract_block_refs(&remaining);
        // Extract page references ([[page]])
        let (page_refs, content) = Self::extract_page_refs(&remaining);

        let mut block = Block::new(content.trim().to_string());
        block.properties = BlockProperties {
            tags,
            status,
            custom,
            block_refs,
            page_refs,
        };
        Ok(block)
    }

    /// Serialize blocks back to markdown
    pub fn serialize(blocks: &[Block]) -> String {
        let mut result = String::new();
        for block in blocks {
            Self::serialize_block(&mut result, block, 0);
        }
        result
    }

    fn serialize_block(output: &mut String, block: &Block, indent: usize) {
        // Write indent (two spaces per level, matching count_indent)
        output.push_str(&"  ".repeat(indent));
        // Write prefix
        output.push_str("- ");
        // Write task status
        if let Some(status) = block.properties.status {
            output.push_str(&format!("{:?} ", status).to_uppercase());
        }
        // Write content
        output.push_str(&block.content);
        // Write tags
        for tag in &block.properties.tags {
            output.push_str(&format!(" #{}", tag));
        }
        output.push('\n');

        // Write properties on continuation lines, sorted by key so the
        // output is stable (HashMap iteration order is not)
        let mut props: Vec<_> = block.properties.custom.iter().collect();
        props.sort();
        for (key, value) in props {
            output.push_str(&"  ".repeat(indent + 1));
            output.push_str(&format!("{}:: {}\n", key, value));
        }

        // Write children recursively
        for child in &block.children {
            Self::serialize_block(output, child, indent + 1);
        }
    }

    // Helper methods for extraction

    fn extract_task_status(line: &str) -> (Option<TaskStatus>, &str) {
        // Task markers appear at the start of the block text
        const MARKERS: [(&str, TaskStatus); 7] = [
            ("TODO ", TaskStatus::Todo),
            ("DONE ", TaskStatus::Done),
            ("DOING ", TaskStatus::Doing),
            ("LATER ", TaskStatus::Later),
            ("NOW ", TaskStatus::Now),
            ("WAITING ", TaskStatus::Waiting),
            ("CANCELLED ", TaskStatus::Cancelled),
        ];
        let line = line.trim_start();
        for (marker, status) in MARKERS {
            if let Some(rest) = line.strip_prefix(marker) {
                return (Some(status), rest);
            }
        }
        (None, line)
    }

    fn extract_tags(line: &str) -> (Vec<String>, String) {
        let tag_regex = Regex::new(r"#(\w+)").unwrap();
        let tags = tag_regex
            .captures_iter(line)
            .map(|cap| cap[1].to_string())
            .collect();
        // Remove all tags in one pass; a per-tag `replace` would also eat
        // the prefix of longer tags (removing "#a" damages "#ab")
        let result = tag_regex.replace_all(line, "");
        (tags, result.trim().to_string())
    }

    fn extract_properties(line: &str) -> (HashMap<String, String>, String) {
        let prop_regex = Regex::new(r"(\w+)::\s*([^\n]+)").unwrap();
        let props = prop_regex
            .captures_iter(line)
            .map(|cap| (cap[1].to_string(), cap[2].trim().to_string()))
            .collect();
        let result = prop_regex.replace_all(line, "");
        (props, result.trim().to_string())
    }

    fn extract_block_refs(line: &str) -> (Vec<String>, String) {
        let ref_regex = Regex::new(r"\(\(([^)]+)\)\)").unwrap();
        let refs = ref_regex
            .captures_iter(line)
            .map(|cap| cap[1].to_string())
            .collect();
        // Keep ((uuid)) in content, as with page refs: `serialize` does not
        // write `block_refs` back, so stripping them would lose the links
        (refs, line.to_string())
    }

    fn extract_page_refs(line: &str) -> (Vec<String>, String) {
        let page_regex = Regex::new(r"\[\[([^\]]+)\]\]").unwrap();
        let pages = page_regex
            .captures_iter(line)
            .map(|cap| cap[1].to_string())
            .collect();
        // Keep [[page]] in content for now (backward compat)
        (pages, line.to_string())
    }
}

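/// Nesting depth of a line: one level per tab or per two leading spaces
fn count_indent(line: &str) -> usize {
    let leading = line.len() - line.trim_start_matches([' ', '\t']).len();
    let tabs = line[..leading].matches('\t').count();
    tabs + (leading - tabs) / 2
}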
```
**Phase 4: Logseq Import/Export**
```rust
// crates/kogral-core/src/logseq.rs
use std::path::Path;

use crate::models::{Node, NodeType};
use crate::models::block::Block;
use crate::parser::BlockParser;

pub struct LogseqImporter;

impl LogseqImporter {
    /// Import a Logseq page (markdown file) as a Node
    pub fn import_page(path: &Path) -> Result<Node> {
        let content = std::fs::read_to_string(path)?;

        // Extract frontmatter if present
        let (frontmatter, body) = Self::split_frontmatter(&content);

        // Parse blocks from body
        let blocks = BlockParser::parse(&body)?;

        // Create node with blocks
        let mut node = Node::new(NodeType::Note, Self::extract_title(path));
        node.content = body;
        node.blocks = Some(blocks);

        // Apply frontmatter properties
        if let Some(fm) = frontmatter {
            Self::apply_frontmatter(&mut node, &fm)?;
        }
        Ok(node)
    }

    /// Split a leading `---` ... `---` section from the body
    fn split_frontmatter(content: &str) -> (Option<String>, String) {
        if content.starts_with("---\n") {
            if let Some(end) = content[4..].find("\n---\n") {
                let frontmatter = content[4..4 + end].to_string();
                // Skip the opening "---\n" (4 bytes) and closing "\n---\n" (5 bytes)
                let body = content[4 + end + 5..].to_string();
                return (Some(frontmatter), body);
            }
        }
        (None, content.to_string())
    }

    fn extract_title(path: &Path) -> String {
        path.file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("Untitled")
            .to_string()
    }

    fn apply_frontmatter(node: &mut Node, frontmatter: &str) -> Result<()> {
        // Parse YAML frontmatter and apply to node
        // ... implementation ...
        Ok(())
    }
}

pub struct LogseqExporter;

impl LogseqExporter {
    /// Export a Node to Logseq page format
    pub fn export_page(node: &Node, path: &Path) -> Result<()> {
        let mut output = String::new();

        // Generate frontmatter
        output.push_str("---\n");
        output.push_str(&Self::generate_frontmatter(node)?);
        output.push_str("---\n\n");

        // Serialize blocks or use content
        if let Some(ref blocks) = node.blocks {
            output.push_str(&BlockParser::serialize(blocks));
        } else {
            output.push_str(&node.content);
        }

        std::fs::write(path, output)?;
        Ok(())
    }

    fn generate_frontmatter(node: &Node) -> Result<String> {
        let mut fm = String::new();
        fm.push_str(&format!("title: {}\n", node.title));
        fm.push_str(&format!("tags: {}\n", node.tags.join(", ")));
        // ... more frontmatter fields ...
        Ok(fm)
    }
}
```
## Query API Extensions
```rust
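// New methods in Graph or Query module
impl Graph {
    /// Collect every block that satisfies `pred`, paired with its node.
    ///
    /// Runs in two passes: filling the block caches needs `&mut Node`, while
    /// the returned pairs hold shared references to the same nodes, and the
    /// borrow checker rejects both inside a single loop.
    fn collect_blocks(&mut self, pred: impl Fn(&Block) -> bool) -> Vec<(&Node, &Block)> {
        // Pass 1: make sure each node has parsed blocks (parse errors are skipped)
        for node in self.nodes.values_mut() {
            let _ = node.get_blocks();
        }

        // Pass 2: read-only traversal
        let mut results = Vec::new();
        for node in self.nodes.values() {
            for root in node.blocks.iter().flatten() {
                for block in root.flatten() {
                    if pred(block) {
                        results.push((node, block));
                    }
                }
            }
        }
        results
    }

    /// Find all blocks with a specific tag across all nodes
    pub fn find_blocks_by_tag(&mut self, tag: &str) -> Vec<(&Node, &Block)> {
        self.collect_blocks(|b| b.properties.tags.iter().any(|t| t == tag))
    }

    /// Find all flashcards (#card blocks)
    pub fn find_flashcards(&mut self) -> Vec<(&Node, &Block)> {
        self.find_blocks_by_tag("card")
    }

    /// Find all TODO items across knowledge base
    pub fn find_all_todos(&mut self) -> Vec<(&Node, &Block)> {
        self.collect_blocks(|b| matches!(b.properties.status, Some(TaskStatus::Todo)))
    }
}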
```
## MCP Tool Extensions
```json
{
  "name": "kogral/find_blocks",
  "description": "Find blocks by tag, status, or properties",
  "inputSchema": {
    "type": "object",
    "properties": {
      "tag": { "type": "string", "description": "Filter by tag (e.g., 'card')" },
      "status": {
        "type": "string",
        "enum": ["TODO", "DONE", "DOING", "LATER", "NOW", "WAITING", "CANCELLED"]
      },
      "property": { "type": "string", "description": "Custom property key" },
      "value": { "type": "string", "description": "Property value to match" }
    }
  }
}
```
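
On the server side, the tool input could map onto a filter that is checked against each block. The sketch below assumes that all supplied fields must match (logical AND) and that `value` without `property` matches any property holding that value; the schema does not state these semantics, and `BlockFilter` and `BlockView` are hypothetical names.

```rust
use std::collections::HashMap;

/// Mirrors the tool's input fields; every field is optional.
#[derive(Default)]
struct BlockFilter {
    tag: Option<String>,
    status: Option<String>,
    property: Option<String>,
    value: Option<String>,
}

/// Minimal view of a block for matching (hypothetical, for illustration).
struct BlockView {
    tags: Vec<String>,
    status: Option<String>,
    custom: HashMap<String, String>,
}

impl BlockFilter {
    /// A block matches when every supplied field matches (assumed AND semantics).
    fn matches(&self, block: &BlockView) -> bool {
        let tag_ok = self.tag.as_ref().map_or(true, |t| block.tags.contains(t));
        let status_ok = self
            .status
            .as_ref()
            .map_or(true, |s| block.status.as_ref() == Some(s));
        let prop_ok = match (&self.property, &self.value) {
            (Some(key), Some(value)) => block.custom.get(key) == Some(value),
            (Some(key), None) => block.custom.contains_key(key),
            // A value without a key matches any property holding that value
            (None, Some(value)) => block.custom.values().any(|v| v == value),
            (None, None) => true,
        };
        tag_ok && status_ok && prop_ok
    }
}
```

An empty filter matches every block, which may or may not be the desired behavior for an MCP call with no arguments.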
## Configuration
```nickel
# schemas/contracts.ncl (additions)
BlockConfig = {
  enabled | Bool
    | doc "Enable block-level parsing and storage"
    | default = true,
  preserve_hierarchy | Bool
    | doc "Preserve block nesting on import/export"
    | default = true,
  parse_on_load | Bool
    | doc "Automatically parse blocks when loading nodes"
    | default = false, # Lazy parsing by default
  supported_statuses | Array String
    | doc "Supported task statuses"
    | default = ["TODO", "DONE", "DOING", "LATER", "NOW", "WAITING", "CANCELLED"],
}

KbConfig = {
  # ... existing fields ...
  blocks | BlockConfig
    | doc "Block-level features configuration"
    | default = {},
}
```
## Migration Path
- **Phase 1**: Add Block models (no behavior change)
- **Phase 2**: Add BlockParser (gated by the `blocks.enabled` config flag)
- **Phase 3**: Update Logseq import/export
- **Phase 4**: Add block queries to CLI/MCP
- **Phase 5**: SurrealDB block indexing

**Backward Compatibility**:

- Existing nodes without `blocks` field work as before
- `content` remains source of truth
- `blocks` is optional cache/structure
- Config flag `blocks.enabled` turns block parsing on or off; the `BlockConfig` contract defaults it to `true`, so it has to be set to `false` to keep the previous behavior
## Testing Strategy
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_simple_block() {
        let content = "- This is a block #card";
        let blocks = BlockParser::parse(content).unwrap();
        assert_eq!(blocks.len(), 1);
        assert_eq!(blocks[0].content, "This is a block");
        assert_eq!(blocks[0].properties.tags, vec!["card"]);
    }

    #[test]
    fn test_parse_nested_blocks() {
        // Raw string lines start at column 0 so that indentation reflects nesting only
        let content = r#"
- Parent block
  - Child block 1
  - Child block 2
    - Grandchild
"#;
        let blocks = BlockParser::parse(content).unwrap();
        assert_eq!(blocks.len(), 1);
        assert_eq!(blocks[0].children.len(), 2);
        assert_eq!(blocks[0].children[1].children.len(), 1);
    }

    #[test]
    fn test_parse_todo() {
        let content = "- TODO Implement feature #rust";
        let blocks = BlockParser::parse(content).unwrap();
        assert_eq!(blocks[0].properties.status, Some(TaskStatus::Todo));
        assert_eq!(blocks[0].content, "Implement feature");
        assert_eq!(blocks[0].properties.tags, vec!["rust"]);
    }

    #[test]
    fn test_roundtrip() {
        let original = r#"- Block 1 #card
  - Nested
- TODO Block 2
  priority:: high
"#;
        let blocks = BlockParser::parse(original).unwrap();
        let serialized = BlockParser::serialize(&blocks);
        let reparsed = BlockParser::parse(&serialized).unwrap();
        assert_eq!(blocks.len(), reparsed.len());
        assert_eq!(blocks[0].properties, reparsed[0].properties);
    }
}
```
## Summary
**Recommended Approach**: Hybrid (Option C)
- **Add** `Block` struct with properties, hierarchy
- **Extend** `Node` with optional `blocks: Option<Vec<Block>>`
- **Implement** bidirectional parser (markdown ↔ blocks)
- **Preserve** `content` as source of truth (backward compat)
- **Enable** block queries in CLI/MCP
- **Support** round-trip Logseq import/export
**Benefits**:
- ✅ Full Logseq compatibility
- ✅ Queryable blocks (find #card, TODO, etc.)
- ✅ Backward compatible
- ✅ Extensible (custom properties)
- ✅ Type-safe structure
**Trade-offs**:
- ⚠️ Added complexity
- ⚠️ Need to sync content ↔ blocks
- ⚠️ More storage for SurrealDB backend
**Next Steps**:
1. Review and approve design
2. Implement Phase 1 (Block models)
3. Implement Phase 2 (BlockParser)
4. Update Logseq import/export
5. Add block queries to MCP/CLI