# Logseq Blocks Support - Architecture Design ## Problem Statement Logseq uses **content blocks** as the fundamental unit of information, not full documents. Each block can have: - **Properties**: `#card`, `TODO`, `DONE`, custom properties - **Tags**: Inline tags like `#flashcard`, `#important` - **References**: Block references `((block-id))`, page references `[[page]]` - **Nesting**: Outliner-style hierarchy (parent-child blocks) - **Metadata**: Block-level properties (unlike page-level frontmatter) **Current KB limitation**: Nodes only have `content: String` (flat markdown). Importing from Logseq loses block structure and properties. **Requirement**: Support round-trip import/export with full block fidelity: ```text Logseq Graph → KOGRAL Import → KOGRAL Storage → KOGRAL Export → Logseq Graph (blocks preserved) (blocks preserved) ``` ## Use Cases ### 1. Flashcards (`#card`) **Logseq**: ```markdown - What is Rust's ownership model? #card - Rust uses ownership, borrowing, and lifetimes - Three rules: one owner, many borrows XOR one mutable ``` **KB needs to preserve**: - Block with `#card` property - Nested answer blocks - Ability to query all cards ### 2. Task Tracking (`TODO`/`DONE`) **Logseq**: ```markdown - TODO Implement block parser #rust - DONE Research block structure - TODO Write parser tests ``` **KB needs to preserve**: - Task status per block - Hierarchical task breakdown - Tags on tasks ### 3. Block References **Logseq**: ```markdown - Core concept: ((block-uuid-123)) - See also: [[Related Page]] ``` **KB needs to preserve**: - Block-to-block links (not just page-to-page) - UUID references ### 4. Block Properties **Logseq**: ```markdown - This is a block with properties property1:: value1 property2:: value2 ``` **KB needs to preserve**: - Custom key-value properties per block - Property inheritance/override ## Design Options ### Option A: Blocks as First-Class Data Structure **Add `blocks` field to Node**: ```rust pub struct Node { // ... existing fields ... pub content: String, // Backward compat: flat markdown pub blocks: Option>, // NEW: Structured blocks } pub struct Block { pub id: String, // UUID or auto-generated pub content: String, // Block text pub properties: BlockProperties, // Tags, status, custom props pub children: Vec, // Nested blocks pub created: DateTime, pub modified: DateTime, } pub struct BlockProperties { pub tags: Vec, // #card, #important pub status: Option, // TODO, DONE, WAITING pub custom: HashMap, // property:: value } pub enum TaskStatus { Todo, Doing, Done, Waiting, Cancelled, } ``` **Pros**: - ✅ Type-safe, explicit structure - ✅ Queryable (find all #card blocks) - ✅ Preserves hierarchy - ✅ Supports block-level operations **Cons**: - ❌ Adds complexity to Node - ❌ Dual representation (content + blocks) - ❌ Requires migration of existing data ### Option B: Parser-Only Approach **Keep `content: String`, parse blocks on-demand**: ```rust pub struct BlockParser; impl BlockParser { // Parse markdown content into block structure fn parse(content: &str) -> Vec; // Serialize blocks back to markdown fn serialize(blocks: &[Block]) -> String; } // Usage let blocks = BlockParser::parse(&node.content); let filtered = blocks.iter().filter(|b| b.properties.tags.contains("card")); ``` **Pros**: - ✅ No schema changes - ✅ Backward compatible - ✅ Simple storage (still just String) **Cons**: - ❌ Parse overhead on every access - ❌ Can't query blocks in database (SurrealDB) - ❌ Harder to index/search blocks ### Option C: Hybrid Approach (RECOMMENDED) **Combine both: structured storage + lazy parsing**: ```rust pub struct Node { // ... existing fields ... pub content: String, // Source of truth (markdown) #[serde(skip_serializing_if = "Option::is_none")] pub blocks: Option>, // Cached structure (parsed) } impl Node { // Parse blocks from content if not already cached pub fn get_blocks(&mut self) -> &Vec { if self.blocks.is_none() { self.blocks = Some(BlockParser::parse(&self.content)); } self.blocks.as_ref().unwrap() } // Update content from blocks (when blocks modified) pub fn sync_blocks_to_content(&mut self) { if let Some(ref blocks) = self.blocks { self.content = BlockParser::serialize(blocks); } } } ``` **Storage Strategy**: 1. **Filesystem** - Store as markdown (Logseq compatible): ```markdown - Block 1 #card - Nested block - Block 2 TODO ``` 2. **SurrealDB** - Store both: ```sql DEFINE TABLE block SCHEMAFULL; DEFINE FIELD node_id ON block TYPE record(node); DEFINE FIELD block_id ON block TYPE string; DEFINE FIELD content ON block TYPE string; DEFINE FIELD properties ON block TYPE object; DEFINE FIELD parent_id ON block TYPE option; -- Index for queries DEFINE INDEX block_tags ON block COLUMNS properties.tags; DEFINE INDEX block_status ON block COLUMNS properties.status; ``` **Pros**: - ✅ Best of both worlds - ✅ Filesystem stays Logseq-compatible - ✅ SurrealDB can query blocks - ✅ Lazy parsing (only when needed) - ✅ Backward compatible **Cons**: - ⚠️ Need to keep content/blocks in sync - ⚠️ More complex implementation ## Recommended Implementation **Phase 1: Data Model** ```rust // crates/kogral-core/src/models/block.rs use chrono::{DateTime, Utc}; use serde::{Deserialize, Serialize}; use std::collections::HashMap; /// A content block (Logseq-style) #[derive(Debug, Clone, Serialize, Deserialize)] pub struct Block { /// Unique block identifier (UUID) pub id: String, /// Block content (markdown text, excluding nested blocks) pub content: String, /// Block properties (tags, status, custom) pub properties: BlockProperties, /// Child blocks (nested hierarchy) #[serde(default)] pub children: Vec, /// Creation timestamp pub created: DateTime, /// Last modification timestamp pub modified: DateTime, /// Parent block ID (if nested) #[serde(skip_serializing_if = "Option::is_none")] pub parent_id: Option, } /// Block-level properties #[derive(Debug, Clone, Default, Serialize, Deserialize)] pub struct BlockProperties { /// Tags (e.g., #card, #important) #[serde(default)] pub tags: Vec, /// Task status (TODO, DONE, etc.) #[serde(skip_serializing_if = "Option::is_none")] pub status: Option, /// Custom properties (property:: value) #[serde(default)] pub custom: HashMap, /// Block references ((uuid)) #[serde(default)] pub block_refs: Vec, /// Page references ([[page]]) #[serde(default)] pub page_refs: Vec, } /// Task status for TODO blocks #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] #[serde(rename_all = "UPPERCASE")] pub enum TaskStatus { Todo, Doing, Done, Later, Now, Waiting, Cancelled, } impl Block { /// Create a new block with content pub fn new(content: String) -> Self { use uuid::Uuid; Self { id: Uuid::new_v4().to_string(), content, properties: BlockProperties::default(), children: Vec::new(), created: Utc::now(), modified: Utc::now(), parent_id: None, } } /// Add a child block pub fn add_child(&mut self, mut child: Block) { child.parent_id = Some(self.id.clone()); self.children.push(child); self.modified = Utc::now(); } /// Add a tag to this block pub fn add_tag(&mut self, tag: String) { if !self.properties.tags.contains(&tag) { self.properties.tags.push(tag); self.modified = Utc::now(); } } /// Set task status pub fn set_status(&mut self, status: TaskStatus) { self.properties.status = Some(status); self.modified = Utc::now(); } /// Get all blocks (self + descendants) as flat list pub fn flatten(&self) -> Vec<&Block> { let mut result = vec![self]; for child in &self.children { result.extend(child.flatten()); } result } /// Find block by ID in tree pub fn find(&self, id: &str) -> Option<&Block> { if self.id == id { return Some(self); } for child in &self.children { if let Some(found) = child.find(id) { return Some(found); } } None } } ``` **Phase 2: Update Node Model** ```rust // crates/kogral-core/src/models.rs (modifications) use crate::models::block::Block; pub struct Node { // ... existing fields ... pub content: String, /// Structured blocks (optional, parsed from content) #[serde(skip_serializing_if = "Option::is_none")] pub blocks: Option>, } impl Node { /// Get blocks, parsing from content if needed pub fn get_blocks(&mut self) -> Result<&Vec> { if self.blocks.is_none() { self.blocks = Some(crate::parser::BlockParser::parse(&self.content)?); } Ok(self.blocks.as_ref().unwrap()) } /// Update content from blocks pub fn sync_blocks_to_content(&mut self) { if let Some(ref blocks) = self.blocks { self.content = crate::parser::BlockParser::serialize(blocks); } } /// Find all blocks with a specific tag pub fn find_blocks_by_tag(&mut self, tag: &str) -> Result> { let blocks = self.get_blocks()?; let mut result = Vec::new(); for block in blocks { for b in block.flatten() { if b.properties.tags.iter().any(|t| t == tag) { result.push(b); } } } Ok(result) } /// Find all TODO blocks pub fn find_todos(&mut self) -> Result> { let blocks = self.get_blocks()?; let mut result = Vec::new(); for block in blocks { for b in block.flatten() { if matches!(b.properties.status, Some(TaskStatus::Todo)) { result.push(b); } } } Ok(result) } } ``` **Phase 3: Block Parser** ```rust // crates/kogral-core/src/parser/block_parser.rs use crate::models::block::{Block, BlockProperties, TaskStatus}; use regex::Regex; pub struct BlockParser; impl BlockParser { /// Parse markdown content into block structure /// /// Handles: /// - Outliner format (- prefix with indentation) /// - Tags (#card, #important) /// - Task status (TODO, DONE) /// - Properties (property:: value) /// - Block references (((uuid))) /// - Page references ([[page]]) pub fn parse(content: &str) -> Result> { let mut blocks = Vec::new(); let mut stack: Vec<(usize, Block)> = Vec::new(); // (indent_level, block) for line in content.lines() { // Detect indentation level let indent = count_indent(line); let trimmed = line.trim_start(); // Skip empty lines if trimmed.is_empty() { continue; } // Parse block line if let Some(block_content) = trimmed.strip_prefix("- ") { let mut block = Self::parse_block_line(block_content)?; // Pop stack until we find parent level while let Some((level, _)) = stack.last() { if *level < indent { break; } stack.pop(); } // Add as child to parent or as root if let Some((_, parent)) = stack.last_mut() { parent.add_child(block.clone()); } else { blocks.push(block.clone()); } stack.push((indent, block)); } } Ok(blocks) } /// Parse a single block line (after "- " prefix) fn parse_block_line(line: &str) -> Result { let mut block = Block::new(String::new()); let mut properties = BlockProperties::default(); // Extract task status (TODO, DONE, etc.) let (status, remaining) = Self::extract_task_status(line); properties.status = status; // Extract tags (#card, #important) let (tags, remaining) = Self::extract_tags(remaining); properties.tags = tags; // Extract properties (property:: value) let (custom_props, remaining) = Self::extract_properties(remaining); properties.custom = custom_props; // Extract block references (((uuid))) let (block_refs, remaining) = Self::extract_block_refs(remaining); properties.block_refs = block_refs; // Extract page references ([[page]]) let (page_refs, content) = Self::extract_page_refs(remaining); properties.page_refs = page_refs; block.content = content.trim().to_string(); block.properties = properties; Ok(block) } /// Serialize blocks back to markdown pub fn serialize(blocks: &[Block]) -> String { let mut result = String::new(); for block in blocks { Self::serialize_block(&mut result, block, 0); } result } fn serialize_block(output: &mut String, block: &Block, indent: usize) { // Write indent for _ in 0..indent { output.push_str(" "); } // Write prefix output.push_str("- "); // Write task status if let Some(status) = block.properties.status { output.push_str(&format!("{:?} ", status).to_uppercase()); } // Write content output.push_str(&block.content); // Write tags for tag in &block.properties.tags { output.push_str(&format!(" #{}", tag)); } // Write properties if !block.properties.custom.is_empty() { output.push('\n'); for (key, value) in &block.properties.custom { for _ in 0..=indent { output.push_str(" "); } output.push_str(&format!("{}:: {}\n", key, value)); } } output.push('\n'); // Write children recursively for child in &block.children { Self::serialize_block(output, child, indent + 1); } } // Helper methods for extraction fn extract_task_status(line: &str) -> (Option, &str) { let line = line.trim_start(); if let Some(rest) = line.strip_prefix("TODO ") { (Some(TaskStatus::Todo), rest) } else if let Some(rest) = line.strip_prefix("DONE ") { (Some(TaskStatus::Done), rest) } else if let Some(rest) = line.strip_prefix("DOING ") { (Some(TaskStatus::Doing), rest) } else if let Some(rest) = line.strip_prefix("LATER ") { (Some(TaskStatus::Later), rest) } else if let Some(rest) = line.strip_prefix("NOW ") { (Some(TaskStatus::Now), rest) } else if let Some(rest) = line.strip_prefix("WAITING ") { (Some(TaskStatus::Waiting), rest) } else if let Some(rest) = line.strip_prefix("CANCELLED ") { (Some(TaskStatus::Cancelled), rest) } else { (None, line) } } fn extract_tags(line: &str) -> (Vec, String) { let tag_regex = Regex::new(r"#(\w+)").unwrap(); let mut tags = Vec::new(); let mut result = line.to_string(); for cap in tag_regex.captures_iter(line) { if let Some(tag) = cap.get(1) { tags.push(tag.as_str().to_string()); result = result.replace(&format!("#{}", tag.as_str()), ""); } } (tags, result.trim().to_string()) } fn extract_properties(line: &str) -> (HashMap, String) { let prop_regex = Regex::new(r"(\w+)::\s*([^\n]+)").unwrap(); let mut props = HashMap::new(); let mut result = line.to_string(); for cap in prop_regex.captures_iter(line) { if let (Some(key), Some(value)) = (cap.get(1), cap.get(2)) { props.insert(key.as_str().to_string(), value.as_str().trim().to_string()); result = result.replace(&cap[0], ""); } } (props, result.trim().to_string()) } fn extract_block_refs(line: &str) -> (Vec, String) { let ref_regex = Regex::new(r"\(\(([^)]+)\)\)").unwrap(); let mut refs = Vec::new(); let mut result = line.to_string(); for cap in ref_regex.captures_iter(line) { if let Some(uuid) = cap.get(1) { refs.push(uuid.as_str().to_string()); result = result.replace(&cap[0], ""); } } (refs, result.trim().to_string()) } fn extract_page_refs(line: &str) -> (Vec, String) { let page_regex = Regex::new(r"\[\[([^\]]+)\]\]").unwrap(); let mut pages = Vec::new(); let result = line.to_string(); for cap in page_regex.captures_iter(line) { if let Some(page) = cap.get(1) { pages.push(page.as_str().to_string()); // Keep [[page]] in content for now (backward compat) } } (pages, result) } } fn count_indent(line: &str) -> usize { line.chars().take_while(|c| c.is_whitespace()).count() / 2 } ``` **Phase 4: Logseq Import/Export** ```rust // crates/kogral-core/src/logseq.rs use crate::models::{Node, NodeType}; use crate::models::block::Block; use crate::parser::BlockParser; pub struct LogseqImporter; impl LogseqImporter { /// Import a Logseq page (markdown file) as a Node pub fn import_page(path: &Path) -> Result { let content = std::fs::read_to_string(path)?; // Extract frontmatter if present let (frontmatter, body) = Self::split_frontmatter(&content); // Parse blocks from body let blocks = BlockParser::parse(&body)?; // Create node with blocks let mut node = Node::new(NodeType::Note, Self::extract_title(path)); node.content = body; node.blocks = Some(blocks); // Apply frontmatter properties if let Some(fm) = frontmatter { Self::apply_frontmatter(&mut node, &fm)?; } Ok(node) } fn split_frontmatter(content: &str) -> (Option, String) { if content.starts_with("---\n") { if let Some(end) = content[4..].find("\n---\n") { let frontmatter = content[4..4 + end].to_string(); let body = content[4 + end + 5..].to_string(); return (Some(frontmatter), body); } } (None, content.to_string()) } fn extract_title(path: &Path) -> String { path.file_stem() .and_then(|s| s.to_str()) .unwrap_or("Untitled") .to_string() } fn apply_frontmatter(node: &mut Node, frontmatter: &str) -> Result<()> { // Parse YAML frontmatter and apply to node // ... implementation ... Ok(()) } } pub struct LogseqExporter; impl LogseqExporter { /// Export a Node to Logseq page format pub fn export_page(node: &Node, path: &Path) -> Result<()> { let mut output = String::new(); // Generate frontmatter output.push_str("---\n"); output.push_str(&Self::generate_frontmatter(node)?); output.push_str("---\n\n"); // Serialize blocks or use content if let Some(ref blocks) = node.blocks { output.push_str(&BlockParser::serialize(blocks)); } else { output.push_str(&node.content); } std::fs::write(path, output)?; Ok(()) } fn generate_frontmatter(node: &Node) -> Result { let mut fm = String::new(); fm.push_str(&format!("title: {}\n", node.title)); fm.push_str(&format!("tags: {}\n", node.tags.join(", "))); // ... more frontmatter fields ... Ok(fm) } } ``` ## Query API Extensions ```rust // New methods in Graph or Query module impl Graph { /// Find all blocks with a specific tag across all nodes pub fn find_blocks_by_tag(&mut self, tag: &str) -> Vec<(&Node, &Block)> { let mut results = Vec::new(); for node in self.nodes.values_mut() { if let Ok(blocks) = node.find_blocks_by_tag(tag) { for block in blocks { results.push((node as &Node, block)); } } } results } /// Find all flashcards (#card blocks) pub fn find_flashcards(&mut self) -> Vec<(&Node, &Block)> { self.find_blocks_by_tag("card") } /// Find all TODO items across knowledge base pub fn find_all_todos(&mut self) -> Vec<(&Node, &Block)> { let mut results = Vec::new(); for node in self.nodes.values_mut() { if let Ok(todos) = node.find_todos() { for block in todos { results.push((node as &Node, block)); } } } results } } ``` ## MCP Tool Extensions ```json { "name": "kogral/find_blocks", "description": "Find blocks by tag, status, or properties", "inputSchema": { "type": "object", "properties": { "tag": { "type": "string", "description": "Filter by tag (e.g., 'card')" }, "status": { "type": "string", "enum": ["TODO", "DONE", "DOING"] }, "property": { "type": "string", "description": "Custom property key" }, "value": { "type": "string", "description": "Property value to match" } } } } ``` ## Configuration ```nickel # schemas/contracts.ncl (additions) BlockConfig = { enabled | Bool | doc "Enable block-level parsing and storage" | default = true, preserve_hierarchy | Bool | doc "Preserve block nesting on import/export" | default = true, parse_on_load | Bool | doc "Automatically parse blocks when loading nodes" | default = false, # Lazy parsing by default supported_statuses | Array String | doc "Supported task statuses" | default = ["TODO", "DONE", "DOING", "LATER", "NOW", "WAITING", "CANCELLED"], } KbConfig = { # ... existing fields ... blocks | BlockConfig | doc "Block-level features configuration" | default = {}, } ``` ## Migration Path **Phase 1**: Add Block models (no behavior change) **Phase 2**: Add BlockParser (opt-in via config) **Phase 3**: Update Logseq import/export **Phase 4**: Add block queries to CLI/MCP **Phase 5**: SurrealDB block indexing **Backward Compatibility**: - Existing nodes without `blocks` field work as before - `content` remains source of truth - `blocks` is optional cache/structure - Config flag `blocks.enabled` to opt-in ## Testing Strategy ```rust #[cfg(test)] mod tests { use super::*; #[test] fn test_parse_simple_block() { let content = "- This is a block #card"; let blocks = BlockParser::parse(content).unwrap(); assert_eq!(blocks.len(), 1); assert_eq!(blocks[0].content, "This is a block"); assert_eq!(blocks[0].properties.tags, vec!["card"]); } #[test] fn test_parse_nested_blocks() { let content = r#" - Parent block - Child block 1 - Child block 2 - Grandchild "#; let blocks = BlockParser::parse(content).unwrap(); assert_eq!(blocks.len(), 1); assert_eq!(blocks[0].children.len(), 2); assert_eq!(blocks[0].children[1].children.len(), 1); } #[test] fn test_parse_todo() { let content = "- TODO Implement feature #rust"; let blocks = BlockParser::parse(content).unwrap(); assert_eq!(blocks[0].properties.status, Some(TaskStatus::Todo)); assert_eq!(blocks[0].content, "Implement feature"); assert_eq!(blocks[0].properties.tags, vec!["rust"]); } #[test] fn test_roundtrip() { let original = r#"- Block 1 #card - Nested - TODO Block 2 priority:: high "#; let blocks = BlockParser::parse(original).unwrap(); let serialized = BlockParser::serialize(&blocks); let reparsed = BlockParser::parse(&serialized).unwrap(); assert_eq!(blocks.len(), reparsed.len()); assert_eq!(blocks[0].properties, reparsed[0].properties); } } ``` ## Summary **Recommended Approach**: Hybrid (Option C) - **Add** `Block` struct with properties, hierarchy - **Extend** `Node` with optional `blocks: Option>` - **Implement** bidirectional parser (markdown ↔ blocks) - **Preserve** `content` as source of truth (backward compat) - **Enable** block queries in CLI/MCP - **Support** round-trip Logseq import/export **Benefits**: - ✅ Full Logseq compatibility - ✅ Queryable blocks (find #card, TODO, etc.) - ✅ Backward compatible - ✅ Extensible (custom properties) - ✅ Type-safe structure **Trade-offs**: - ⚠️ Added complexity - ⚠️ Need to sync content ↔ blocks - ⚠️ More storage for SurrealDB backend **Next Steps**: 1. Review and approve design 2. Implement Phase 1 (Block models) 3. Implement Phase 2 (BlockParser) 4. Update Logseq import/export 5. Add block queries to MCP/CLI