# ADR-002: FastEmbed via AI Providers for Embeddings

**Status**: Accepted
**Date**: 2026-01-17
**Deciders**: Architecture Team
**Context**: Embedding Strategy for Semantic Search

---

## Context

KOGRAL requires embedding generation for semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling "find concepts" rather than just "find keywords".

**Requirements**:

1. **Local-First Option**: Must work offline without external API dependencies
2. **Production Scalability**: Support cloud AI providers for large-scale deployments
3. **Multiple Providers**: Flexibility to choose based on cost, quality, privacy
4. **Cost-Effective Development**: Free local embeddings for development and testing
5. **Quality**: Good enough embeddings for finding related concepts

**Options Evaluated**:

### Option 1: Only Local Embeddings (fastembed)

**Pros**:

- No API costs
- Works offline
- Privacy-preserving (no data leaves the machine)
- Fast (local GPU acceleration possible)

**Cons**:

- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100MB models)
- Single-provider lock-in (fastembed library)

**Example**:

```rust
use fastembed::{TextEmbedding, InitOptions};

let model = TextEmbedding::try_new(InitOptions {
    model_name: "BAAI/bge-small-en-v1.5".into(),
    ..Default::default()
})?;
let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>> with 384 dimensions
```

### Option 2: Only Cloud AI Providers (OpenAI, Claude, etc.)

**Pros**:

- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents

**Cons**:

- Requires API keys (cost per embedding)
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk

**Example**:

```rust
use rig::providers::openai;

let client = openai::Client::new("sk-...");
let embeddings = client
    .embeddings("text-embedding-3-small")
    .embed_documents(vec!["Hello world"])
    .await?;
// Output: Vec<Vec<f32>> with 1536 dimensions
```

### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)

**Pros**:

- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via `rig-core` library
- ✅ Easy provider switching (config-driven)

**Cons**:

- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)

---

## Decision

**We will use a hybrid strategy: fastembed (local) + AI providers (via rig-core).**

**Implementation**:

1. **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
2. **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
3. **Interface**: `EmbeddingProvider` trait abstracts provider details
4. **Config-Driven**: Provider selection via Nickel configuration (type sketch below)
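The config-driven selection in point 4 implies a small configuration type. A minimal sketch of what it could look like, reusing the `EmbeddingConfig` and `EmbeddingProviderType` names that the provider-selection code further below expects; the exact fields and derives are assumptions of this sketch, not something this ADR prescribes:

```rust
/// Which embedding backend to use; mirrors the `provider` field in the Nickel config.
#[derive(Debug, Clone, Copy)]
pub enum EmbeddingProviderType {
    FastEmbed,
    OpenAI,
    Claude,
    Ollama,
}

/// Embedding settings as deserialized from the Nickel configuration.
/// Field names mirror the Nickel examples below.
#[derive(Debug, Clone)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Name of the environment variable holding the API key
    /// (cloud providers only; unused for fastembed/Ollama).
    pub api_key_env: String,
}
```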
**Architecture**:

```rust
use async_trait::async_trait;
use fastembed::{InitOptions, TextEmbedding};

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        let model = TextEmbedding::try_new(InitOptions {
            model_name: model_name.into(),
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }

    fn dimensions(&self) -> usize {
        384
    }

    fn model_name(&self) -> &str {
        "BAAI/bge-small-en-v1.5"
    }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client,
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        let embeddings = self
            .client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }

    fn dimensions(&self) -> usize {
        self.dimensions
    }

    fn model_name(&self) -> &str {
        &self.model
    }
}
```

**Configuration** (Nickel):

```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}

# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}

# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```

**Provider Selection** (`kogral-core/src/embeddings/mod.rs`):

```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
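Calling code then depends only on the trait and the factory above. A minimal usage sketch, assuming a hypothetical `index_note` helper and omitting error handling beyond `?`:

```rust
/// Embed a single note body with whichever provider the configuration selects.
async fn index_note(config: &EmbeddingConfig, text: &str) -> Result<Vec<f32>> {
    // Resolve fastembed, OpenAI, Claude, or Ollama from the config.
    let provider = create_provider(config)?;

    // Query code never branches on the concrete provider.
    let mut vectors = provider.embed(vec![text.to_string()]).await?;
    let vector = vectors.pop().expect("one embedding per input text");

    // The vector length matches the provider's advertised dimensionality
    // (384 for bge-small, 1536 for text-embedding-3-small, etc.).
    debug_assert_eq!(vector.len(), provider.dimensions());
    Ok(vector)
}
```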
---

## Consequences

### Positive

✅ **Development Flexibility**:

- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)

✅ **Production Quality**:

- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents

✅ **Privacy Control**:

- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration

✅ **Cost Optimization**:

- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local

✅ **Unified Interface**:

- `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know/care about the provider
- Easy to add new providers

### Negative

❌ **Dimension Mismatch**:

- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Cannot mix in the same index

**Mitigation**:

- Store provider + dimensions in node metadata
- Rebuild the index when changing providers
- Document dimension constraints

❌ **Model Download**:

- First use of fastembed downloads a ~100MB model
- Slow initial startup

**Mitigation**:

- Pre-download in Docker images
- Document model download in the setup guide
- Cache models in `~/.cache/fastembed`

❌ **Complex Configuration**:

- Multiple provider options may confuse users

**Mitigation**:

- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations

### Neutral

⚪ **Dependency Trade-off**:

- `fastembed` adds ~5MB to the binary
- `rig-core` adds ~2MB
- Total: ~7MB overhead

Not a concern for the CLI/MCP server use case.

---

## Provider Comparison

| Provider      | Dimensions | Quality   | Cost        | Privacy  | Offline |
| ------------- | ---------- | --------- | ----------- | -------- | ------- |
| **fastembed** | 384        | Good      | Free        | ✅ Local | ✅ Yes  |
| **OpenAI**    | 1536       | Excellent | $0.0001/1K  | ❌ Cloud | ❌ No   |
| **Claude**    | 1024       | Excellent | $0.00025/1K | ❌ Cloud | ❌ No   |
| **Ollama**    | 768        | Very Good | Free        | ✅ Local | ✅ Yes  |

**Recommendation by Use Case**:

- **Development**: fastembed (fast, free, offline)
- **Small Teams**: fastembed or Ollama (privacy, no costs)
- **Enterprise**: OpenAI or Claude (best quality, scalable)
- **Self-Hosted**: Ollama (good quality, local control)

---

## Implementation Timeline

1. ✅ Define `EmbeddingProvider` trait
2. ✅ Implement FastEmbedProvider (stub, feature-gated)
3. ✅ Implement RigEmbeddingProvider (stub, feature-gated)
4. ⏳ Complete FastEmbed integration with model download
5. ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
6. ⏳ Add query engine with similarity search
7. ⏳ Document provider selection and trade-offs

---

## Monitoring

**Success Criteria**:

- Users can switch providers via a config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality acceptable for both local and cloud embeddings

**Metrics**:

- Embedding generation latency (local vs cloud)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)

---

## References

- [fastembed Documentation](https://github.com/Anush008/fastembed-rs)
- [rig-core Documentation](https://github.com/0xPlaygrounds/rig)
- [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings)
- [BAAI/bge Models](https://huggingface.co/BAAI/bge-small-en-v1.5)
- [Ollama Embeddings](https://ollama.com/blog/embedding-models)

---

## Revision History

| Date       | Author            | Change           |
| ---------- | ----------------- | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |

---

**Previous ADR**: [ADR-001: Nickel vs TOML](001-nickel-vs-toml.md)
**Next ADR**: [ADR-003: Hybrid Storage Strategy](003-hybrid-storage.md)