# ADR-002: FastEmbed + AI Providers for Embeddings

- **Status:** Accepted
- **Date:** 2026-01-17
- **Deciders:** Architecture Team
- **Context:** Embedding Strategy for Semantic Search

## Context
KOGRAL requires embedding generation for its semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling "find concepts" rather than just "find keywords".
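For illustration only (not KOGRAL code), a minimal sketch of the comparison this enables: two texts are considered related when their embedding vectors point in a similar direction, typically scored with cosine similarity.

```rust
/// Illustrative sketch: semantic search ranks stored notes against a query by
/// comparing embedding vectors, so conceptually related texts score high even
/// when they share no keywords.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy 3-dimensional vectors standing in for real 384/1536-dimensional embeddings.
    let query = [0.9_f32, 0.1, 0.0];
    let related = [0.8_f32, 0.2, 0.1];
    let unrelated = [0.0_f32, 0.1, 0.9];
    assert!(cosine_similarity(&query, &related) > cosine_similarity(&query, &unrelated));
}
```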
**Requirements:**

- **Local-First Option**: Must work offline without external API dependencies
- **Production Scalability**: Support cloud AI providers for large-scale deployments
- **Multiple Providers**: Flexibility to choose based on cost, quality, and privacy
- **Cost-Effective Development**: Free local embeddings for development and testing
- **Quality**: Embeddings good enough to find related concepts
## Options Evaluated

### Option 1: Local Embeddings Only (fastembed)
**Pros:**

- No API costs
- Works offline
- Privacy-preserving (no data leaves the machine)
- Fast (local GPU acceleration possible)

**Cons:**

- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100 MB models)
- Single-provider lock-in (fastembed library)

**Example:**
```rust
use fastembed::{TextEmbedding, InitOptions};

let model = TextEmbedding::try_new(InitOptions {
    model_name: "BAAI/bge-small-en-v1.5".into(),
    ..Default::default()
})?;
let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>> with 384 dimensions
```
### Option 2: Cloud AI Providers Only (OpenAI, Claude, etc.)

**Pros:**

- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents

**Cons:**

- Requires API keys (cost per embedding)
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk

**Example:**
```rust
use rig::providers::openai;

let client = openai::Client::new("sk-...");
let embeddings = client
    .embeddings("text-embedding-3-small")
    .embed_documents(vec!["Hello world"])
    .await?;
// Output: Vec<Vec<f32>> with 1536 dimensions
```
### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)

**Pros:**

- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via the `rig-core` library
- ✅ Easy provider switching (config-driven)

**Cons:**

- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)
## Decision

We will use a hybrid strategy: `fastembed` (local) + AI providers (via `rig-core`).

**Implementation:**

- **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
- **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
- **Interface**: `EmbeddingProvider` trait abstracts provider details
- **Config-Driven**: Provider selection via Nickel configuration

**Architecture:**
```rust
#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        let model = TextEmbedding::try_new(InitOptions {
            model_name: model_name.into(),
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }

    fn dimensions(&self) -> usize {
        384
    }

    fn model_name(&self) -> &str {
        "BAAI/bge-small-en-v1.5"
    }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client,
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        let embeddings = self
            .client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }

    fn dimensions(&self) -> usize {
        self.dimensions
    }

    fn model_name(&self) -> &str {
        &self.model
    }
}
```
**Configuration (Nickel):**

```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}
```

```nickel
# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}
```

```nickel
# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```
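For concreteness, a hedged sketch of the Rust config types these Nickel blocks might map to; the serde attributes and exact field handling are assumptions, not the final schema:

```rust
use serde::Deserialize;

/// Sketch of the config the Nickel examples above could deserialize into.
#[derive(Debug, Clone, Deserialize)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Name of the environment variable holding the API key (cloud providers only).
    #[serde(default)]
    pub api_key_env: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum EmbeddingProviderType {
    FastEmbed,
    OpenAI,
    Claude,
    Ollama,
}
```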
**Provider Selection** (`kb-core/src/embeddings/mod.rs`):
```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
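To illustrate the unified interface, a minimal sketch of how calling code might consume this; the `embed_notes` helper and the `anyhow::Result` error type are assumptions:

```rust
use anyhow::Result;

/// Hypothetical caller: it only sees the EmbeddingProvider trait, so whether
/// vectors come from fastembed, OpenAI, Claude, or Ollama is purely a config choice.
pub async fn embed_notes(config: &EmbeddingConfig, notes: Vec<String>) -> Result<Vec<Vec<f32>>> {
    let provider = create_provider(config)?;
    let vectors = provider.embed(notes).await?;
    // Sanity check: every vector matches the provider's declared dimensionality.
    debug_assert!(vectors.iter().all(|v| v.len() == provider.dimensions()));
    Ok(vectors)
}
```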
## Consequences

### Positive

**✅ Development Flexibility:**

- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)
**✅ Production Quality:**
- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents
**✅ Privacy Control:**
- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration
**✅ Cost Optimization:**
- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local
**✅ Unified Interface:**

- `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know or care about the provider
- Easy to add new providers
### Negative
**❌ Dimension Mismatch:**
- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Cannot mix in same index
**Mitigation:**
- Store provider + dimensions in node metadata
- Rebuild the index when changing providers (see the sketch below)
- Document dimension constraints
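A minimal sketch of the guard this mitigation implies, assuming the index records the embedding model name and dimensions as metadata; the `IndexMetadata` type and function name are hypothetical:

```rust
/// Hypothetical metadata recorded when the index is built.
pub struct IndexMetadata {
    pub model_name: String,
    pub dimensions: usize,
}

/// Refuse to mix embeddings from incompatible providers: a changed provider
/// means rebuilding the index rather than silently storing mismatched vectors.
pub fn check_index_compatibility(
    meta: &IndexMetadata,
    provider: &dyn EmbeddingProvider,
) -> Result<(), String> {
    if meta.dimensions != provider.dimensions() || meta.model_name != provider.model_name() {
        return Err(format!(
            "index was built with {} ({} dims), but the configured provider is {} ({} dims); rebuild the index",
            meta.model_name,
            meta.dimensions,
            provider.model_name(),
            provider.dimensions(),
        ));
    }
    Ok(())
}
```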
**❌ Model Download:**

- First use of fastembed downloads a ~100 MB model
- Slow initial startup
**Mitigation:**
- Pre-download the model in Docker images (see the warm-up sketch below)
- Document model download in setup guide
- Cache models in `~/.cache/fastembed`
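One way to realize the pre-download mitigation, sketched under the assumption that constructing the provider triggers the one-time model download; the `warm_up_embeddings` helper is hypothetical:

```rust
/// Hypothetical warm-up step: run at Docker build time or service startup so
/// the ~100 MB model download happens once, not on a user's first query.
pub async fn warm_up_embeddings(config: &EmbeddingConfig) -> anyhow::Result<()> {
    let provider = create_provider(config)?;
    // A throwaway embedding confirms the model files are downloaded and loadable.
    let _ = provider.embed(vec!["warm-up".to_string()]).await?;
    Ok(())
}
```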
**❌ Complex Configuration:**
- Multiple provider options may confuse users
**Mitigation:**
- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations
### Neutral

**⚪ Dependency Trade-off:**

- `fastembed` adds ~5 MB to the binary
- `rig-core` adds ~2 MB
- Total: ~7 MB overhead

Not a concern for the CLI/MCP server use case.
## Provider Comparison
| Provider | Dimensions | Quality | Cost | Privacy | Offline |
|---|---|---|---|---|---|
| fastembed | 384 | Good | Free | ✅ Local | ✅ Yes |
| OpenAI | 1536 | Excellent | $0.0001/1K | ❌ Cloud | ❌ No |
| Claude | 1024 | Excellent | $0.00025/1K | ❌ Cloud | ❌ No |
| Ollama | 768 | Very Good | Free | ✅ Local | ✅ Yes |
**Recommendation by Use Case:**
- Development: fastembed (fast, free, offline)
- Small Teams: fastembed or Ollama (privacy, no costs)
- Enterprise: OpenAI or Claude (best quality, scalable)
- Self-Hosted: Ollama (good quality, local control)
## Implementation Timeline

- ✅ Define `EmbeddingProvider` trait
- ✅ Implement `FastEmbedProvider` (stub, feature-gated)
- ✅ Implement `RigEmbeddingProvider` (stub, feature-gated)
- ⏳ Complete FastEmbed integration with model download
- ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
- ⏳ Add query engine with similarity search
- ⏳ Document provider selection and trade-offs
## Monitoring

**Success Criteria:**
- Users can switch providers via config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality acceptable for both local and cloud embeddings
**Metrics:**
- Embedding generation latency (local vs cloud)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)
## References
- fastembed Documentation
- rig-core Documentation
- OpenAI Embeddings API
- BAAI/bge Models
- Ollama Embeddings
## Revision History
| Date | Author | Change |
|---|---|---|
| 2026-01-17 | Architecture Team | Initial decision |
Previous ADR: ADR-001: Nickel vs TOML

Next ADR: ADR-003: Hybrid Storage Strategy