# ADR-002: FastEmbed via AI Providers for Embeddings
**Status**: Accepted
**Date**: 2026-01-17
**Deciders**: Architecture Team
**Context**: Embedding Strategy for Semantic Search
---
## Context
KOGRAL requires embedding generation for its semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling queries that "find concepts" rather than merely "find keywords".
**Requirements**:
1. **Local-First Option**: Must work offline without external API dependencies
2. **Production Scalability**: Support cloud AI providers for large-scale deployments
3. **Multiple Providers**: Flexibility to choose based on cost, quality, privacy
4. **Cost-Effective Development**: Free local embeddings for development and testing
5. **Quality**: Good enough embeddings for finding related concepts
**Options Evaluated**:
### Option 1: Only Local Embeddings (fastembed)
**Pros**:
- No API costs
- Works offline
- Privacy-preserving (no data leaves machine)
- Fast (local GPU acceleration possible)
**Cons**:
- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100 MB models)
- Single provider lock-in (fastembed library)
**Example**:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// fastembed identifies models via the EmbeddingModel enum rather than a raw string
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15, // "BAAI/bge-small-en-v1.5"
    ..Default::default()
})?;
let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>> with 384 dimensions
```
### Option 2: Only Cloud AI Providers (OpenAI, Claude, etc.)
**Pros**:
- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents
**Cons**:
- Requires API keys (cost per embedding)
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk
**Example**:
```rust
use rig::providers::openai;

let client = openai::Client::new("sk-...");
let embeddings = client.embeddings("text-embedding-3-small")
    .embed_documents(vec!["Hello world"])
    .await?;
// Output: Vec<Vec<f32>> with 1536 dimensions
```
### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)
**Pros**:
- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via `rig-core` library
- ✅ Easy provider switching (config-driven)
**Cons**:
- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)
---
## Decision
**We will use a hybrid strategy: fastembed (local) + AI providers (via rig-core).**
**Implementation**:
1. **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
2. **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
3. **Interface**: `EmbeddingProvider` trait abstracts provider details
4. **Config-Driven**: Provider selection via Nickel configuration
**Architecture**:
```rust
#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        // NOTE: fastembed identifies models via its EmbeddingModel enum; the
        // mapping from the configured model-name string is elided in this sketch.
        let model = TextEmbedding::try_new(InitOptions {
            model_name: model_name.into(),
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }
    fn dimensions(&self) -> usize { 384 }
    fn model_name(&self) -> &str { "BAAI/bge-small-en-v1.5" }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client,
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        let embeddings = self.client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }
    fn dimensions(&self) -> usize { self.dimensions }
    fn model_name(&self) -> &str { &self.model }
}
```
**Configuration** (Nickel):
```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}

# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}

# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```
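The selection code below assumes config types roughly like the following. This is a sketch only: field names mirror the Nickel schema above, and serde is assumed as the deserialization layer.
```rust
use serde::Deserialize;

/// Mirrors the `embeddings` block of the Nickel config after export.
#[derive(Debug, Clone, Deserialize)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Only required for cloud providers (e.g. "OPENAI_API_KEY").
    #[serde(default)]
    pub api_key_env: String,
}

#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum EmbeddingProviderType {
    FastEmbed, // "fastembed"
    OpenAI,    // "openai"
    Claude,    // "claude"
    Ollama,    // "ollama"
}
```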
**Provider Selection** (`kogral-core/src/embeddings/mod.rs`):
```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
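Putting the pieces together, a minimal usage sketch. The inline `EmbeddingConfig` stands in for the parsed Nickel file; in practice it would be loaded from configuration.
```rust
use anyhow::Result;

/// Callers only see the trait object, so switching from fastembed to a cloud
/// provider is purely a configuration change.
async fn index_documents(texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
    // In practice this comes from the exported Nickel config; built inline here.
    let config = EmbeddingConfig {
        enabled: true,
        provider: EmbeddingProviderType::FastEmbed,
        model: "BAAI/bge-small-en-v1.5".to_string(),
        dimensions: 384,
        api_key_env: String::new(),
    };
    let provider = create_provider(&config)?;   // Box<dyn EmbeddingProvider>
    let vectors = provider.embed(texts).await?; // same call for every backend
    debug_assert!(vectors.iter().all(|v| v.len() == provider.dimensions()));
    Ok(vectors)
}
```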
---
## Consequences
### Positive
**Development Flexibility**:
- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)
**Production Quality**:
- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents
**Privacy Control**:
- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration
**Cost Optimization**:
- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local
**Unified Interface**:
- `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know/care about provider
- Easy to add new providers
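As an illustration of that provider-agnosticism, a minimal query-side sketch. The brute-force cosine-similarity search is illustrative only, not the query engine planned in the timeline below.
```rust
/// Provider-agnostic semantic search sketch: only the trait is visible here,
/// so the same code runs against fastembed, OpenAI, Claude, or Ollama.
async fn top_k(
    provider: &dyn EmbeddingProvider,
    query: &str,
    corpus: &[(String, Vec<f32>)], // (doc id, stored embedding)
    k: usize,
) -> anyhow::Result<Vec<(String, f32)>> {
    let query_vec = provider.embed(vec![query.to_string()]).await?.remove(0);
    let mut scored: Vec<(String, f32)> = corpus
        .iter()
        .map(|(id, vec)| (id.clone(), cosine(&query_vec, vec)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    Ok(scored)
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```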
### Negative
**Dimension Mismatch**:
- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Cannot mix in same index
**Mitigation**:
- Store provider + dimensions in node metadata
- Rebuild index when changing providers
- Document dimension constraints
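A minimal sketch of that guard, assuming a hypothetical `IndexMetadata` record persisted alongside the index:
```rust
/// The index records which model and dimensionality produced it and refuses
/// an incompatible provider. Persistence of this record is out of scope here.
#[derive(Debug, PartialEq)]
pub struct IndexMetadata {
    pub model: String,     // e.g. "BAAI/bge-small-en-v1.5"
    pub dimensions: usize, // e.g. 384
}

pub fn check_compatible(index: &IndexMetadata, provider: &dyn EmbeddingProvider) -> anyhow::Result<()> {
    if index.dimensions != provider.dimensions() || index.model != provider.model_name() {
        anyhow::bail!(
            "index was built with {} ({} dims) but the configured provider is {} ({} dims); \
             rebuild the index after switching providers",
            index.model,
            index.dimensions,
            provider.model_name(),
            provider.dimensions()
        );
    }
    Ok(())
}
```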
**Model Download**:
- First use of fastembed downloads ~100MB model
- Slow initial startup
**Mitigation**:
- Pre-download in Docker images
- Document model download in setup guide
- Cache models in `~/.cache/fastembed`
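A warm-up sketch for pre-downloading (e.g. in a Docker build step), assuming fastembed's `InitOptions` exposes `cache_dir` and `show_download_progress` fields as in the versions this ADR was written against:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
use std::path::PathBuf;

/// Instantiating the model once forces the ~100 MB download into the cache
/// directory, so it can happen at image-build time instead of on first request.
/// (Field names are an assumption about the fastembed version in use.)
fn prefetch_model(cache_dir: PathBuf) -> anyhow::Result<()> {
    let _model = TextEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallENV15,
        cache_dir,
        show_download_progress: true,
        ..Default::default()
    })?;
    Ok(())
}
```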
**Complex Configuration**:
- Multiple provider options may confuse users
**Mitigation**:
- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations
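A sketch of what that validation could look like, reusing the `EmbeddingConfig` types sketched earlier; the error messages are illustrative:
```rust
/// Turn common misconfigurations into actionable errors before any provider
/// is constructed.
pub fn validate(config: &EmbeddingConfig) -> anyhow::Result<()> {
    match config.provider {
        EmbeddingProviderType::OpenAI | EmbeddingProviderType::Claude => {
            if config.api_key_env.is_empty() {
                anyhow::bail!(
                    "provider {:?} requires `api_key_env` (e.g. \"OPENAI_API_KEY\") in the embeddings config",
                    config.provider
                );
            }
        }
        EmbeddingProviderType::FastEmbed | EmbeddingProviderType::Ollama => {}
    }
    if config.dimensions == 0 {
        anyhow::bail!(
            "`dimensions` must match the chosen model (e.g. 384 for bge-small, 1536 for text-embedding-3-small)"
        );
    }
    Ok(())
}
```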
### Neutral
**Dependency Trade-off**:
- `fastembed` adds ~5MB to binary
- `rig-core` adds ~2MB
- Total: ~7MB overhead
Not a concern for CLI/MCP server use case.
---
## Provider Comparison
| Provider      | Dimensions | Quality   | Cost (per 1K tokens) | Privacy  | Offline |
| ------------- | ---------- | --------- | -------------------- | -------- | ------- |
| **fastembed** | 384        | Good      | Free                 | ✅ Local | ✅ Yes  |
| **OpenAI**    | 1536       | Excellent | $0.0001              | ❌ Cloud | ❌ No   |
| **Claude**    | 1024       | Excellent | $0.00025             | ❌ Cloud | ❌ No   |
| **Ollama**    | 768        | Very Good | Free                 | ✅ Local | ✅ Yes  |
**Recommendation by Use Case**:
- **Development**: fastembed (fast, free, offline)
- **Small Teams**: fastembed or Ollama (privacy, no costs)
- **Enterprise**: OpenAI or Claude (best quality, scalable)
- **Self-Hosted**: Ollama (good quality, local control)
---
## Implementation Timeline
1. ✅ Define `EmbeddingProvider` trait
2. ✅ Implement FastEmbedProvider (stub, feature-gated)
3. ✅ Implement RigEmbeddingProvider (stub, feature-gated)
4. ⏳ Complete FastEmbed integration with model download
5. ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
6. ⏳ Add query engine with similarity search
7. ⏳ Document provider selection and trade-offs
---
## Monitoring
**Success Criteria**:
- Users can switch providers via config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality acceptable for both local and cloud embeddings
**Metrics**:
- Embedding generation latency (local vs cloud)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)
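For the latency metric, a small harness along these lines would let local and cloud providers be compared identically (a sketch; reporting and aggregation are out of scope):
```rust
use std::time::Instant;

/// Wraps an embed call and reports milliseconds per text, so fastembed and
/// cloud providers can be measured with the same code path.
async fn measure_embed_latency(
    provider: &dyn EmbeddingProvider,
    texts: Vec<String>,
) -> anyhow::Result<f64> {
    let n = texts.len().max(1) as f64;
    let start = Instant::now();
    provider.embed(texts).await?;
    Ok(start.elapsed().as_secs_f64() * 1000.0 / n)
}
```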
---
## References
- [fastembed Documentation](https://github.com/Anush008/fastembed-rs)
- [rig-core Documentation](https://github.com/0xPlaygrounds/rig)
- [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings)
- [BAAI/bge Models](https://huggingface.co/BAAI/bge-small-en-v1.5)
- [Ollama Embeddings](https://ollama.com/blog/embedding-models)
---
## Revision History
| Date | Author | Change |
| ---------- | ------------------ | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |
---
**Previous ADR**: [ADR-001: Nickel vs TOML](001-nickel-vs-toml.md)
**Next ADR**: [ADR-003: Hybrid Storage Strategy](003-hybrid-storage.md)