# ADR-002: FastEmbed + AI Providers for Embeddings

**Status**: Accepted

**Date**: 2026-01-17

**Deciders**: Architecture Team

**Context**: Embedding Strategy for Semantic Search

---
## Context

KOGRAL requires embedding generation for its semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling "find concepts" rather than just "find keywords".

**Requirements**:

1. **Local-First Option**: Must work offline without external API dependencies
2. **Production Scalability**: Support cloud AI providers for large-scale deployments
3. **Multiple Providers**: Flexibility to choose based on cost, quality, and privacy
4. **Cost-Effective Development**: Free local embeddings for development and testing
5. **Quality**: Embeddings good enough for finding related concepts

**Options Evaluated**:
### Option 1: Only Local Embeddings (fastembed)

**Pros**:

- No API costs
- Works offline
- Privacy-preserving (no data leaves the machine)
- Fast (local GPU acceleration possible)

**Cons**:

- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100 MB models)
- Single-provider lock-in (fastembed library)

**Example**:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// fastembed identifies models via an enum rather than a raw string.
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15, // BAAI/bge-small-en-v1.5
    ..Default::default()
})?;

let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>>, 384 dimensions per embedding
```
### Option 2: Only Cloud AI Providers (OpenAI, Claude, etc.)

**Pros**:

- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents

**Cons**:

- Requires API keys and incurs per-embedding costs
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk

**Example**:
```rust
use rig::embeddings::EmbeddingModel;
use rig::providers::openai;

// Call shape follows rig-core's embeddings API; exact names can
// differ between versions.
let client = openai::Client::new("sk-...");
let model = client.embedding_model(openai::TEXT_EMBEDDING_3_SMALL);
let embeddings = model.embed_texts(["Hello world".to_string()]).await?;
// Output: 1536-dimensional vectors
```
### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)

**Pros**:

- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via the `rig-core` library
- ✅ Easy provider switching (config-driven)

**Cons**:

- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)

---
## Decision

**We will use a hybrid strategy: fastembed (local) + AI providers (via rig-core).**

**Implementation**:

1. **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
2. **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
3. **Interface**: An `EmbeddingProvider` trait abstracts provider details
4. **Config-Driven**: Provider selection via Nickel configuration

**Architecture**:
```rust
use async_trait::async_trait;

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        // parse_model (not shown) maps the config string onto
        // fastembed's `EmbeddingModel` enum.
        let model = TextEmbedding::try_new(InitOptions {
            model_name: parse_model(model_name)?,
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }

    // Hardcoded for the default model; a full implementation derives
    // these from the configured model.
    fn dimensions(&self) -> usize { 384 }
    fn model_name(&self) -> &str { "BAAI/bge-small-en-v1.5" }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client, // schematic: rig-core exposes per-provider clients
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        // Call shape is illustrative; see the rig-core docs for the
        // provider-specific embeddings API.
        let embeddings = self.client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }

    fn dimensions(&self) -> usize { self.dimensions }
    fn model_name(&self) -> &str { &self.model }
}
```
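Calling code depends only on the trait, so the active backend is invisible at query time. A minimal sketch (the `embed_query` helper is hypothetical):

```rust
/// Provider-agnostic helper: behaves identically whether the provider
/// wraps fastembed or a rig-core cloud client.
pub async fn embed_query(
    provider: &dyn EmbeddingProvider,
    query: &str,
) -> Result<Vec<f32>> {
    let mut vectors = provider.embed(vec![query.to_string()]).await?;
    // One input text yields exactly one vector.
    debug_assert_eq!(vectors.len(), 1);
    debug_assert_eq!(vectors[0].len(), provider.dimensions());
    Ok(vectors.remove(0))
}
```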
**Configuration** (Nickel):

```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}

# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}

# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```
**Provider Selection** (`kogral-core/src/embeddings/mod.rs`):

```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            // API key is read from the env var named in the config.
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            // Ollama runs locally and needs no API key.
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
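The `EmbeddingConfig` and `EmbeddingProviderType` types used above are not defined in this ADR; the following is a minimal sketch mirroring the Nickel schema (field and variant names are assumptions):

```rust
use serde::Deserialize;

/// Mirrors the `embeddings` record in the Nickel config (sketch).
#[derive(Debug, Deserialize)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Env var holding the API key; only cloud providers need it.
    #[serde(default)]
    pub api_key_env: String,
}

/// One variant per supported backend.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum EmbeddingProviderType {
    FastEmbed,
    OpenAI,
    Claude,
    Ollama,
}
```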
---

## Consequences

### Positive

✅ **Development Flexibility**:

- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)

✅ **Production Quality**:

- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents

✅ **Privacy Control**:

- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration

✅ **Cost Optimization**:

- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local

✅ **Unified Interface**:

- The `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know or care about the provider
- Easy to add new providers
### Negative

❌ **Dimension Mismatch**:

- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Vectors of different dimensions cannot be mixed in the same index

**Mitigation** (see the sketch below):

- Store provider + dimensions in node metadata
- Rebuild the index when changing providers
- Document dimension constraints
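A minimal sketch of that guard, assuming a hypothetical `IndexMetadata` record persisted next to the index:

```rust
/// Persisted alongside the vector index (hypothetical shape).
pub struct IndexMetadata {
    pub provider: String,
    pub dimensions: usize,
}

/// Refuses to reuse an index built with different dimensions,
/// forcing an explicit rebuild instead of silently mixing vectors.
pub fn check_index_compat(
    meta: &IndexMetadata,
    provider: &dyn EmbeddingProvider,
) -> Result<(), String> {
    if meta.dimensions != provider.dimensions() {
        return Err(format!(
            "index has {}-dim vectors, but '{}' produces {}; rebuild required",
            meta.dimensions,
            provider.model_name(),
            provider.dimensions(),
        ));
    }
    Ok(())
}
```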
❌ **Model Download**:

- First use of fastembed downloads a ~100 MB model
- Slow initial startup

**Mitigation**:

- Pre-download models in Docker images
- Document the model download in the setup guide
- Cache models in `~/.cache/fastembed` (see the sketch below)
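As an example of the caching mitigation, fastembed's `InitOptions` exposes a cache directory and a download-progress flag (struct-literal form as in older fastembed-rs releases; the `dirs` crate is assumed for locating the home directory):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// First run downloads the model into the cache dir; later runs reuse it.
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15,
    cache_dir: dirs::home_dir()
        .expect("home directory")
        .join(".cache/fastembed"),
    show_download_progress: true,
    ..Default::default()
})?;
```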
❌ **Complex Configuration**:

- Multiple provider options may confuse users

**Mitigation**:

- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations
### Neutral

⚪ **Dependency Trade-off**:

- `fastembed` adds ~5 MB to the binary
- `rig-core` adds ~2 MB
- Total: ~7 MB overhead

Not a concern for the CLI/MCP server use case.

---
## Provider Comparison

| Provider      | Dimensions | Quality   | Cost (per 1K tokens) | Privacy  | Offline |
| ------------- | ---------- | --------- | -------------------- | -------- | ------- |
| **fastembed** | 384        | Good      | Free                 | ✅ Local | ✅ Yes  |
| **OpenAI**    | 1536       | Excellent | $0.0001              | ❌ Cloud | ❌ No   |
| **Claude**    | 1024       | Excellent | $0.00025             | ❌ Cloud | ❌ No   |
| **Ollama**    | 768        | Very Good | Free                 | ✅ Local | ✅ Yes  |

**Recommendation by Use Case**:

- **Development**: fastembed (fast, free, offline)
- **Small Teams**: fastembed or Ollama (privacy, no costs)
- **Enterprise**: OpenAI or Claude (best quality, scalable)
- **Self-Hosted**: Ollama (good quality, local control)
---

## Implementation Timeline

1. ✅ Define the `EmbeddingProvider` trait
2. ✅ Implement `FastEmbedProvider` (stub, feature-gated)
3. ✅ Implement `RigEmbeddingProvider` (stub, feature-gated)
4. ⏳ Complete fastembed integration with model download
5. ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
6. ⏳ Add a query engine with similarity search
7. ⏳ Document provider selection and trade-offs

---
## Monitoring

**Success Criteria**:

- Users can switch providers via a config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality is acceptable for both local and cloud embeddings

**Metrics**:

- Embedding generation latency, local vs cloud (see the sketch below)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)
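One way to capture the latency metric is a thin timing wrapper around any provider (a sketch; where the measurement is reported is left open):

```rust
use std::time::Instant;

/// Times one embed call; replace the println with the project's
/// metrics sink (logs, Prometheus, ...).
pub async fn embed_timed(
    provider: &dyn EmbeddingProvider,
    texts: Vec<String>,
) -> Result<Vec<Vec<f32>>> {
    let start = Instant::now();
    let result = provider.embed(texts).await;
    println!(
        "embed latency ({}): {:?}",
        provider.model_name(),
        start.elapsed()
    );
    result
}
```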
---

## References

- [fastembed Documentation](https://github.com/Anush008/fastembed-rs)
- [rig-core Documentation](https://github.com/0xPlaygrounds/rig)
- [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings)
- [BAAI/bge Models](https://huggingface.co/BAAI/bge-small-en-v1.5)
- [Ollama Embeddings](https://ollama.com/blog/embedding-models)
---

## Revision History

| Date       | Author            | Change           |
| ---------- | ----------------- | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |

---

**Previous ADR**: [ADR-001: Nickel vs TOML](001-nickel-vs-toml.md)

**Next ADR**: [ADR-003: Hybrid Storage Strategy](003-hybrid-storage.md)