# ADR-002: FastEmbed + AI Providers for Embeddings

**Status**: Accepted

**Date**: 2026-01-17

**Deciders**: Architecture Team

**Context**: Embedding Strategy for Semantic Search

---
## Context

KOGRAL requires embedding generation for its semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling "find concepts" rather than just "find keywords".

**Requirements**:

1. **Local-First Option**: Must work offline without external API dependencies
2. **Production Scalability**: Support cloud AI providers for large-scale deployments
3. **Multiple Providers**: Flexibility to choose based on cost, quality, and privacy
4. **Cost-Effective Development**: Free local embeddings for development and testing
5. **Quality**: Embeddings good enough for finding related concepts

**Options Evaluated**:
### Option 1: Only Local Embeddings (fastembed)

**Pros**:

- No API costs
- Works offline
- Privacy-preserving (no data leaves the machine)
- Fast (local GPU acceleration possible)

**Cons**:

- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100 MB models)
- Single-provider lock-in (fastembed library)

**Example**:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// fastembed identifies models via an enum rather than a raw string.
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15, // BAAI/bge-small-en-v1.5
    ..Default::default()
})?;

let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>>, 384 dimensions per embedding
```
### Option 2: Only Cloud AI Providers (OpenAI, Claude, etc.)

**Pros**:

- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents

**Cons**:

- Requires API keys and incurs per-embedding costs
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk

**Example**:
```rust
use rig::embeddings::EmbeddingModel;
use rig::providers::openai;

// Call shape follows rig-core's embeddings API; exact names can
// differ between versions.
let client = openai::Client::new("sk-...");
let model = client.embedding_model(openai::TEXT_EMBEDDING_3_SMALL);
let embeddings = model.embed_texts(["Hello world".to_string()]).await?;
// Output: 1536-dimensional vectors
```
### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)

**Pros**:

- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via the `rig-core` library
- ✅ Easy provider switching (config-driven)

**Cons**:

- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)

---
## Decision

**We will use a hybrid strategy: fastembed (local) + AI providers (via rig-core).**

**Implementation**:

1. **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
2. **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
3. **Interface**: An `EmbeddingProvider` trait abstracts provider details
4. **Config-Driven**: Provider selection via Nickel configuration

**Architecture**:
```rust
use async_trait::async_trait;

#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        // parse_model (not shown) maps the config string onto
        // fastembed's `EmbeddingModel` enum.
        let model = TextEmbedding::try_new(InitOptions {
            model_name: parse_model(model_name)?,
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }

    // Hardcoded for the default model; a full implementation derives
    // these from the configured model.
    fn dimensions(&self) -> usize { 384 }
    fn model_name(&self) -> &str { "BAAI/bge-small-en-v1.5" }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client, // schematic: rig-core exposes per-provider clients
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        // Call shape is illustrative; see the rig-core docs for the
        // provider-specific embeddings API.
        let embeddings = self.client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }

    fn dimensions(&self) -> usize { self.dimensions }
    fn model_name(&self) -> &str { &self.model }
}
```
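Calling code depends only on the trait, so the active backend is invisible at query time. A minimal sketch (the `embed_query` helper is hypothetical):

```rust
/// Provider-agnostic helper: behaves identically whether the provider
/// wraps fastembed or a rig-core cloud client.
pub async fn embed_query(
    provider: &dyn EmbeddingProvider,
    query: &str,
) -> Result<Vec<f32>> {
    let mut vectors = provider.embed(vec![query.to_string()]).await?;
    // One input text yields exactly one vector.
    debug_assert_eq!(vectors.len(), 1);
    debug_assert_eq!(vectors[0].len(), provider.dimensions());
    Ok(vectors.remove(0))
}
```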
**Configuration** (Nickel):

```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}

# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}

# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```
**Provider Selection** (`kogral-core/src/embeddings/mod.rs`):

```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            // API key is read from the env var named in the config.
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            // Ollama runs locally and needs no API key.
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
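The `EmbeddingConfig` and `EmbeddingProviderType` types used above are not defined in this ADR; the following is a minimal sketch mirroring the Nickel schema (field and variant names are assumptions):

```rust
use serde::Deserialize;

/// Mirrors the `embeddings` record in the Nickel config (sketch).
#[derive(Debug, Deserialize)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Env var holding the API key; only cloud providers need it.
    #[serde(default)]
    pub api_key_env: String,
}

/// One variant per supported backend.
#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum EmbeddingProviderType {
    FastEmbed,
    OpenAI,
    Claude,
    Ollama,
}
```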
---

## Consequences

### Positive

✅ **Development Flexibility**:

- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)

✅ **Production Quality**:

- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents

✅ **Privacy Control**:

- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration

✅ **Cost Optimization**:

- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local

✅ **Unified Interface**:

- The `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know or care about the provider
- Easy to add new providers
### Negative

❌ **Dimension Mismatch**:

- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Vectors of different dimensions cannot be mixed in the same index

**Mitigation** (see the sketch below):

- Store provider + dimensions in node metadata
- Rebuild the index when changing providers
- Document dimension constraints
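A minimal sketch of that guard, assuming a hypothetical `IndexMetadata` record persisted next to the index:

```rust
/// Persisted alongside the vector index (hypothetical shape).
pub struct IndexMetadata {
    pub provider: String,
    pub dimensions: usize,
}

/// Refuses to reuse an index built with different dimensions,
/// forcing an explicit rebuild instead of silently mixing vectors.
pub fn check_index_compat(
    meta: &IndexMetadata,
    provider: &dyn EmbeddingProvider,
) -> Result<(), String> {
    if meta.dimensions != provider.dimensions() {
        return Err(format!(
            "index has {}-dim vectors, but '{}' produces {}; rebuild required",
            meta.dimensions,
            provider.model_name(),
            provider.dimensions(),
        ));
    }
    Ok(())
}
```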
❌ **Model Download**:

- First use of fastembed downloads a ~100 MB model
- Slow initial startup

**Mitigation**:

- Pre-download models in Docker images
- Document the model download in the setup guide
- Cache models in `~/.cache/fastembed` (see the sketch below)
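As an example of the caching mitigation, fastembed's `InitOptions` exposes a cache directory and a download-progress flag (struct-literal form as in older fastembed-rs releases; the `dirs` crate is assumed for locating the home directory):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// First run downloads the model into the cache dir; later runs reuse it.
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15,
    cache_dir: dirs::home_dir()
        .expect("home directory")
        .join(".cache/fastembed"),
    show_download_progress: true,
    ..Default::default()
})?;
```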
❌ **Complex Configuration**:

- Multiple provider options may confuse users

**Mitigation**:

- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations
### Neutral

⚪ **Dependency Trade-off**:

- `fastembed` adds ~5 MB to the binary
- `rig-core` adds ~2 MB
- Total: ~7 MB overhead

Not a concern for the CLI/MCP server use case.

---
## Provider Comparison

| Provider      | Dimensions | Quality   | Cost (per 1K tokens) | Privacy  | Offline |
| ------------- | ---------- | --------- | -------------------- | -------- | ------- |
| **fastembed** | 384        | Good      | Free                 | ✅ Local | ✅ Yes  |
| **OpenAI**    | 1536       | Excellent | $0.0001              | ❌ Cloud | ❌ No   |
| **Claude**    | 1024       | Excellent | $0.00025             | ❌ Cloud | ❌ No   |
| **Ollama**    | 768        | Very Good | Free                 | ✅ Local | ✅ Yes  |

**Recommendation by Use Case**:

- **Development**: fastembed (fast, free, offline)
- **Small Teams**: fastembed or Ollama (privacy, no costs)
- **Enterprise**: OpenAI or Claude (best quality, scalable)
- **Self-Hosted**: Ollama (good quality, local control)
---

## Implementation Timeline

1. ✅ Define the `EmbeddingProvider` trait
2. ✅ Implement `FastEmbedProvider` (stub, feature-gated)
3. ✅ Implement `RigEmbeddingProvider` (stub, feature-gated)
4. ⏳ Complete fastembed integration with model download
5. ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
6. ⏳ Add a query engine with similarity search
7. ⏳ Document provider selection and trade-offs

---
## Monitoring

**Success Criteria**:

- Users can switch providers via a config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality is acceptable for both local and cloud embeddings

**Metrics**:

- Embedding generation latency, local vs cloud (see the sketch below)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)
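One way to capture the latency metric is a thin timing wrapper around any provider (a sketch; where the measurement is reported is left open):

```rust
use std::time::Instant;

/// Times one embed call; replace the println with the project's
/// metrics sink (logs, Prometheus, ...).
pub async fn embed_timed(
    provider: &dyn EmbeddingProvider,
    texts: Vec<String>,
) -> Result<Vec<Vec<f32>>> {
    let start = Instant::now();
    let result = provider.embed(texts).await;
    println!(
        "embed latency ({}): {:?}",
        provider.model_name(),
        start.elapsed()
    );
    result
}
```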
---

## References

- [fastembed Documentation](https://github.com/Anush008/fastembed-rs)
- [rig-core Documentation](https://github.com/0xPlaygrounds/rig)
- [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings)
- [BAAI/bge Models](https://huggingface.co/BAAI/bge-small-en-v1.5)
- [Ollama Embeddings](https://ollama.com/blog/embedding-models)
---

## Revision History

| Date       | Author            | Change           |
| ---------- | ----------------- | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |

---

**Previous ADR**: [ADR-001: Nickel vs TOML](001-nickel-vs-toml.md)

**Next ADR**: [ADR-003: Hybrid Storage Strategy](003-hybrid-storage.md)