# ADR-002: FastEmbed via AI Providers for Embeddings
**Status**: Accepted
**Date**: 2026-01-17
**Deciders**: Architecture Team
**Context**: Embedding Strategy for Semantic Search
---
## Context
KOGRAL requires embedding generation for its semantic search capabilities. Embeddings convert text into numerical vectors that capture semantic meaning, enabling queries that "find concepts" rather than merely "find keywords".
**Requirements**:
1. **Local-First Option**: Must work offline without external API dependencies
2. **Production Scalability**: Support cloud AI providers for large-scale deployments
3. **Multiple Providers**: Flexibility to choose based on cost, quality, privacy
4. **Cost-Effective Development**: Free local embeddings for development and testing
5. **Quality**: Good enough embeddings for finding related concepts
**Options Evaluated**:
### Option 1: Only Local Embeddings (fastembed)
**Pros**:
- No API costs
- Works offline
- Privacy-preserving (no data leaves machine)
- Fast (local GPU acceleration possible)
**Cons**:
- Limited model quality compared to cloud providers
- Resource-intensive (requires downloading ~100 MB models)
- Single provider lock-in (fastembed library)
**Example**:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// fastembed identifies models via the EmbeddingModel enum rather than a raw string
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::BGESmallENV15, // "BAAI/bge-small-en-v1.5"
    ..Default::default()
})?;
let embeddings = model.embed(vec!["Hello world"], None)?;
// Output: Vec<Vec<f32>> with 384 dimensions
```
### Option 2: Only Cloud AI Providers (OpenAI, Claude, etc.)
**Pros**:
- State-of-the-art embedding quality
- No local resource usage
- Latest models available
- Scalable to millions of documents
**Cons**:
- Requires API keys (cost per embedding)
- Network dependency (no offline mode)
- Privacy concerns (data sent to third parties)
- Vendor lock-in risk
**Example**:
```rust
use rig::providers::openai;

let client = openai::Client::new("sk-...");
let embeddings = client.embeddings("text-embedding-3-small")
    .embed_documents(vec!["Hello world"])
    .await?;
// Output: Vec<Vec<f32>> with 1536 dimensions
```
### Option 3: Hybrid Strategy (fastembed + AI providers via rig-core)
**Pros**:
- ✅ Best of both worlds: local dev, cloud production
- ✅ User choice: privacy-first or quality-first
- ✅ Cost flexibility: free for small projects, paid for scale
- ✅ Unified interface via `rig-core` library
- ✅ Easy provider switching (config-driven)
**Cons**:
- ❌ More complex implementation (multiple providers)
- ❌ Dimension mismatch between providers (384 vs 1536)
- ❌ Additional dependencies (`rig-core`, `fastembed`)
---
## Decision
**We will use a hybrid strategy: fastembed (local) + AI providers (via rig-core).**
**Implementation**:
1. **Default**: `fastembed` with `BAAI/bge-small-en-v1.5` (384 dimensions)
2. **Optional**: OpenAI, Claude, Ollama via `rig-core` (configurable)
3. **Interface**: `EmbeddingProvider` trait abstracts provider details
4. **Config-Driven**: Provider selection via Nickel configuration
**Architecture**:
```rust
#[async_trait]
pub trait EmbeddingProvider: Send + Sync {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>>;
    fn dimensions(&self) -> usize;
    fn model_name(&self) -> &str;
}

// Local implementation
pub struct FastEmbedProvider {
    model: TextEmbedding,
}

impl FastEmbedProvider {
    pub fn new(model_name: &str) -> Result<Self> {
        // NOTE: fastembed identifies models via its EmbeddingModel enum; the
        // mapping from the configured model-name string is elided in this sketch.
        let model = TextEmbedding::try_new(InitOptions {
            model_name: model_name.into(),
            ..Default::default()
        })?;
        Ok(Self { model })
    }
}

#[async_trait]
impl EmbeddingProvider for FastEmbedProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        Ok(self.model.embed(texts, None)?)
    }
    fn dimensions(&self) -> usize { 384 }
    fn model_name(&self) -> &str { "BAAI/bge-small-en-v1.5" }
}

// Cloud provider implementation (via rig-core)
pub struct RigEmbeddingProvider {
    client: rig::Client,
    model: String,
    dimensions: usize,
}

#[async_trait]
impl EmbeddingProvider for RigEmbeddingProvider {
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
        let embeddings = self.client
            .embeddings(&self.model)
            .embed_documents(texts)
            .await?;
        Ok(embeddings)
    }
    fn dimensions(&self) -> usize { self.dimensions }
    fn model_name(&self) -> &str { &self.model }
}
```
**Configuration** (Nickel):
```nickel
# Local development (default)
{
  embeddings = {
    enabled = true,
    provider = 'fastembed,
    model = "BAAI/bge-small-en-v1.5",
    dimensions = 384,
  },
}

# Production with OpenAI
{
  embeddings = {
    enabled = true,
    provider = 'openai,
    model = "text-embedding-3-small",
    dimensions = 1536,
    api_key_env = "OPENAI_API_KEY",
  },
}

# Self-hosted with Ollama
{
  embeddings = {
    enabled = true,
    provider = 'ollama,
    model = "nomic-embed-text",
    dimensions = 768,
  },
}
```
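The selection code below assumes config types roughly like the following. This is a sketch only: field names mirror the Nickel schema above, and serde is assumed as the deserialization layer.
```rust
use serde::Deserialize;

/// Mirrors the `embeddings` block of the Nickel config after export.
#[derive(Debug, Clone, Deserialize)]
pub struct EmbeddingConfig {
    pub enabled: bool,
    pub provider: EmbeddingProviderType,
    pub model: String,
    pub dimensions: usize,
    /// Only required for cloud providers (e.g. "OPENAI_API_KEY").
    #[serde(default)]
    pub api_key_env: String,
}

#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum EmbeddingProviderType {
    FastEmbed, // "fastembed"
    OpenAI,    // "openai"
    Claude,    // "claude"
    Ollama,    // "ollama"
}
```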
**Provider Selection** (`kogral-core/src/embeddings/mod.rs`):
```rust
pub fn create_provider(config: &EmbeddingConfig) -> Result<Box<dyn EmbeddingProvider>> {
    match config.provider {
        EmbeddingProviderType::FastEmbed => {
            Ok(Box::new(FastEmbedProvider::new(&config.model)?))
        }
        EmbeddingProviderType::OpenAI => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_openai(api_key, &config.model)?))
        }
        EmbeddingProviderType::Claude => {
            let api_key = std::env::var(&config.api_key_env)?;
            Ok(Box::new(RigEmbeddingProvider::new_claude(api_key, &config.model)?))
        }
        EmbeddingProviderType::Ollama => {
            Ok(Box::new(RigEmbeddingProvider::new_ollama(&config.model)?))
        }
    }
}
```
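Putting the pieces together, a minimal usage sketch. The inline `EmbeddingConfig` stands in for the parsed Nickel file; in practice it would be loaded from configuration.
```rust
use anyhow::Result;

/// Callers only see the trait object, so switching from fastembed to a cloud
/// provider is purely a configuration change.
async fn index_documents(texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
    // In practice this comes from the exported Nickel config; built inline here.
    let config = EmbeddingConfig {
        enabled: true,
        provider: EmbeddingProviderType::FastEmbed,
        model: "BAAI/bge-small-en-v1.5".to_string(),
        dimensions: 384,
        api_key_env: String::new(),
    };
    let provider = create_provider(&config)?;   // Box<dyn EmbeddingProvider>
    let vectors = provider.embed(texts).await?; // same call for every backend
    debug_assert!(vectors.iter().all(|v| v.len() == provider.dimensions()));
    Ok(vectors)
}
```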
---
## Consequences
### Positive
**Development Flexibility**:
- Developers can use `fastembed` without API keys
- Fast feedback loop (local embeddings, no network calls)
- Works offline (train trips, flights)
**Production Quality**:
- Production deployments can use OpenAI/Claude for better quality
- Latest embedding models available
- Scalable to millions of documents
**Privacy Control**:
- Privacy-sensitive projects use local embeddings
- Public projects can use cloud providers
- User choice via configuration
**Cost Optimization**:
- Small projects: free (fastembed)
- Large projects: pay for quality (cloud providers)
- Hybrid: important docs via cloud, bulk via local
**Unified Interface**:
- `EmbeddingProvider` trait abstracts provider details
- Query code doesn't know/care about provider
- Easy to add new providers
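As an illustration of that provider-agnosticism, a minimal query-side sketch. The brute-force cosine-similarity search is illustrative only, not the query engine planned in the timeline below.
```rust
/// Provider-agnostic semantic search sketch: only the trait is visible here,
/// so the same code runs against fastembed, OpenAI, Claude, or Ollama.
async fn top_k(
    provider: &dyn EmbeddingProvider,
    query: &str,
    corpus: &[(String, Vec<f32>)], // (doc id, stored embedding)
    k: usize,
) -> anyhow::Result<Vec<(String, f32)>> {
    let query_vec = provider.embed(vec![query.to_string()]).await?.remove(0);
    let mut scored: Vec<(String, f32)> = corpus
        .iter()
        .map(|(id, vec)| (id.clone(), cosine(&query_vec, vec)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    Ok(scored)
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```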
### Negative
**Dimension Mismatch**:
- fastembed: 384 dimensions
- OpenAI: 1536 dimensions
- Cannot mix in same index
**Mitigation**:
- Store provider + dimensions in node metadata
- Rebuild index when changing providers
- Document dimension constraints
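A minimal sketch of that guard, assuming a hypothetical `IndexMetadata` record persisted alongside the index:
```rust
/// The index records which model and dimensionality produced it and refuses
/// an incompatible provider. Persistence of this record is out of scope here.
#[derive(Debug, PartialEq)]
pub struct IndexMetadata {
    pub model: String,     // e.g. "BAAI/bge-small-en-v1.5"
    pub dimensions: usize, // e.g. 384
}

pub fn check_compatible(index: &IndexMetadata, provider: &dyn EmbeddingProvider) -> anyhow::Result<()> {
    if index.dimensions != provider.dimensions() || index.model != provider.model_name() {
        anyhow::bail!(
            "index was built with {} ({} dims) but the configured provider is {} ({} dims); \
             rebuild the index after switching providers",
            index.model,
            index.dimensions,
            provider.model_name(),
            provider.dimensions()
        );
    }
    Ok(())
}
```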
**Model Download**:
- First use of fastembed downloads ~100MB model
- Slow initial startup
**Mitigation**:
- Pre-download in Docker images
- Document model download in setup guide
- Cache models in `~/.cache/fastembed`
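A warm-up sketch for pre-downloading (e.g. in a Docker build step), assuming fastembed's `InitOptions` exposes `cache_dir` and `show_download_progress` fields as in the versions this ADR was written against:
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
use std::path::PathBuf;

/// Instantiating the model once forces the ~100 MB download into the cache
/// directory, so it can happen at image-build time instead of on first request.
/// (Field names are an assumption about the fastembed version in use.)
fn prefetch_model(cache_dir: PathBuf) -> anyhow::Result<()> {
    let _model = TextEmbedding::try_new(InitOptions {
        model_name: EmbeddingModel::BGESmallENV15,
        cache_dir,
        show_download_progress: true,
        ..Default::default()
    })?;
    Ok(())
}
```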
**Complex Configuration**:
- Multiple provider options may confuse users
**Mitigation**:
- Sane default (fastembed)
- Clear examples for each provider
- Validation errors explain misconfigurations
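A sketch of what that validation could look like, reusing the `EmbeddingConfig` types sketched earlier; the error messages are illustrative:
```rust
/// Turn common misconfigurations into actionable errors before any provider
/// is constructed.
pub fn validate(config: &EmbeddingConfig) -> anyhow::Result<()> {
    match config.provider {
        EmbeddingProviderType::OpenAI | EmbeddingProviderType::Claude => {
            if config.api_key_env.is_empty() {
                anyhow::bail!(
                    "provider {:?} requires `api_key_env` (e.g. \"OPENAI_API_KEY\") in the embeddings config",
                    config.provider
                );
            }
        }
        EmbeddingProviderType::FastEmbed | EmbeddingProviderType::Ollama => {}
    }
    if config.dimensions == 0 {
        anyhow::bail!(
            "`dimensions` must match the chosen model (e.g. 384 for bge-small, 1536 for text-embedding-3-small)"
        );
    }
    Ok(())
}
```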
### Neutral
**Dependency Trade-off**:
- `fastembed` adds ~5MB to binary
- `rig-core` adds ~2MB
- Total: ~7MB overhead
Not a concern for CLI/MCP server use case.
---
## Provider Comparison
| Provider      | Dimensions | Quality   | Cost (per 1K tokens) | Privacy  | Offline |
| ------------- | ---------- | --------- | -------------------- | -------- | ------- |
| **fastembed** | 384        | Good      | Free                 | ✅ Local | ✅ Yes  |
| **OpenAI**    | 1536       | Excellent | $0.0001              | ❌ Cloud | ❌ No   |
| **Claude**    | 1024       | Excellent | $0.00025             | ❌ Cloud | ❌ No   |
| **Ollama**    | 768        | Very Good | Free                 | ✅ Local | ✅ Yes  |
**Recommendation by Use Case**:
- **Development**: fastembed (fast, free, offline)
- **Small Teams**: fastembed or Ollama (privacy, no costs)
- **Enterprise**: OpenAI or Claude (best quality, scalable)
- **Self-Hosted**: Ollama (good quality, local control)
---
## Implementation Timeline
1. ✅ Define `EmbeddingProvider` trait
2. ✅ Implement FastEmbedProvider (stub, feature-gated)
3. ✅ Implement RigEmbeddingProvider (stub, feature-gated)
4. ⏳ Complete FastEmbed integration with model download
5. ⏳ Complete rig-core integration (OpenAI, Claude, Ollama)
6. ⏳ Add query engine with similarity search
7. ⏳ Document provider selection and trade-offs
---
## Monitoring
**Success Criteria**:
- Users can switch providers via config change
- Local embeddings work without API keys
- Production deployments use cloud providers successfully
- Query quality acceptable for both local and cloud embeddings
**Metrics**:
- Embedding generation latency (local vs cloud)
- Query accuracy (precision@10 for semantic search)
- API costs (cloud providers)
- User satisfaction (feedback on search quality)
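For the latency metric, a small harness along these lines would let local and cloud providers be compared identically (a sketch; reporting and aggregation are out of scope):
```rust
use std::time::Instant;

/// Wraps an embed call and reports milliseconds per text, so fastembed and
/// cloud providers can be measured with the same code path.
async fn measure_embed_latency(
    provider: &dyn EmbeddingProvider,
    texts: Vec<String>,
) -> anyhow::Result<f64> {
    let n = texts.len().max(1) as f64;
    let start = Instant::now();
    provider.embed(texts).await?;
    Ok(start.elapsed().as_secs_f64() * 1000.0 / n)
}
```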
---
## References
- [fastembed Documentation](https://github.com/Anush008/fastembed-rs)
- [rig-core Documentation](https://github.com/0xPlaygrounds/rig)
- [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings)
- [BAAI/bge Models](https://huggingface.co/BAAI/bge-small-en-v1.5)
- [Ollama Embeddings](https://ollama.com/blog/embedding-models)
---
## Revision History
| Date | Author | Change |
| ---------- | ------------------ | ---------------- |
| 2026-01-17 | Architecture Team | Initial decision |
---
**Previous ADR**: [ADR-001: Nickel vs TOML](001-nickel-vs-toml.md)
**Next ADR**: [ADR-003: Hybrid Storage Strategy](003-hybrid-storage.md)