## Summary
Nomic released nomic-embed-text-v2-moe in February 2025 — the first general-purpose Mixture of Experts embedding model. It's been out for a year, outperforms v1.5 on BEIR and MIRACL, supports ~100 languages, and keeps the same 768-dim Matryoshka output. No one has added it to a Rust embedding library yet.
I'd like to implement this for fastembed-rs and am happy to open a PR. Posting this issue first to align on the approach.
## Why Candle, not ONNX
The v2-moe architecture uses dynamic expert routing (8 experts, top-2 per token) that cannot be cleanly exported to ONNX. The MoE gating layer calls `.tolist()` and uses Python control flow to dispatch tokens to experts, so the routing decisions get baked in as constants during JIT tracing, producing incorrect results on any other input.
The only known ONNX workaround (documented here) runs all 8 experts unconditionally on every token, then masks. It works, but at ~4x the compute cost — defeating the purpose of MoE.
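To make the cost difference concrete, here is a minimal stdlib-only Rust sketch (toy elementwise "experts" standing in for real expert MLPs; this is not candle code) comparing proper top-2 routing against the run-all-and-mask workaround. Both paths produce the same output, but the dense path makes 8 expert calls per token instead of 2.

```rust
// Toy "expert" MLP: a fixed elementwise map standing in for a real expert.
fn expert(id: usize, x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v * (id as f32 + 1.0)).collect()
}

// Indices of the two largest routing probabilities.
fn top2(probs: &[f32; 8]) -> (usize, usize) {
    let mut idx: Vec<usize> = (0..8).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    (idx[0], idx[1])
}

// Proper MoE: evaluate only the 2 selected experts (2 expert calls).
fn routed(x: &[f32], probs: &[f32; 8], calls: &mut usize) -> Vec<f32> {
    let (a, b) = top2(probs);
    let norm = probs[a] + probs[b]; // renormalize the selected weights
    *calls += 2;
    let (ea, eb) = (expert(a, x), expert(b, x));
    ea.iter()
        .zip(&eb)
        .map(|(va, vb)| (probs[a] / norm) * va + (probs[b] / norm) * vb)
        .collect()
}

// ONNX workaround: evaluate all 8 experts, zero-mask the rest (8 calls).
fn dense_masked(x: &[f32], probs: &[f32; 8], calls: &mut usize) -> Vec<f32> {
    let (a, b) = top2(probs);
    let norm = probs[a] + probs[b];
    let mut out = vec![0.0; x.len()];
    for id in 0..8 {
        *calls += 1;
        let w = if id == a || id == b { probs[id] / norm } else { 0.0 };
        for (o, v) in out.iter_mut().zip(expert(id, x)) {
            *o += w * v;
        }
    }
    out
}

fn main() {
    let x = vec![1.0, -2.0, 0.5];
    let probs = [0.05, 0.30, 0.02, 0.25, 0.10, 0.08, 0.12, 0.08];
    let (mut c1, mut c2) = (0, 0);
    let r = routed(&x, &probs, &mut c1);
    let d = dense_masked(&x, &probs, &mut c2);
    assert!(r.iter().zip(&d).all(|(a, b)| (a - b).abs() < 1e-6));
    println!("routed calls = {c1}, dense calls = {c2}"); // 2 vs 8
}
```

The 8-vs-2 call count is where the "~4x compute" figure comes from.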
HuggingFace's TEI solved this the right way: a native Candle implementation with proper MoE routing (PR #596, merged April 2025).
## Proposed approach
Follow the Qwen3 precedent (PR #216):
- New file `src/models/nomic_v2_moe.rs` implementing the full NomicBert+MoE architecture in candle-nn:
  - NomicBert embeddings + RoPE
  - Multi-head attention with QKV bias
  - Standard MLP (non-MoE layers)
  - MoE layer: linear router → softmax → top-2 selection → 8 expert MLPs → weighted sum
  - Alternating standard/MoE layers (MoE every 2nd layer)
  - Mean pooling + L2 normalization
- Feature-gated behind `nomic-v2-moe` (reuses existing candle deps from `qwen3`)
- Loads directly from safetensors on HuggingFace — no custom ONNX export needed
- Tests validated against PyTorch reference outputs (cosine similarity > 0.999)
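As a rough illustration of the surrounding plumbing, here is a stdlib-only Rust sketch of the router softmax plus the final mean pooling and L2 normalization (toy 4-dim vectors and a 4-way router for brevity; the real model uses 768 dims and 8 experts, and the candle implementation would use tensor ops instead):

```rust
// Softmax over raw router logits -> routing probabilities per token.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// Mean-pool token embeddings (seq_len x dim) into one sentence vector.
fn mean_pool(tokens: &[Vec<f32>]) -> Vec<f32> {
    let dim = tokens[0].len();
    let mut out = vec![0.0; dim];
    for t in tokens {
        for (o, v) in out.iter_mut().zip(t) {
            *o += v;
        }
    }
    let n = tokens.len() as f32;
    out.iter_mut().for_each(|o| *o /= n);
    out
}

// L2-normalize in place so the final embedding has unit norm.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}

fn main() {
    // Two toy 4-dim token embeddings standing in for transformer outputs.
    let tokens = vec![vec![1.0, 0.0, 2.0, -1.0], vec![3.0, 2.0, 0.0, 1.0]];
    let probs = softmax(&[0.1, 2.0, -1.0, 0.5]); // toy 4-way router
    assert!((probs.iter().sum::<f32>() - 1.0).abs() < 1e-6);
    let mut emb = mean_pool(&tokens);
    l2_normalize(&mut emb);
    let norm: f32 = emb.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-6); // unit-norm output embedding
}
```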
## Model specs

| | v1.5 (current) | v2-moe (proposed) |
|---|---|---|
| Architecture | Standard transformer | MoE (8 experts, top-2) |
| Total params | 137M | 475M |
| Active params | 137M | 305M |
| Dimensions | 768 (Matryoshka) | 768 (Matryoshka) |
| Max context | 8192 tokens | 512 tokens |
| Languages | English-focused | ~100 |
| HF format | ONNX available | Safetensors only |
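On the validation criterion mentioned above (cosine similarity > 0.999 against PyTorch reference outputs), the check itself is small; a plain-Rust helper might look like this (the vectors here are placeholders, not real model outputs):

```rust
// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Identical vectors -> similarity 1.0; the real test would compare the
    // candle output against a saved PyTorch reference embedding.
    let reference = vec![0.1, -0.4, 0.7, 0.2];
    let candidate = vec![0.1, -0.4, 0.7, 0.2];
    assert!(cosine_similarity(&reference, &candidate) > 0.999);
}
```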
## References
## Scope
I'm prepared to implement this and open a PR. Wanted to check:
- Does the Candle approach align with where you want the library to go?
- Any preference on the feature flag naming (`nomic-v2-moe`, `nomic-moe`, etc.)?
- Should I target the same candle version pinned by the `qwen3` feature?
Thanks for maintaining this library — it's the backbone of local embeddings in Rust.