
feat: upgrade to qwen3-embed v2.0.0 — tiny local models + offline reranking #433

@n24q02m

Summary

Upstream qwen3-embed is releasing v2.0.0 with two major changes that affect this project:

  1. Smaller embedding model: tiny-embed-v1 (~60MB ONNX INT8) replaces Qwen3-Embedding-0.6B (573MB). That is roughly 10x smaller, with faster inference, multilingual support (50+ languages), and code support.
  2. New local reranking: tiny-reranker-v1 (~70-90MB ONNX INT8) — cross-encoder reranker that runs entirely locally via ONNX Runtime

Impact on mnemo-mcp

Embedding (existing)

  • Model download size drops from ~573MB to ~60MB
  • Memory search quality should hold (target: >= 90% of the Qwen3-Embedding-8B teacher)
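  • Output dimension remains 768, so the storage layout of existing memory indices stays compatible. Vectors already stored were produced by the old model and are generally not comparable with tiny-embed-v1 output, so existing memories likely need re-embedding.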
  • First-launch and cold-start times significantly reduced
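
A minimal sketch of a guard that could run before an existing index is reused after the upgrade. It assumes the index stores the model name and vector dimension next to the data. `IndexMeta` and `check_index_compat` are hypothetical names for illustration, not part of mnemo-mcp or qwen3-embed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical metadata stored alongside a memory index."""
    model_name: str
    dimension: int


def check_index_compat(stored: IndexMeta, current: IndexMeta) -> str:
    """Return 'ok', 'reembed', or 'rebuild' for an existing index."""
    if stored.dimension != current.dimension:
        # Vector columns no longer fit the new model's output.
        return "rebuild"
    if stored.model_name != current.model_name:
        # Same shape, but the vectors come from a different model.
        return "reembed"
    return "ok"


old = IndexMeta(model_name="Qwen3-Embedding-0.6B", dimension=768)
new = IndexMeta(model_name="tiny-embed-v1", dimension=768)
print(check_index_compat(old, new))
```

With equal dimensions but a different model name, the sketch asks for re-embedding rather than a schema rebuild, which keeps the migration cheap when the dimension really is unchanged.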

Reranking (new capability)

  • Memory retrieval can now be reranked locally before returning results
  • Cross-encoder attention between query and memory content provides better relevance scoring
  • Particularly useful for distinguishing between semantically similar but contextually different memories
  • Zero network dependency for the entire memory search flow
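
The retrieve-then-rerank flow described above could look roughly like the sketch below. The reranker is passed in as a plain callable so the example does not guess at the qwen3-embed v2.0.0 API. `PairScorer`, `search`, and all parameter names are assumptions made for illustration.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
# Hypothetical stand-in for the reranker: (query, document) -> relevance score.
PairScorer = Callable[[str, str], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str,
    query_vec: Vector,
    memories: list[tuple[str, Vector]],
    top_k: int = 3,
    candidates: int = 10,
    scorer: Optional[PairScorer] = None,
) -> list[str]:
    """Vector retrieval first, then optional reranking of the shortlist."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    shortlist = [text for text, _ in ranked[:candidates]]
    if scorer is not None:
        # Cross-encoder style: score each (query, memory) pair jointly.
        shortlist.sort(key=lambda text: scorer(query, text), reverse=True)
    return shortlist[:top_k]
```

Leaving `scorer` as `None` keeps the current vector-only behaviour, so the same function covers both the opt-in and the default-on choice.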

What needs to change

  1. Bump qwen3-embed dependency to >=2.0.0
  2. Integrate reranking into memory search (after vector retrieval, before returning results)
  3. Test with multilingual memory content and code-related memories
  4. Verify embedding quality hasn't regressed
  5. Decide whether reranking should be opt-in or on by default (latency trade-off: ~5-10ms per rerank call for 10 documents)
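
For step 4, one simple check is to compare the top-k results of the old and new embedding models on a fixed query set. The sketch below computes that overlap. It is only a proxy: it measures agreement with the previous model, not quality against the teacher. `overlap_at_k` and `mean_overlap` are hypothetical helper names.

```python
from typing import Sequence


def overlap_at_k(old_ids: Sequence[str], new_ids: Sequence[str], k: int) -> float:
    """Fraction of the old top-k results that also appear in the new top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    old_top = set(old_ids[:k])
    new_top = set(new_ids[:k])
    if not old_top:
        return 1.0  # nothing to lose if the old model returned nothing
    return len(old_top & new_top) / len(old_top)


def mean_overlap(runs: Sequence[tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average overlap@k across (old_results, new_results) pairs, one per query."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(overlap_at_k(old, new, k) for old, new in runs) / len(runs)
```

A low mean overlap does not by itself mean a regression, but it flags queries worth inspecting by hand, including the multilingual and code-related cases from step 3.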

Technical details

Both models share the same backbone: mDeBERTa-v3-base pruned to 6 layers, with the vocabulary pruned from 250K to 64K tokens. They are distilled from Qwen3-Embedding-8B (embedding model) and Qwen3-Reranker-8B (reranker). Both are Apache-2.0 licensed and auto-downloaded from the Hugging Face Hub.

Timeline

Blocked on qwen3-embed v2.0.0 release.

Labels: enhancement
