⚡️ Performance Bottleneck on Large-scale HNSW Querying (20M x 384D vectors)
I'm working on scaling approximate nearest neighbor search using HNSW in Spark on a large dataset of sentence embeddings.
✅ Setup Summary
- **Data:** 20 million records, each with a 384-dimensional embedding (from `all-MiniLM-L6-v2`)
- **Cluster config:**
  - 12 executors, each with:
  - 1 driver with the same config
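For scale, a back-of-envelope footprint for the raw vectors alone (this is my arithmetic, not a measured number; it ignores the HNSW graph links, JVM object overhead, and any replication):

```python
# Rough memory estimate for 20M x 384-d float32 vectors.
records = 20_000_000
dims = 384
bytes_per_float32 = 4

raw_bytes = records * dims * bytes_per_float32
print(f"raw vectors: {raw_bytes / 1e9:.1f} GB ({raw_bytes / 2**30:.1f} GiB)")
# -> raw vectors: 30.7 GB (28.6 GiB)

# Sharded evenly across 12 executors, each shard holds roughly:
executors = 12
print(f"per executor shard: {raw_bytes / executors / 2**30:.1f} GiB")
# -> per executor shard: 2.4 GiB
```

So each executor needs a few GiB for its vector shard before counting the index structure itself.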
⚙️ Code Snippet
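(The original snippet did not survive extraction. As a placeholder, here is a minimal sketch of what such a pipeline typically looks like with the hnswlib-spark / `pyspark-hnsw` package — the class and parameter names follow that library's documentation and should be checked against the installed version; `embeddings_df` and `queries_df` are assumed DataFrames, not names from the original post.)

```python
# Hypothetical sketch, NOT the original snippet: build a distributed HNSW
# index over 384-d embeddings and query it back with model.transform().
from pyspark_hnsw.knn import HnswSimilarity

hnsw = HnswSimilarity(
    identifierCol='id',
    featuresCol='features',       # 384-d vector / array column
    distanceFunction='cosine',
    m=16,
    efConstruction=200,
    k=10,
    numPartitions=12,             # e.g. one index shard per executor
)

model = hnsw.fit(embeddings_df)           # builds the partitioned index
neighbours = model.transform(queries_df)  # k-NN lookup for every query row
```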
❗ The Problem
💡 Questions
1. Why is `model.transform()` so slow despite fast index construction?
2. What is the recommended way to set `numPartitions` (e.g. per executor/core)?

🙏 Any ideas, tuning tips, or architectural suggestions are highly appreciated!
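One thing worth noting on the `numPartitions` question: in a sharded ANN index, each query typically has to be fanned out to every partition and the per-shard top-k lists merged, so more partitions shrink each shard but multiply per-query work. A toy cost model (my illustrative assumption, not measured behaviour of any specific library) makes the trade-off visible:

```python
# Toy cost model for querying a sharded HNSW index: per-shard search is
# roughly logarithmic in shard size, but every query visits all shards
# and merges num_partitions top-k lists. Illustrative units only.
import math

records = 20_000_000

def query_cost(num_partitions: int, k: int = 10) -> float:
    per_shard = records / num_partitions
    search = num_partitions * math.log2(per_shard)  # fan-out * per-shard search
    merge = num_partitions * k                      # merging the k-lists
    return search + merge

for p in (1, 12, 48, 192):
    print(p, round(query_cost(p), 1))
```

Under this model, query cost grows close to linearly with the partition count, which is why oversplitting far beyond the number of executors tends to hurt `transform()` throughput even though it speeds up index construction.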