Peeking at the last nightly benchmark data point, here are the top CPU hotspots during indexing (full profiler output below).
That 2nd one (13.12% in ramBytesUsed) is concerning ... I know we need to account properly for RAM so IW can flush when RAM exceeds its allowance ... but maybe we can optimize how we do that for HNSW?
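One possible direction, as a rough sketch only: keep a running byte total and add just the growth delta whenever a neighbor list grows, so that adding a node never recomputes a size. `NeighborListSketch` and `GraphRamAccountingSketch` are hypothetical names, not Lucene classes, and the byte constants are illustrative assumptions rather than measured JVM sizes.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a growable neighbor list; not Lucene's NeighborArray. */
final class NeighborListSketch {
  // Assumed fixed per-object overhead in bytes, for illustration only.
  static final long BASE_BYTES = 32;

  private int[] nodes = new int[4];
  private int size;

  /** Appends a neighbor and returns how many bytes the backing array grew by (0 if none). */
  long add(int node) {
    long delta = 0;
    if (size == nodes.length) {
      int newLength = nodes.length * 2;
      delta = (long) (newLength - nodes.length) * Integer.BYTES;
      nodes = Arrays.copyOf(nodes, newLength);
    }
    nodes[size++] = node;
    return delta;
  }

  /** Full recomputation, kept here only to cross-check the running total. */
  long ramBytesUsed() {
    return BASE_BYTES + (long) nodes.length * Integer.BYTES;
  }
}

/** Hypothetical graph-level accounting, updated by deltas instead of being recomputed. */
final class GraphRamAccountingSketch {
  private long ramBytes;

  NeighborListSketch newList() {
    NeighborListSketch list = new NeighborListSketch();
    ramBytes += list.ramBytesUsed(); // paid once, at allocation time
    return list;
  }

  void addNeighbor(NeighborListSketch list, int node) {
    ramBytes += list.add(node); // constant time: only the growth delta is added
  }

  long ramBytesUsed() {
    return ramBytes; // a field read, no traversal
  }
}
```

In a concurrent builder the running total would need to be thread-safe (for example a `LongAdder`), and whether this actually beats the current accounting would have to be measured.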
8.63% spent clearing bitsets for HNSW searching is also scary -- that likely impacts search performance too (since building an HNSW graph is done by doing a search for each inserted vector)?
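To make the cost concrete: clearing a bitset touches every word on each search, even if only a few nodes were visited. One alternative, sketched below under assumptions, is a generation-stamped visited set where a logical clear is a counter bump. `GenerationVisitedSetSketch` is a hypothetical class, not part of Lucene, and it only mirrors the `clear()` / `getAndSet()` calls that show up in the profile.

```java
import java.util.Arrays;

/** Hypothetical visited set with a constant-time logical clear; not a Lucene class. */
final class GenerationVisitedSetSketch {
  private final int[] stamps; // stamps[node] == generation means "visited in this search"
  private int generation = 1; // 0 is reserved for "never visited"

  GenerationVisitedSetSketch(int maxNodes) {
    stamps = new int[maxNodes];
  }

  /** Forgets all visited nodes by bumping the generation; a real fill happens only on wrap-around. */
  void clear() {
    generation++;
    if (generation == 0) { // the counter cycled through all int values
      Arrays.fill(stamps, 0);
      generation = 1;
    }
  }

  /** Marks the node as visited and returns whether it was already visited since the last clear. */
  boolean getAndSet(int node) {
    boolean seen = stamps[node] == generation;
    stamps[node] = generation;
    return seen;
  }
}
```

The trade-off is memory: 32 bits per node instead of 1, which may hurt cache behavior on large graphs, so this is only a candidate to benchmark, not a known win.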
Also what exactly is reduceLanesTemplate? I find this name very non-intuitive :) Is it essentially a cast (like long -> int) for a vector?
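For context on that last question: in the JDK Vector API, `reduceLanes(VectorOperators.ADD)` is a horizontal reduction that folds all lanes of one vector into a single scalar; it does not change the element type. The plain-Java emulation below shows where that step sits in a dot product. It is a sketch with hypothetical names (`LaneReductionSketch`, `reduceLanesAdd`), uses ordinary arrays instead of the incubator module, and is not how `PanamaVectorUtilSupport` is actually written.

```java
/** Plain-Java emulation of a lane-wise dot product; hypothetical, for illustration only. */
final class LaneReductionSketch {
  // A 256-bit vector of 32-bit floats has 8 lanes.
  static final int LANES = 8;

  static float dotProduct(float[] a, float[] b) {
    float[] acc = new float[LANES]; // stands in for one vector accumulator
    int bound = a.length - (a.length % LANES);
    int i = 0;
    for (; i < bound; i += LANES) {
      // Lane-wise multiply-add: each lane accumulates independently.
      for (int lane = 0; lane < LANES; lane++) {
        acc[lane] += a[i + lane] * b[i + lane];
      }
    }
    // The "reduce lanes" step: fold the 8 partial sums into one scalar.
    float sum = reduceLanesAdd(acc);
    // Scalar tail for elements that do not fill a whole vector.
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static float reduceLanesAdd(float[] lanes) {
    float sum = 0f;
    for (float v : lanes) {
      sum += v;
    }
    return sum;
  }
}
```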
Profiler for cpu:
```
WARNING: Using incubator modules: jdk.incubator.vector
PROFILE SUMMARY from 4433882 events (total: 4M)
tests.profile.mode=cpu
tests.profile.count=50
tests.profile.stacksize=4
tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
23.57% 1M jdk.incubator.vector.FloatVector#reduceLanesTemplate() [Inlined code]
    at jdk.incubator.vector.Float256Vector#reduceLanes() [Inlined code]
    at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
    at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
13.12% 581719 org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
9.51% 421729 org.apache.lucene.index.FloatVectorValues$1#vectorValue() [Inlined code]
    at org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatScoringSupplier$1#score() [Inlined code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
8.63% 382600 java.util.Arrays#fill() [Inlined code]
    at org.apache.lucene.util.FixedBitSet#clear() [Inlined code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#prepareScratchState() [Inlined code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
4.55% 201882 org.apache.lucene.util.FixedBitSet#getAndSet() [Inlined code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
3.12% 138341 org.apache.lucene.util.RamUsageEstimator#sizeOf() [Inlined code]
    at org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
    at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
2.60% 115254 org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
2.42% 107338 org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
    at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
2.36% 104842 org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
    at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
    at org.apache.lucene.util.VectorUtil#dotProduct() [Inlined code]
    at org.apache.lucene.index.VectorSimilarityFunction$2#compare() [Inlined code]
2.36% 104836 org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
    at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
2.21% 97814 org.apache.lucene.util.hnsw.OnHeapHnswGraph#getNeighbors() [Inlined code]
    at org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
    at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
1.55% 68929 java.util.concurrent.locks.AbstractQueuedSynchronizer#apparentlyFirstQueuedIsExclusive() [Inlined code]
    at java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync#readerShouldBlock() [Inlined code]
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync#tryAcquireShared() [Inlined code]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireShared() [Inlined code]
```