
Are we spending too much CPU in HNSW ramBytesUsed, and clearing bitsets? #14763

@mikemccand

Peeking at the latest nightly benchmark data point, here are the top CPU hotspots during indexing:

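The second one (13.12% in NeighborArray#ramBytesUsed, called from OnHeapHnswGraph#updateGraphRamBytesUsed inside addNode) is concerning. I know we need to account for RAM properly so IndexWriter (IW) can flush when RAM usage exceeds its allowance, but maybe we can optimize how we do that for HNSW? Related RAM-accounting frames (RamUsageEstimator#sizeOf, MaxSizedIntArrayList#ramBytesUsed, and updateGraphRamBytesUsed itself) also show up further down the list.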
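One possible direction, sketched here under loud assumptions: if the graph's RAM total is re-derived by summing per-node estimates, each neighbor list could instead adjust a shared running counter by a delta when it is allocated or grows, making a read of the total O(1). The class names, the fixed 16-byte header sizes, and the doubling growth policy are all hypothetical; real code would use RamUsageEstimator constants and Lucene's actual NeighborArray sizing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class IncrementalRamAccounting {

  // Assumed, rough overheads; real code would use RamUsageEstimator's constants.
  static final long OBJECT_HEADER_BYTES = 16;
  static final long ARRAY_HEADER_BYTES = 16;

  static long arrayBytes(int capacity) {
    return ARRAY_HEADER_BYTES + (long) capacity * Integer.BYTES;
  }

  /** Hypothetical neighbor list that reports its own size changes to a shared counter. */
  static final class TrackedNeighbors {
    private final AtomicLong counter;
    private int[] nodes;
    private int size;

    TrackedNeighbors(int initialCapacity, AtomicLong counter) {
      this.counter = counter;
      this.nodes = new int[initialCapacity];
      // Charge the object and its backing array once, at allocation time.
      counter.addAndGet(OBJECT_HEADER_BYTES + arrayBytes(initialCapacity));
    }

    void add(int node) {
      if (size == nodes.length) {
        int newCapacity = Math.max(1, nodes.length * 2);
        // Charge only the delta between the new and old backing arrays.
        counter.addAndGet(arrayBytes(newCapacity) - arrayBytes(nodes.length));
        nodes = Arrays.copyOf(nodes, newCapacity);
      }
      nodes[size++] = node;
    }

    /** Per-object estimate, used here only to cross-check the running total. */
    long ramBytesUsed() {
      return OBJECT_HEADER_BYTES + arrayBytes(nodes.length);
    }
  }

  /** What a full recomputation costs: a walk over every neighbor list. */
  static long recompute(List<TrackedNeighbors> all) {
    long total = 0;
    for (TrackedNeighbors n : all) {
      total += n.ramBytesUsed();
    }
    return total;
  }

  public static void main(String[] args) {
    AtomicLong graphRamBytes = new AtomicLong();
    List<TrackedNeighbors> graph = new ArrayList<>();
    for (int node = 0; node < 1000; node++) {
      TrackedNeighbors neighbors = new TrackedNeighbors(4, graphRamBytes);
      for (int j = 0; j < node % 37; j++) {
        neighbors.add(j);
      }
      graph.add(neighbors);
    }
    // O(1) read of the running total vs. an O(numNodes) recomputation.
    System.out.println(graphRamBytes.get() + " == " + recompute(graph));
  }
}
```

The AtomicLong lets several threads (e.g. concurrent merge workers) charge the shared total safely; each individual list is still assumed to be mutated by one thread at a time.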

The 8.63% spent clearing bitsets (FixedBitSet#clear via HnswGraphSearcher#prepareScratchState) is also scary. It shows up in an indexing profile because building an HNSW graph runs a search for each inserted vector, which means the same cost likely hurts query-time search performance too.
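A common way to avoid a full clear before every search, sketched here as an assumption rather than anything Lucene does today, is an epoch-stamped visited array: each slot records the search "generation" in which it was last visited, so clearing is just an increment. EpochVisitedSet is a hypothetical name; its get/getAndSet methods mirror FixedBitSet's semantics (getAndSet returns the previous value).

```java
import java.util.Arrays;

/** Hypothetical epoch-stamped visited set; a sketch, not Lucene's API. */
public class EpochVisitedSet {
  private final int[] visitedEpoch; // 0 means "never visited"
  private int epoch = 1;

  public EpochVisitedSet(int maxNodes) {
    visitedEpoch = new int[maxNodes];
  }

  /** Forgets all visits; O(1) except on the rare epoch wraparound. */
  public void clear() {
    if (epoch == Integer.MAX_VALUE) {
      // Epoch space exhausted: fall back to one full reset.
      Arrays.fill(visitedEpoch, 0);
      epoch = 1;
    } else {
      epoch++;
    }
  }

  /** Returns true if node was visited since the last clear(). */
  public boolean get(int node) {
    return visitedEpoch[node] == epoch;
  }

  /** Marks node visited and returns its previous state, like FixedBitSet#getAndSet. */
  public boolean getAndSet(int node) {
    boolean wasVisited = visitedEpoch[node] == epoch;
    visitedEpoch[node] = epoch;
    return wasVisited;
  }

  public static void main(String[] args) {
    EpochVisitedSet visited = new EpochVisitedSet(1_000_000);
    for (int search = 0; search < 3; search++) {
      visited.clear(); // no Arrays.fill over 1M slots per search
      int newlyVisited = 0;
      for (int node : new int[] {5, 42, 5, 999_999}) {
        if (!visited.getAndSet(node)) {
          newlyVisited++;
        }
      }
      System.out.println("search " + search + ": " + newlyVisited + " distinct nodes visited");
    }
  }
}
```

The trade-off is memory: 32 bits per node instead of 1, which may hurt cache behavior on large graphs. Another option is to record the indices set during a search and clear only those bits, which pays off when far fewer nodes are visited than exist in the graph.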

Also, what exactly is reduceLanesTemplate? I find the name very unintuitive :) Is it essentially a cast (like long -> int) for a vector?
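For reference on the reduceLanesTemplate question: in the JDK Vector API, FloatVector#reduceLanes(VectorOperators.ADD) is a horizontal reduction that folds all lanes of one vector into a single float, rather than a lane-wise cast. Judging by the stack above, reduceLanesTemplate appears to be the shared implementation behind Float256Vector#reduceLanes. The plain-Java model below (no incubator module needed) shows where such a reduction typically sits in a SIMD dot product; it assumes 8 float lanes as in a 256-bit vector and only approximates the real kernel, so it should not be read as Lucene's PanamaVectorUtilSupport code.

```java
/** Scalar model of a SIMD dot product; not the real Vector API. */
public class ReduceLanesModel {
  static final int LANES = 8; // 256 bits / 32 bits per float

  /** Models acc.reduceLanes(VectorOperators.ADD): collapses all lanes into one scalar. */
  static float reduceLanesAdd(float[] acc) {
    float sum = 0f;
    for (float lane : acc) {
      sum += lane;
    }
    return sum;
  }

  /** Assumes a.length == b.length. */
  static float dotProduct(float[] a, float[] b) {
    float[] acc = new float[LANES]; // one partial sum per lane
    int i = 0;
    int bound = a.length - (a.length % LANES); // like species.loopBound(a.length)
    for (; i < bound; i += LANES) {
      for (int lane = 0; lane < LANES; lane++) {
        // Approximates acc = va.fma(vb, acc); a real fma rounds only once.
        acc[lane] += a[i + lane] * b[i + lane];
      }
    }
    float result = reduceLanesAdd(acc); // horizontal reduction: vector -> scalar
    for (; i < a.length; i++) {
      result += a[i] * b[i]; // scalar tail for leftover elements
    }
    return result;
  }

  public static void main(String[] args) {
    float[] a = new float[20];
    float[] b = new float[20];
    for (int i = 0; i < 20; i++) {
      a[i] = i;
      b[i] = 1f;
    }
    System.out.println(dotProduct(a, b));
  }
}
```

The reduction happens once per dot product, after the main loop; lane-wise type changes are separate Vector API operations (e.g. castShape).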

Profiler for cpu:
WARNING: Using incubator modules: jdk.incubator.vector
PROFILE SUMMARY from 4433882 events (total: 4M)
  tests.profile.mode=cpu
  tests.profile.count=50
  tests.profile.stacksize=4
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
23.57%        1M            jdk.incubator.vector.FloatVector#reduceLanesTemplate() [Inlined code]
                              at jdk.incubator.vector.Float256Vector#reduceLanes() [Inlined code]
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
13.12%        581719        org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
9.51%         421729        org.apache.lucene.index.FloatVectorValues$1#vectorValue() [Inlined code]
                              at org.apache.lucene.codecs.hnsw.DefaultFlatVectorScorer$FloatScoringSupplier$1#score() [Inlined code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
8.63%         382600        java.util.Arrays#fill() [Inlined code]
                              at org.apache.lucene.util.FixedBitSet#clear() [Inlined code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#prepareScratchState() [Inlined code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
4.55%         201882        org.apache.lucene.util.FixedBitSet#getAndSet() [Inlined code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
3.12%         138341        org.apache.lucene.util.RamUsageEstimator#sizeOf() [Inlined code]
                              at org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
                              at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
2.60%         115254        org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
2.42%         107338        org.apache.lucene.internal.hppc.MaxSizedIntArrayList#ramBytesUsed() [Inlined code]
                              at org.apache.lucene.util.hnsw.NeighborArray#ramBytesUsed() [Inlined code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
2.36%         104842        org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProductBody() [JIT compiled code]
                              at org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport#dotProduct() [Inlined code]
                              at org.apache.lucene.util.VectorUtil#dotProduct() [Inlined code]
                              at org.apache.lucene.index.VectorSimilarityFunction$2#compare() [Inlined code]
2.36%         104836        org.apache.lucene.util.hnsw.OnHeapHnswGraph#updateGraphRamBytesUsed() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.OnHeapHnswGraph#addNode() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode() [Inlined code]
2.21%         97814         org.apache.lucene.util.hnsw.OnHeapHnswGraph#getNeighbors() [Inlined code]
                              at org.apache.lucene.util.hnsw.HnswConcurrentMergeBuilder$MergeSearcher#graphSeek() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphSearcher#searchLevel() [JIT compiled code]
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNodeInternal() [JIT compiled code]
1.55%         68929         java.util.concurrent.locks.AbstractQueuedSynchronizer#apparentlyFirstQueuedIsExclusive() [Inlined code]
                              at java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync#readerShouldBlock() [Inlined code]
                              at java.util.concurrent.locks.ReentrantReadWriteLock$Sync#tryAcquireShared() [Inlined code]
                              at java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireShared() [Inlined code]
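Rows in this summary are keyed by their top four frames, so each sample presumably lands in exactly one row (self time at the leaf frame). Under that assumption, grouping rows by code path gives rough combined totals for the two concerns above. The category labels are an editorial grouping, not profiler output, and the getAndSet row is assumed to touch the same visited bitset that prepareScratchState clears.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProfileTally {
  /** One row of the summary: self-time percent, leaf frame, and an assigned category. */
  record Entry(double percent, String leafFrame, String category) {}

  static final List<Entry> ENTRIES = List.of(
      new Entry(13.12, "NeighborArray#ramBytesUsed", "ram-accounting"),
      new Entry(3.12, "RamUsageEstimator#sizeOf", "ram-accounting"),
      new Entry(2.42, "MaxSizedIntArrayList#ramBytesUsed", "ram-accounting"),
      new Entry(2.36, "OnHeapHnswGraph#updateGraphRamBytesUsed", "ram-accounting"),
      new Entry(8.63, "Arrays#fill (via FixedBitSet#clear)", "visited-bitset"),
      new Entry(4.55, "FixedBitSet#getAndSet", "visited-bitset"));

  /** Sums percentages per category, keeping first-seen category order. */
  static Map<String, Double> totalsByCategory(List<Entry> entries) {
    Map<String, Double> totals = new LinkedHashMap<>();
    for (Entry e : entries) {
      totals.merge(e.category(), e.percent(), Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    totalsByCategory(ENTRIES).forEach((k, v) -> System.out.printf("%s: %.2f%%%n", k, v));
  }
}
```

Run as-is, this prints 21.02% for the RAM-accounting frames and 13.18% for visited-bitset work.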
