Conversation
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
Whoa, thank you for the quick PR @benwtrent! That's disappointing that there's no speedup on local benchmark, odd. I wonder if the nightly profiler (JFR) is somehow giving ghosts ... it should be async (no safepoint bias)? Or maybe it's an env/benchy difference and once we merge this we'll maybe see gains in the nightly benchy. |
|
Hmm, not sure it matters, but the JFR result I linked to in #14763 was from the single-threaded (deterministic, |
mikemccand
left a comment
There was a problem hiding this comment.
I like this incremental approach instead! I was confused on whether we are missing some tracking in that base RAM_BYTES_USED.
I think your reasoning about thread safety is correct (we only track RAM during initial flush, and that is single threaded). Let's at least add comment / javadoc somewhere that explains this precarious thread safety situation? Who knows maybe someday IndexWriter will need to account for RAM consumed by merging...
| // see: https://github.com/apache/lucene/issues/14214 | ||
| // connectComponents(); | ||
| frozen = true; | ||
| hnsw.finishBuild(); |
There was a problem hiding this comment.
@mikemccand its not used. It was only added for the RAM estimate added in the previous PR. Since we are doing incremental updates, I just removed it.
| private static final long RAM_BYTES_USED = | ||
| 4L * Integer.BYTES // all int fields | ||
| + 1 // field: noGrowth | ||
| + RamUsageEstimator.NUM_BYTES_OBJECT_REF |
There was a problem hiding this comment.
Is this for the graph pointer we hold?
There was a problem hiding this comment.
I am switching all of this to a simple RamUsageEstimator.shallowSizeOfInstance(OnHeapHnswGraph.class); attempting to account for fields directly is just too complicated
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java
Outdated
Show resolved
Hide resolved
| IllegalArgumentException.class, () -> HnswGraphBuilder.create(scorerSupplier, 10, 0, 0)); | ||
| } | ||
|
|
||
| @com.carrotsearch.randomizedtesting.annotations.Repeat(iterations = 100) |
There was a problem hiding this comment.
leftover, or intentional ?
There was a problem hiding this comment.
LOL, of course, I want CI to REALLY exercise this test ;)
There was a problem hiding this comment.
you can always override this locally by adding "-Ptests.iters=100".
* Disable HNSW connectedComponents (#14214) * Adjusting OnHeapHnswGraph RAM estimation to be incremental during build * changes * addressing pr comments * removing extra logging
|
@benwtrent Thanks for helping improve |
* Disable HNSW connectedComponents (#14214) * Adjusting OnHeapHnswGraph RAM estimation to be incremental during build * changes * addressing pr comments * removing extra logging
* Disable HNSW connectedComponents (#14214) * Adjusting OnHeapHnswGraph RAM estimation to be incremental during build * changes * addressing pr comments * removing extra logging
This makes OnHeapGraph builder RAM estimates much cheaper by incrementally updating the estimate as nodes are added to the NeighborArray.
Local benchmarking shows almost no performance impact, but I can no longer see any ram estimate methods in the top methods given the CPU samples.
One outstanding bit here is that
OnHeapGraph#graphRamBytesUsedis only volatile. So, its not really threadsafe for estimates. However, from what I can tell, the only time we actually use this is during initial flush, and that is within a single thread anyways, right?Follow on from: #14527
related: #14763