Deferring lambda in TermStates.java according to prefetch #15627
shubhamsrkdev wants to merge 11 commits into apache:main
…boolean and propagate results to TermStates to remove lambda for hot index optimization. Issue: apache#15515
+31% QPS - nice improvement for the hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶? I'm curious whether we want to add the task to luceneutil permanently, as it looks like no other task exercises the needScores = false code path for boolean queries.
# Conflicts: # lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java
Hmm, I have tried with even lower memory (4 GB), with the same results.
Makes sense!
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
We merged a PR in luceneutil that adds the task to measure the perf improvement from this PR.
We have the first data point from running AndMissingHigh in nightlies here:
This would be a good baseline to judge against once this change is merged!

Problem
Currently, for term lookup, TermStates.get defers in every case (to the lambda in TermStates.java). This works great for cold indexes but might not be the best for hot indexes.
Solution
We can rely on prefetch to decide whether to defer. If a prefetch is actually being done, the index is still cold, so we should defer the term state lookup through the lambda. If not, we should short-circuit and look up the term state directly instead of deferring.
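The idea above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TermsReader`, `CountingReader`, the simplified `TermState` class and `lookup` are hypothetical stand-ins, and the sketch assumes `prefetch` returns a boolean telling the caller whether a prefetch was really issued.

```java
import java.util.function.Supplier;

public class DeferredLookupSketch {

  /** Hypothetical, simplified term state; not Lucene's TermState class. */
  static final class TermState {
    final String term;
    final int docFreq;

    TermState(String term, int docFreq) {
      this.term = term;
      this.docFreq = docFreq;
    }
  }

  /** Hypothetical terms-dictionary reader, used only for this illustration. */
  interface TermsReader {
    /** ASSUMPTION: returns true only if a prefetch was actually issued (data looked cold). */
    boolean prefetch(String term);

    /** Returns the state for the term, or null if the term does not exist. */
    TermState seekExact(String term);
  }

  /** Defers the lookup only when a prefetch was issued; otherwise resolves it right away. */
  static Supplier<TermState> lookup(TermsReader reader, String term) {
    if (reader.prefetch(term)) {
      // Cold path: let the prefetch make progress and do the real lookup later.
      return () -> reader.seekExact(term);
    }
    // Hot path: the data is assumed to be in memory already, so look it up now.
    TermState state = reader.seekExact(term);
    return () -> state;
  }

  /** Toy reader that counts lookups so the two paths can be told apart. */
  static final class CountingReader implements TermsReader {
    final boolean cold;
    int seeks = 0;

    CountingReader(boolean cold) {
      this.cold = cold;
    }

    @Override
    public boolean prefetch(String term) {
      return cold;
    }

    @Override
    public TermState seekExact(String term) {
      seeks++;
      return "ring".equals(term) ? new TermState(term, 1000) : null;
    }
  }

  public static void main(String[] args) {
    CountingReader hot = new CountingReader(false);
    Supplier<TermState> missing = lookup(hot, "jasdgiasgdiuygduasgd");
    System.out.println("hot: lookups before get() = " + hot.seeks);
    System.out.println("hot: term missing = " + (missing.get() == null));

    CountingReader cold = new CountingReader(true);
    Supplier<TermState> deferred = lookup(cold, "ring");
    System.out.println("cold: lookups before get() = " + cold.seeks);
    System.out.println("cold: docFreq = " + deferred.get().docFreq);
  }
}
```

In this sketch the hot path learns that a term is missing immediately, before anything else is set up, which is presumably why a query with a missing required term benefits the most.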
Testing
We added this new task to demonstrate that these changes are beneficial when one term is missing (jasdgiasgdiuygduasgd) and the other term is present with high frequency (ring), with sort order enabled (Thanks @epotyom!):
AndMissingHigh: titledvsort//+jasdgiasgdiuygduasgd +ring
The task shows a good improvement in QPS (+31.8%); the rest of the tasks are stable:
I have also tried simulating a cold index by using https://github.com/mikemccand/luceneutil/blob/main/src/python/ramhog.c.
The results look fine: