Deferring lambda in TermStates.java according to prefetch by shubhamsrkdev · Pull Request #15627 · apache/lucene

shubhamsrkdev · 2026-01-28T23:25:32Z

Problem

Currently, for term lookup, TermStates.get always defers the work to a lambda (in TermStates.java), regardless of the index state. This works well for cold indexes but may not be the best choice for hot indexes.

Solution

We can use prefetch to decide whether to defer. If a prefetch is actually being done, the index is still cold, so we should defer the term state lookup through the lambda. If not, we should short-circuit and look up the term state directly instead of deferring.
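The decision described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `DeferralSketch`, `TermsDictionary`, `InMemoryDictionary`, and `termState` are hypothetical names, the term state is reduced to a `Long`, and the sketch assumes that prefetch reports whether it actually had to schedule I/O. Real code would also have to deal with `IOException`, which is left out here.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-ins for illustration; none of these types are Lucene's actual API. */
final class DeferralSketch {

  /** Hypothetical terms dictionary; prefetch returns true only if it had to schedule I/O. */
  interface TermsDictionary {
    boolean prefetch(String term);

    Long lookup(String term); // returns null when the term is absent
  }

  /** Defers the lookup only when prefetch indicates that the data is still cold. */
  static Supplier<Long> termState(TermsDictionary dict, String term) {
    if (dict.prefetch(term)) {
      // Cold: I/O was scheduled, so postpone the lookup until the caller asks for it.
      return () -> dict.lookup(term);
    }
    // Hot: nothing to wait for, so look up right away and hand back the result.
    Long state = dict.lookup(term);
    return () -> state;
  }

  /** Toy dictionary that counts lookups so the two paths can be told apart. */
  static final class InMemoryDictionary implements TermsDictionary {
    private final Map<String, Long> terms;
    private final boolean cold;
    int lookups = 0;

    InMemoryDictionary(Map<String, Long> terms, boolean cold) {
      this.terms = terms;
      this.cold = cold;
    }

    @Override
    public boolean prefetch(String term) {
      return cold; // assumption: a cold index always needs I/O, a hot one never does
    }

    @Override
    public Long lookup(String term) {
      lookups++;
      return terms.get(term);
    }
  }
}
```

With a hot dictionary the lookup happens inside `termState` itself, so a missing term is known before any supplier is invoked. With a cold dictionary the lookup only runs when `get()` is called on the returned supplier.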

Testing

We added this new task to demonstrate that these changes are beneficial when one term is missing (jasdgiasgdiuygduasgd), the other term is present with high frequency (ring), and sort order is enabled (thanks @epotyom!):

AndMissingHigh: titledvsort//+jasdgiasgdiuygduasgd +ring

The task shows a good improvement in QPS (+31.8%); the rest of the tasks are stable:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                           range     3610.60      (6.5%)     3536.68      (5.7%)   -2.0% ( -13% -   10%) 0.286
                       MedPhrase      247.57      (3.9%)      244.81      (4.5%)   -1.1% (  -9% -    7%) 0.399
                      AndHighLow     1632.42      (3.0%)     1617.01      (4.2%)   -0.9% (  -7% -    6%) 0.416
            BrowseDateTaxoFacets        3.20      (5.0%)        3.18      (4.9%)   -0.8% ( -10% -    9%) 0.597
                       OrHighMed      675.71      (1.6%)      670.72      (2.4%)   -0.7% (  -4% -    3%) 0.258
       BrowseDayOfYearTaxoFacets        3.22      (4.9%)        3.20      (4.8%)   -0.7% (  -9% -    9%) 0.631
                          IntSet      774.21      (5.8%)      768.52      (5.3%)   -0.7% ( -11% -   10%) 0.675
               HighTermMonthSort     1457.35      (2.6%)     1447.58      (3.4%)   -0.7% (  -6% -    5%) 0.479
                      OrHighHigh      317.76      (3.8%)      316.28      (5.0%)   -0.5% (  -8% -    8%) 0.738
                    OrNotHighLow     1553.80      (3.7%)     1546.93      (3.3%)   -0.4% (  -7% -    6%) 0.690
                     AndHighHigh      266.04      (2.5%)      265.04      (2.4%)   -0.4% (  -5% -    4%) 0.624
           HighTermDayOfYearSort      334.42      (4.9%)      333.19      (5.0%)   -0.4% (  -9% -   10%) 0.814
            MedTermDayTaxoFacets       35.70      (1.2%)       35.58      (1.3%)   -0.3% (  -2% -    2%) 0.396
                       LowPhrase       78.41      (1.5%)       78.26      (1.0%)   -0.2% (  -2% -    2%) 0.641
                         LowTerm     1770.19      (2.3%)     1767.79      (3.6%)   -0.1% (  -5% -    5%) 0.888
     BrowseRandomLabelTaxoFacets        2.41      (2.0%)        2.40      (1.5%)   -0.1% (  -3% -    3%) 0.912
         AndHighMedDayTaxoFacets       88.80      (0.8%)       88.75      (1.0%)   -0.0% (  -1% -    1%) 0.859
                     LowSpanNear       40.39      (2.1%)       40.40      (1.3%)    0.0% (  -3% -    3%) 0.943
                          IntNRQ      347.74      (1.4%)      347.98      (1.8%)    0.1% (  -3% -    3%) 0.893
            HighTermTitleBDVSort       53.80      (1.2%)       53.86      (1.1%)    0.1% (  -2% -    2%) 0.752
                    OrNotHighMed      560.70      (3.8%)      561.73      (2.4%)    0.2% (  -5% -    6%) 0.856
                      AndHighMed      705.30      (1.9%)      707.15      (1.8%)    0.3% (  -3% -    4%) 0.652
                    HighSpanNear       29.65      (3.2%)       29.74      (2.8%)    0.3% (  -5% -    6%) 0.752
                       OrHighLow      888.06      (2.3%)      890.92      (2.2%)    0.3% (  -4% -    4%) 0.652
                     MedSpanNear       70.96      (2.2%)       71.19      (1.9%)    0.3% (  -3% -    4%) 0.613
                         MedTerm     1171.64      (3.3%)     1175.76      (3.4%)    0.4% (  -6% -    7%) 0.741
                    OrHighNotLow      997.56      (4.8%)     1001.22      (3.9%)    0.4% (  -7% -    9%) 0.790
                      HighPhrase       52.03      (2.2%)       52.22      (1.4%)    0.4% (  -3% -    4%) 0.521
                         Respell       38.32      (2.3%)       38.48      (1.9%)    0.4% (  -3% -    4%) 0.534
                      TermDTSort      303.26      (2.0%)      304.59      (1.7%)    0.4% (  -3% -    4%) 0.458
        AndHighHighDayTaxoFacets       21.18      (1.7%)       21.29      (1.7%)    0.5% (  -2% -    3%) 0.357
          OrHighMedDayTaxoFacets        6.75      (1.7%)        6.78      (1.6%)    0.5% (  -2% -    3%) 0.319
                         Prefix3      998.21      (3.3%)     1004.68      (2.5%)    0.6% (  -5% -    6%) 0.489
            BrowseDateSSDVFacets        0.90      (8.5%)        0.90      (8.5%)    0.7% ( -15% -   19%) 0.802
                        PKLookup      198.47      (3.1%)      200.00      (2.2%)    0.8% (  -4% -    6%) 0.366
               HighTermTitleSort      172.27      (3.2%)      173.61      (2.2%)    0.8% (  -4% -    6%) 0.368
                    OrHighNotMed      648.55      (4.5%)      653.66      (3.0%)    0.8% (  -6% -    8%) 0.519
           BrowseMonthTaxoFacets        2.77      (0.2%)        2.79      (1.8%)    0.8% (  -1% -    2%) 0.045
                HighSloppyPhrase        6.77      (6.6%)        6.83      (7.0%)    0.8% ( -11% -   15%) 0.701
                        HighTerm     1169.75      (4.0%)     1179.75      (4.0%)    0.9% (  -6% -    9%) 0.498
             MedIntervalsOrdered       49.21      (4.0%)       49.65      (3.7%)    0.9% (  -6% -    8%) 0.472
                 LowSloppyPhrase       33.78      (3.6%)       34.09      (3.4%)    0.9% (  -5% -    8%) 0.399
                          Fuzzy2       69.59      (2.4%)       70.25      (2.1%)    1.0% (  -3% -    5%) 0.183
     BrowseRandomLabelSSDVFacets        3.23     (12.7%)        3.26     (10.2%)    1.1% ( -19% -   27%) 0.762
                          Fuzzy1       78.62      (3.0%)       79.53      (2.6%)    1.1% (  -4% -    6%) 0.188
                 MedSloppyPhrase      109.90      (7.2%)      111.54      (6.4%)    1.5% ( -11% -   16%) 0.488
                   OrHighNotHigh      244.30      (7.5%)      248.19      (6.2%)    1.6% ( -11% -   16%) 0.465
            HighIntervalsOrdered       56.91      (8.0%)       57.95      (6.5%)    1.8% ( -11% -   17%) 0.424
             LowIntervalsOrdered      452.78      (7.7%)      463.41      (6.3%)    2.3% ( -10% -   17%) 0.293
       BrowseDayOfYearSSDVFacets        4.30      (7.3%)        4.42      (8.4%)    2.8% ( -12% -   19%) 0.260
                   OrNotHighHigh      511.85      (8.7%)      526.59     (10.0%)    2.9% ( -14% -   23%) 0.331
                        Wildcard       68.34      (5.7%)       71.24      (3.3%)    4.2% (  -4% -   14%) 0.004
           BrowseMonthSSDVFacets        4.45     (13.2%)        4.74     (18.7%)    6.6% ( -22% -   44%) 0.200
                  AndMissingHigh     2158.52      (5.3%)     2844.18      (6.1%)   31.8% (  19% -   45%) 0.000

I have also tried simulating a cold index by using https://github.com/mikemccand/luceneutil/blob/main/src/python/ramhog.c .

shubhamsekdev % free -h                                            
              total        used        free      shared  buff/cache   available
Mem:           247G        232G        8.5G        948K        6.0G         11G
Swap:            0B          0B          0B

The results look fine:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                           range     2190.34      (4.3%)     2140.19      (6.3%)   -2.3% ( -12% -    8%) 0.179
     BrowseRandomLabelSSDVFacets        3.26     (10.9%)        3.21      (8.7%)   -1.6% ( -19% -   20%) 0.609
                      AndHighMed      829.92      (1.5%)      819.00      (2.3%)   -1.3% (  -5% -    2%) 0.032
                        HighTerm      833.25      (4.3%)      822.93      (5.6%)   -1.2% ( -10% -    9%) 0.435
                    OrHighNotMed      656.71      (5.5%)      649.27      (5.2%)   -1.1% ( -11% -   10%) 0.502
                    OrHighNotLow     1044.04      (4.1%)     1032.56      (4.8%)   -1.1% (  -9% -    8%) 0.436
                          Fuzzy1       90.66      (1.5%)       89.83      (2.1%)   -0.9% (  -4% -    2%) 0.115
                       OrHighLow     1223.62      (2.3%)     1212.71      (1.9%)   -0.9% (  -4% -    3%) 0.178
                 LowSloppyPhrase       27.00      (2.0%)       26.82      (4.3%)   -0.7% (  -6% -    5%) 0.523
                       OrHighMed      730.32      (1.4%)      726.01      (1.6%)   -0.6% (  -3% -    2%) 0.209
                          Fuzzy2       69.22      (1.3%)       68.86      (1.7%)   -0.5% (  -3% -    2%) 0.288
                 MedSloppyPhrase       68.68      (1.7%)       68.34      (2.5%)   -0.5% (  -4% -    3%) 0.469
                HighSloppyPhrase       28.17      (2.7%)       28.05      (3.0%)   -0.4% (  -5% -    5%) 0.634
                         MedTerm     1234.74      (4.5%)     1229.69      (4.6%)   -0.4% (  -9% -    9%) 0.777
                     LowSpanNear       45.21      (2.3%)       45.05      (2.4%)   -0.4% (  -4% -    4%) 0.629
           BrowseMonthTaxoFacets        2.79      (2.1%)        2.78      (0.9%)   -0.3% (  -3% -    2%) 0.614
        AndHighHighDayTaxoFacets       26.91      (1.5%)       26.86      (1.5%)   -0.2% (  -3% -    2%) 0.695
                      HighPhrase      293.10      (1.9%)      292.56      (1.7%)   -0.2% (  -3% -    3%) 0.747
                     MedSpanNear       39.04      (1.4%)       38.97      (1.7%)   -0.2% (  -3% -    2%) 0.720
                      AndHighLow     1517.90      (2.9%)     1515.76      (2.6%)   -0.1% (  -5% -    5%) 0.871
            BrowseDateSSDVFacets        0.90      (8.5%)        0.90      (8.6%)   -0.1% ( -15% -   18%) 0.959
                        Wildcard      372.32      (4.3%)      371.81      (3.1%)   -0.1% (  -7% -    7%) 0.907
                    HighSpanNear       42.36      (1.7%)       42.31      (1.8%)   -0.1% (  -3% -    3%) 0.822
                         LowTerm     1730.51      (3.6%)     1728.87      (2.8%)   -0.1% (  -6% -    6%) 0.926
            HighTermTitleBDVSort       36.39      (1.5%)       36.38      (1.9%)   -0.0% (  -3% -    3%) 0.965
                    OrNotHighLow     1343.68      (3.7%)     1344.38      (3.1%)    0.1% (  -6% -    7%) 0.961
                       MedPhrase      394.21      (1.1%)      394.43      (0.8%)    0.1% (  -1% -    1%) 0.853
               HighTermTitleSort      167.15      (2.4%)      167.27      (2.4%)    0.1% (  -4% -    4%) 0.926
                       LowPhrase      179.37      (1.2%)      179.50      (1.3%)    0.1% (  -2% -    2%) 0.849
                         Respell       39.85      (1.4%)       39.88      (1.3%)    0.1% (  -2% -    2%) 0.862
                      OrHighHigh      164.00      (4.9%)      164.14      (3.3%)    0.1% (  -7% -    8%) 0.950
           HighTermDayOfYearSort      338.43      (2.1%)      338.83      (2.6%)    0.1% (  -4% -    4%) 0.876
            MedTermDayTaxoFacets       39.30      (1.0%)       39.36      (1.5%)    0.2% (  -2% -    2%) 0.701
                          IntSet      847.58      (5.1%)      849.31      (4.9%)    0.2% (  -9% -   10%) 0.898
            BrowseDateTaxoFacets        3.18      (5.0%)        3.19      (8.6%)    0.3% ( -12% -   14%) 0.909
         AndHighMedDayTaxoFacets       90.92      (0.8%)       91.18      (1.0%)    0.3% (  -1% -    2%) 0.298
       BrowseDayOfYearTaxoFacets        3.21      (4.9%)        3.22      (8.1%)    0.3% ( -12% -   13%) 0.889
                         Prefix3      528.21      (3.6%)      530.75      (3.5%)    0.5% (  -6% -    7%) 0.666
          OrHighMedDayTaxoFacets        7.68      (2.3%)        7.72      (1.3%)    0.5% (  -2% -    4%) 0.399
                          IntNRQ      452.72      (1.6%)      454.94      (1.5%)    0.5% (  -2% -    3%) 0.330
                   OrHighNotHigh      521.22      (4.6%)      524.07      (4.6%)    0.5% (  -8% -   10%) 0.707
                     AndHighHigh      215.59      (6.6%)      217.01      (5.0%)    0.7% ( -10% -   13%) 0.723
                      TermDTSort      279.76      (5.7%)      282.28      (5.7%)    0.9% (  -9% -   12%) 0.615
     BrowseRandomLabelTaxoFacets        2.40      (1.5%)        2.42      (6.5%)    1.1% (  -6% -    9%) 0.477
           BrowseMonthSSDVFacets        4.55     (12.1%)        4.60     (12.9%)    1.1% ( -21% -   29%) 0.781
                   OrNotHighHigh      386.83      (5.0%)      391.18      (4.7%)    1.1% (  -8% -   11%) 0.466
       BrowseDayOfYearSSDVFacets        4.45      (7.5%)        4.51      (7.8%)    1.2% ( -13% -   17%) 0.619
               HighTermMonthSort     1430.45      (2.6%)     1449.30      (2.5%)    1.3% (  -3% -    6%) 0.107
             LowIntervalsOrdered      374.77      (6.1%)      380.25      (6.3%)    1.5% ( -10% -   14%) 0.453
            HighIntervalsOrdered       24.88      (4.7%)       25.25      (4.7%)    1.5% (  -7% -   11%) 0.320
             MedIntervalsOrdered      117.02      (4.8%)      118.89      (4.7%)    1.6% (  -7% -   11%) 0.290
                        PKLookup      197.14      (2.2%)      200.83      (2.4%)    1.9% (  -2% -    6%) 0.011
                    OrNotHighMed      710.06     (13.5%)      729.26      (8.8%)    2.7% ( -17% -   28%) 0.452
                  AndMissingHigh     2187.90      (4.1%)     2911.33      (6.4%)   33.1% (  21% -   45%) 0.000
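
As a sanity check on the AndMissingHigh rows, the Pct diff column can be recomputed from the two mean QPS values. This is a minimal sketch that assumes Pct diff is simply the relative change of the means; the class and method names are hypothetical, and the benchmark tool's exact computation may differ.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of luceneutil. */
final class PctDiffCheck {

  /** Relative change of the modified mean QPS over the baseline mean QPS, in percent. */
  static double pctDiff(double baselineQps, double modifiedQps) {
    return (modifiedQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // AndMissingHigh in the first run: 2158.52 -> 2844.18 QPS.
    System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(2158.52, 2844.18))); // 31.8%
    // AndMissingHigh in the ramhog run: 2187.90 -> 2911.33 QPS.
    System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(2187.90, 2911.33))); // 33.1%
  }
}
```

Both values agree with the reported +31.8% and +33.1%.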

epotyom · 2026-01-29T15:05:35Z

+31% QPS - nice improvement for hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶?

I'm curious if we want to add the task to luceneutil permanently, as it looks like no other task exercises needScores = false code path for boolean queries?

shubhamsrkdev · 2026-01-30T19:33:26Z

> +31% QPS - nice improvement for hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶?

Hmm, I have tried with even lower memory (4 GB), with the same results.

> I'm curious if we want to add the task to luceneutil permanently, as it looks like no other task exercises needScores = false code path for boolean queries?

Makes sense!

github-actions · 2026-02-14T00:35:12Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

shubhamsrkdev · 2026-02-23T16:24:51Z

We merged a PR in luceneutil that adds the task for measuring the performance improvement from this PR.

shubhamsrkdev · 2026-02-25T13:51:56Z

We have the first data point from running AndMissingHigh in nightlies here:

This would be a good baseline to judge when this change is merged!

github-actions · 2026-03-12T00:30:25Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

epotyom and others added 6 commits January 27, 2026 22:46

Create MemorySegmentIndexInput#isProbablyLoaded, change #prefetch to …

03920fa

…boolean and propagate results to TermStates to remove lambda for hot index optimization. Issue: apache#15515

Removed duplicate code

6767745

Removed hotCounter

bd13547

Removed isProbablyLoaded

f58822a

Refactoring

1d1ea33

Added prefetch back

ade5c41

github-actions bot added module:core/store module:core/index module:core/codecs module:test-framework labels Jan 28, 2026

shubhamsrkdev marked this pull request as ready for review January 29, 2026 11:12

Added CHANGES.txt entry

68f1ba1

github-actions bot added this to the 11.0.0 milestone Jan 29, 2026

epotyom reviewed Jan 29, 2026

lucene/core/src/java/org/apache/lucene/index/TermStates.java (outdated, resolved)

lucene/core/src/java/org/apache/lucene/index/TermStates.java (outdated, resolved)

lucene/core/src/java/org/apache/lucene/index/TermStates.java (outdated, resolved)

uschindler reviewed Jan 29, 2026

lucene/core/src/java/org/apache/lucene/util/IOBooleanSupplier.java (resolved)

uschindler reviewed Jan 29, 2026

lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java (outdated, resolved)

shubhamsrkdev added 4 commits January 30, 2026 17:13

Merge branch 'main' into termStateChange

086da01

# Conflicts: # lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java

Fixed getIOBooleanSupplier fucntion name

7f5185d

Comments

7d00c77

Removed stateSupplier.get()

69ea761

shubhamsrkdev mentioned this pull request Feb 4, 2026

Add task that exercises needScores = false code path for boolean queries mikemccand/luceneutil#519

Closed

github-actions bot added the Stale label Feb 14, 2026

github-actions bot removed the Stale label Feb 24, 2026

github-actions bot added the Stale label Mar 12, 2026

shubhamsrkdev wants to merge 11 commits into apache:main from shubhamsrkdev:termStateChange
