Deferring lambda in TermStates.java according to prefetch #15627

Open
shubhamsrkdev wants to merge 11 commits into apache:main from shubhamsrkdev:termStateChange

@shubhamsrkdev
Contributor

@shubhamsrkdev shubhamsrkdev commented Jan 28, 2026

Problem

Currently, TermStates.get defers the term lookup in every case (to the lambda in TermStates.java). This works well for cold indexes but may not be the best choice for hot indexes.

Solution

We can use prefetch as the signal for deferring. If a prefetch is being done, the index is still cold, so we should defer the term state lookup through the lambda. If not, we should short-circuit and look up the term state directly instead of deferring.
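
To make the idea concrete, here is a minimal, self-contained sketch of the decision. It does not use Lucene classes: TermDictionary, prefetch, seekExact and the lookups counter are hypothetical stand-ins invented for illustration, and the actual change in TermStates.java may be structured differently.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative sketch only; none of these names are Lucene APIs. */
public class DeferredLookupSketch {

  /** Hypothetical stand-in for a per-segment term dictionary. */
  static final class TermDictionary {
    private final Map<String, Long> docFreqByTerm;
    private final boolean cold; // assumption: a cold index is one where a prefetch gets issued
    int lookups = 0;            // counts how many real lookups have run so far

    TermDictionary(Map<String, Long> docFreqByTerm, boolean cold) {
      this.docFreqByTerm = docFreqByTerm;
      this.cold = cold;
    }

    /** Returns true when a prefetch was started, i.e. the data is not in memory yet. */
    boolean prefetch(String term) {
      return cold;
    }

    /** The actual lookup; returns null when the term is missing. */
    Long seekExact(String term) {
      lookups++;
      return docFreqByTerm.get(term);
    }
  }

  /** Defers the lookup only when a prefetch is in flight; otherwise resolves it right away. */
  static Supplier<Long> get(TermDictionary dict, String term) {
    if (dict.prefetch(term)) {
      // Cold path: let the I/O proceed and run the lookup later, inside the lambda.
      return () -> dict.seekExact(term);
    }
    // Hot path: short-circuit, so a missing term is known before the caller does more work.
    Long state = dict.seekExact(term);
    return () -> state;
  }

  public static void main(String[] args) {
    TermDictionary hot = new TermDictionary(Map.of("ring", 42L), false);
    Supplier<Long> missing = get(hot, "jasdgiasgdiuygduasgd");
    System.out.println("hot: lookups before get() = " + hot.lookups + ", state = " + missing.get());

    TermDictionary cold = new TermDictionary(Map.of("ring", 42L), true);
    Supplier<Long> deferred = get(cold, "ring");
    System.out.println("cold: lookups before get() = " + cold.lookups + ", state = " + deferred.get());
  }
}
```

In this sketch the hot dictionary has already performed its lookup by the time get returns, while the cold one performs it only when the returned supplier is invoked.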

Testing

We added a new task to demonstrate that these changes are beneficial when one term is missing (jasdgiasgdiuygduasgd), the other term occurs with high frequency (ring), and sorting is enabled (thanks @epotyom!):

AndMissingHigh: titledvsort//+jasdgiasgdiuygduasgd +ring

The task shows a good QPS improvement (+31.8%); the rest of the tasks are stable:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                           range     3610.60      (6.5%)     3536.68      (5.7%)   -2.0% ( -13% -   10%) 0.286
                       MedPhrase      247.57      (3.9%)      244.81      (4.5%)   -1.1% (  -9% -    7%) 0.399
                      AndHighLow     1632.42      (3.0%)     1617.01      (4.2%)   -0.9% (  -7% -    6%) 0.416
            BrowseDateTaxoFacets        3.20      (5.0%)        3.18      (4.9%)   -0.8% ( -10% -    9%) 0.597
                       OrHighMed      675.71      (1.6%)      670.72      (2.4%)   -0.7% (  -4% -    3%) 0.258
       BrowseDayOfYearTaxoFacets        3.22      (4.9%)        3.20      (4.8%)   -0.7% (  -9% -    9%) 0.631
                          IntSet      774.21      (5.8%)      768.52      (5.3%)   -0.7% ( -11% -   10%) 0.675
               HighTermMonthSort     1457.35      (2.6%)     1447.58      (3.4%)   -0.7% (  -6% -    5%) 0.479
                      OrHighHigh      317.76      (3.8%)      316.28      (5.0%)   -0.5% (  -8% -    8%) 0.738
                    OrNotHighLow     1553.80      (3.7%)     1546.93      (3.3%)   -0.4% (  -7% -    6%) 0.690
                     AndHighHigh      266.04      (2.5%)      265.04      (2.4%)   -0.4% (  -5% -    4%) 0.624
           HighTermDayOfYearSort      334.42      (4.9%)      333.19      (5.0%)   -0.4% (  -9% -   10%) 0.814
            MedTermDayTaxoFacets       35.70      (1.2%)       35.58      (1.3%)   -0.3% (  -2% -    2%) 0.396
                       LowPhrase       78.41      (1.5%)       78.26      (1.0%)   -0.2% (  -2% -    2%) 0.641
                         LowTerm     1770.19      (2.3%)     1767.79      (3.6%)   -0.1% (  -5% -    5%) 0.888
     BrowseRandomLabelTaxoFacets        2.41      (2.0%)        2.40      (1.5%)   -0.1% (  -3% -    3%) 0.912
         AndHighMedDayTaxoFacets       88.80      (0.8%)       88.75      (1.0%)   -0.0% (  -1% -    1%) 0.859
                     LowSpanNear       40.39      (2.1%)       40.40      (1.3%)    0.0% (  -3% -    3%) 0.943
                          IntNRQ      347.74      (1.4%)      347.98      (1.8%)    0.1% (  -3% -    3%) 0.893
            HighTermTitleBDVSort       53.80      (1.2%)       53.86      (1.1%)    0.1% (  -2% -    2%) 0.752
                    OrNotHighMed      560.70      (3.8%)      561.73      (2.4%)    0.2% (  -5% -    6%) 0.856
                      AndHighMed      705.30      (1.9%)      707.15      (1.8%)    0.3% (  -3% -    4%) 0.652
                    HighSpanNear       29.65      (3.2%)       29.74      (2.8%)    0.3% (  -5% -    6%) 0.752
                       OrHighLow      888.06      (2.3%)      890.92      (2.2%)    0.3% (  -4% -    4%) 0.652
                     MedSpanNear       70.96      (2.2%)       71.19      (1.9%)    0.3% (  -3% -    4%) 0.613
                         MedTerm     1171.64      (3.3%)     1175.76      (3.4%)    0.4% (  -6% -    7%) 0.741
                    OrHighNotLow      997.56      (4.8%)     1001.22      (3.9%)    0.4% (  -7% -    9%) 0.790
                      HighPhrase       52.03      (2.2%)       52.22      (1.4%)    0.4% (  -3% -    4%) 0.521
                         Respell       38.32      (2.3%)       38.48      (1.9%)    0.4% (  -3% -    4%) 0.534
                      TermDTSort      303.26      (2.0%)      304.59      (1.7%)    0.4% (  -3% -    4%) 0.458
        AndHighHighDayTaxoFacets       21.18      (1.7%)       21.29      (1.7%)    0.5% (  -2% -    3%) 0.357
          OrHighMedDayTaxoFacets        6.75      (1.7%)        6.78      (1.6%)    0.5% (  -2% -    3%) 0.319
                         Prefix3      998.21      (3.3%)     1004.68      (2.5%)    0.6% (  -5% -    6%) 0.489
            BrowseDateSSDVFacets        0.90      (8.5%)        0.90      (8.5%)    0.7% ( -15% -   19%) 0.802
                        PKLookup      198.47      (3.1%)      200.00      (2.2%)    0.8% (  -4% -    6%) 0.366
               HighTermTitleSort      172.27      (3.2%)      173.61      (2.2%)    0.8% (  -4% -    6%) 0.368
                    OrHighNotMed      648.55      (4.5%)      653.66      (3.0%)    0.8% (  -6% -    8%) 0.519
           BrowseMonthTaxoFacets        2.77      (0.2%)        2.79      (1.8%)    0.8% (  -1% -    2%) 0.045
                HighSloppyPhrase        6.77      (6.6%)        6.83      (7.0%)    0.8% ( -11% -   15%) 0.701
                        HighTerm     1169.75      (4.0%)     1179.75      (4.0%)    0.9% (  -6% -    9%) 0.498
             MedIntervalsOrdered       49.21      (4.0%)       49.65      (3.7%)    0.9% (  -6% -    8%) 0.472
                 LowSloppyPhrase       33.78      (3.6%)       34.09      (3.4%)    0.9% (  -5% -    8%) 0.399
                          Fuzzy2       69.59      (2.4%)       70.25      (2.1%)    1.0% (  -3% -    5%) 0.183
     BrowseRandomLabelSSDVFacets        3.23     (12.7%)        3.26     (10.2%)    1.1% ( -19% -   27%) 0.762
                          Fuzzy1       78.62      (3.0%)       79.53      (2.6%)    1.1% (  -4% -    6%) 0.188
                 MedSloppyPhrase      109.90      (7.2%)      111.54      (6.4%)    1.5% ( -11% -   16%) 0.488
                   OrHighNotHigh      244.30      (7.5%)      248.19      (6.2%)    1.6% ( -11% -   16%) 0.465
            HighIntervalsOrdered       56.91      (8.0%)       57.95      (6.5%)    1.8% ( -11% -   17%) 0.424
             LowIntervalsOrdered      452.78      (7.7%)      463.41      (6.3%)    2.3% ( -10% -   17%) 0.293
       BrowseDayOfYearSSDVFacets        4.30      (7.3%)        4.42      (8.4%)    2.8% ( -12% -   19%) 0.260
                   OrNotHighHigh      511.85      (8.7%)      526.59     (10.0%)    2.9% ( -14% -   23%) 0.331
                        Wildcard       68.34      (5.7%)       71.24      (3.3%)    4.2% (  -4% -   14%) 0.004
           BrowseMonthSSDVFacets        4.45     (13.2%)        4.74     (18.7%)    6.6% ( -22% -   44%) 0.200
                  AndMissingHigh     2158.52      (5.3%)     2844.18      (6.1%)   31.8% (  19% -   45%) 0.000
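
As a sanity check on the headline number, the Pct diff value can be reproduced from the two mean QPS figures in the AndMissingHigh row. The sketch below assumes Pct diff is the plain relative change against the baseline, which matches the reported +31.8%; it does not try to reproduce the range in parentheses or the p-value, and the class and method names are made up for this example.

```java
import java.util.Locale;

/** Illustrative helper; the formula is an assumption that matches the reported figure. */
public class PctDiffSketch {

  /** Relative change of the candidate QPS against the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // Mean QPS values copied from the AndMissingHigh row above.
    double diff = pctDiff(2158.52, 2844.18);
    System.out.println(String.format(Locale.ROOT, "AndMissingHigh: %.1f%%", diff));
  }
}
```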

I also tried simulating a cold index using https://github.com/mikemccand/luceneutil/blob/main/src/python/ramhog.c:

shubhamsekdev % free -h                                            
              total        used        free      shared  buff/cache   available
Mem:           247G        232G        8.5G        948K        6.0G         11G
Swap:            0B          0B          0B

The results look fine:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                           range     2190.34      (4.3%)     2140.19      (6.3%)   -2.3% ( -12% -    8%) 0.179
     BrowseRandomLabelSSDVFacets        3.26     (10.9%)        3.21      (8.7%)   -1.6% ( -19% -   20%) 0.609
                      AndHighMed      829.92      (1.5%)      819.00      (2.3%)   -1.3% (  -5% -    2%) 0.032
                        HighTerm      833.25      (4.3%)      822.93      (5.6%)   -1.2% ( -10% -    9%) 0.435
                    OrHighNotMed      656.71      (5.5%)      649.27      (5.2%)   -1.1% ( -11% -   10%) 0.502
                    OrHighNotLow     1044.04      (4.1%)     1032.56      (4.8%)   -1.1% (  -9% -    8%) 0.436
                          Fuzzy1       90.66      (1.5%)       89.83      (2.1%)   -0.9% (  -4% -    2%) 0.115
                       OrHighLow     1223.62      (2.3%)     1212.71      (1.9%)   -0.9% (  -4% -    3%) 0.178
                 LowSloppyPhrase       27.00      (2.0%)       26.82      (4.3%)   -0.7% (  -6% -    5%) 0.523
                       OrHighMed      730.32      (1.4%)      726.01      (1.6%)   -0.6% (  -3% -    2%) 0.209
                          Fuzzy2       69.22      (1.3%)       68.86      (1.7%)   -0.5% (  -3% -    2%) 0.288
                 MedSloppyPhrase       68.68      (1.7%)       68.34      (2.5%)   -0.5% (  -4% -    3%) 0.469
                HighSloppyPhrase       28.17      (2.7%)       28.05      (3.0%)   -0.4% (  -5% -    5%) 0.634
                         MedTerm     1234.74      (4.5%)     1229.69      (4.6%)   -0.4% (  -9% -    9%) 0.777
                     LowSpanNear       45.21      (2.3%)       45.05      (2.4%)   -0.4% (  -4% -    4%) 0.629
           BrowseMonthTaxoFacets        2.79      (2.1%)        2.78      (0.9%)   -0.3% (  -3% -    2%) 0.614
        AndHighHighDayTaxoFacets       26.91      (1.5%)       26.86      (1.5%)   -0.2% (  -3% -    2%) 0.695
                      HighPhrase      293.10      (1.9%)      292.56      (1.7%)   -0.2% (  -3% -    3%) 0.747
                     MedSpanNear       39.04      (1.4%)       38.97      (1.7%)   -0.2% (  -3% -    2%) 0.720
                      AndHighLow     1517.90      (2.9%)     1515.76      (2.6%)   -0.1% (  -5% -    5%) 0.871
            BrowseDateSSDVFacets        0.90      (8.5%)        0.90      (8.6%)   -0.1% ( -15% -   18%) 0.959
                        Wildcard      372.32      (4.3%)      371.81      (3.1%)   -0.1% (  -7% -    7%) 0.907
                    HighSpanNear       42.36      (1.7%)       42.31      (1.8%)   -0.1% (  -3% -    3%) 0.822
                         LowTerm     1730.51      (3.6%)     1728.87      (2.8%)   -0.1% (  -6% -    6%) 0.926
            HighTermTitleBDVSort       36.39      (1.5%)       36.38      (1.9%)   -0.0% (  -3% -    3%) 0.965
                    OrNotHighLow     1343.68      (3.7%)     1344.38      (3.1%)    0.1% (  -6% -    7%) 0.961
                       MedPhrase      394.21      (1.1%)      394.43      (0.8%)    0.1% (  -1% -    1%) 0.853
               HighTermTitleSort      167.15      (2.4%)      167.27      (2.4%)    0.1% (  -4% -    4%) 0.926
                       LowPhrase      179.37      (1.2%)      179.50      (1.3%)    0.1% (  -2% -    2%) 0.849
                         Respell       39.85      (1.4%)       39.88      (1.3%)    0.1% (  -2% -    2%) 0.862
                      OrHighHigh      164.00      (4.9%)      164.14      (3.3%)    0.1% (  -7% -    8%) 0.950
           HighTermDayOfYearSort      338.43      (2.1%)      338.83      (2.6%)    0.1% (  -4% -    4%) 0.876
            MedTermDayTaxoFacets       39.30      (1.0%)       39.36      (1.5%)    0.2% (  -2% -    2%) 0.701
                          IntSet      847.58      (5.1%)      849.31      (4.9%)    0.2% (  -9% -   10%) 0.898
            BrowseDateTaxoFacets        3.18      (5.0%)        3.19      (8.6%)    0.3% ( -12% -   14%) 0.909
         AndHighMedDayTaxoFacets       90.92      (0.8%)       91.18      (1.0%)    0.3% (  -1% -    2%) 0.298
       BrowseDayOfYearTaxoFacets        3.21      (4.9%)        3.22      (8.1%)    0.3% ( -12% -   13%) 0.889
                         Prefix3      528.21      (3.6%)      530.75      (3.5%)    0.5% (  -6% -    7%) 0.666
          OrHighMedDayTaxoFacets        7.68      (2.3%)        7.72      (1.3%)    0.5% (  -2% -    4%) 0.399
                          IntNRQ      452.72      (1.6%)      454.94      (1.5%)    0.5% (  -2% -    3%) 0.330
                   OrHighNotHigh      521.22      (4.6%)      524.07      (4.6%)    0.5% (  -8% -   10%) 0.707
                     AndHighHigh      215.59      (6.6%)      217.01      (5.0%)    0.7% ( -10% -   13%) 0.723
                      TermDTSort      279.76      (5.7%)      282.28      (5.7%)    0.9% (  -9% -   12%) 0.615
     BrowseRandomLabelTaxoFacets        2.40      (1.5%)        2.42      (6.5%)    1.1% (  -6% -    9%) 0.477
           BrowseMonthSSDVFacets        4.55     (12.1%)        4.60     (12.9%)    1.1% ( -21% -   29%) 0.781
                   OrNotHighHigh      386.83      (5.0%)      391.18      (4.7%)    1.1% (  -8% -   11%) 0.466
       BrowseDayOfYearSSDVFacets        4.45      (7.5%)        4.51      (7.8%)    1.2% ( -13% -   17%) 0.619
               HighTermMonthSort     1430.45      (2.6%)     1449.30      (2.5%)    1.3% (  -3% -    6%) 0.107
             LowIntervalsOrdered      374.77      (6.1%)      380.25      (6.3%)    1.5% ( -10% -   14%) 0.453
            HighIntervalsOrdered       24.88      (4.7%)       25.25      (4.7%)    1.5% (  -7% -   11%) 0.320
             MedIntervalsOrdered      117.02      (4.8%)      118.89      (4.7%)    1.6% (  -7% -   11%) 0.290
                        PKLookup      197.14      (2.2%)      200.83      (2.4%)    1.9% (  -2% -    6%) 0.011
                    OrNotHighMed      710.06     (13.5%)      729.26      (8.8%)    2.7% ( -17% -   28%) 0.452
                  AndMissingHigh     2187.90      (4.1%)     2911.33      (6.4%)   33.1% (  21% -   45%) 0.000

@github-actions github-actions bot added this to the 11.0.0 milestone Jan 29, 2026
@epotyom
Contributor

epotyom commented Jan 29, 2026

+31% QPS - nice improvement for hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶?

I'm curious if we want to add the task to luceneutil permanently, as it looks like no other task exercises needScores = false code path for boolean queries?

# Conflicts:
#	lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/SegmentTermsEnum.java
@shubhamsrkdev
Contributor Author

> +31% QPS - nice improvement for hot index scenario! I don't think we should see cold index improvements though - is the index cold enough 🥶?

Hmm, I tried with even less memory (4 GB) and got the same results.

> I'm curious if we want to add the task to luceneutil permanently, as it looks like no other task exercises needScores = false code path for boolean queries?

Makes sense!

@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 14, 2026
@shubhamsrkdev
Contributor Author

We merged a PR in luceneutil that adds the task to measure the perf improvement from this PR.

@github-actions github-actions bot removed the Stale label Feb 24, 2026
@shubhamsrkdev
Contributor Author

We have the first data point from running AndMissingHigh in nightlies here:

image

This would be a good baseline to judge when this change is merged!

@github-actions
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 12, 2026