Merged
I initially introduced `DISIDocIdStream` to avoid regressions when `DenseConjunctionBulkScorer` started accepting single clauses. However, benchmarks on apache#14532 suggested that going through `DISIDocIdStream` is slower than loading docs into a bit set first and then iterating the bit set when the postings list has many of its blocks encoded as bit sets. This makes sense: the way `BitSetDocIdStream` iterates set bits saves a number of operations compared with calling `FixedBitSet#nextSetBit` in a loop. So I'm suggesting removing `DISIDocIdStream` for now, for simplicity.
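To illustrate the difference in iteration cost, here is a minimal sketch that contrasts the two styles on a plain `long[]`. It is not Lucene's code: the class `BitSetIterationSketch` and its methods are hypothetical names, and `nextSetBit` here is a simplified stand-in written for this example rather than the actual `FixedBitSet#nextSetBit`. The point is only that a word-at-a-time scan loads each word once and peels bits off it, while a `nextSetBit` loop recomputes the word index, shift, and bounds check on every call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Lucene's FixedBitSet or BitSetDocIdStream. */
final class BitSetIterationSketch {

  /** Simplified stand-in for a nextSetBit-style lookup; returns -1 when no bit is left. */
  static int nextSetBit(long[] bits, int from) {
    if (from >= bits.length * 64) {
      return -1;
    }
    int i = from >> 6;
    // Java shifts a long by the low 6 bits of the count, i.e. by (from % 64).
    long word = bits[i] >>> from;
    if (word != 0) {
      return from + Long.numberOfTrailingZeros(word);
    }
    while (++i < bits.length) {
      word = bits[i];
      if (word != 0) {
        return (i << 6) + Long.numberOfTrailingZeros(word);
      }
    }
    return -1;
  }

  /** Style 1: one nextSetBit call per set bit; index math is redone on each call. */
  static List<Integer> viaNextSetBit(long[] bits) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = nextSetBit(bits, 0); doc != -1; doc = nextSetBit(bits, doc + 1)) {
      docs.add(doc);
    }
    return docs;
  }

  /** Style 2: load each word once, then repeatedly clear its lowest set bit. */
  static List<Integer> viaWordScan(long[] bits) {
    List<Integer> docs = new ArrayList<>();
    for (int i = 0; i < bits.length; i++) {
      long word = bits[i];
      while (word != 0) {
        docs.add((i << 6) + Long.numberOfTrailingZeros(word));
        word &= word - 1; // clears the lowest set bit
      }
    }
    return docs;
  }
}
```

Both methods return the same doc IDs in the same order; the second simply does less bookkeeping per set bit, which is the kind of saving described above.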
gf2121
approved these changes
Apr 24, 2025
Contributor
gf2121
left a comment
Excited to see a combination of simplification and optimization!
Contributor
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Contributor
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
jpountz
added a commit
that referenced
this pull request
May 20, 2025
weizijun
added a commit
to weizijun/lucene
that referenced
this pull request
May 27, 2025
* main: (32 commits)
  update os.makedirs with pathlib mkdir (apache#14710)
  Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674)
  Implement #docIDRunEnd() on PostingsEnum. (apache#14693)
  Speed up TermQuery (apache#14709)
  Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701)
  Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706)
  Use a temporary repository location to download certain ecj versions ("drops") (apache#14703)
  Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696)
  Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700)
  Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695)
  Speed up exhaustive evaluation. (apache#14679)
  Specify and test that IOContext is immutable (apache#14686)
  deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691)
  deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692)
  Clean up how the test framework creates asserting scorables. (apache#14452)
  Make competitive iterators more robust. (apache#14532)
  Remove DISIDocIdStream. (apache#14550)
  Implement AssertingPostingsEnum#intoBitSet. (apache#14675)
  Fix patience knn queries to work with seeded knn queries (apache#14688)
  Added toString() method to BytesRefBuilder (apache#14676)
  ...