This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
jpountz
left a comment
Nice speedup! Term queries are fast, though a term query on "the" is one of the slowest queries in the Tantivy benchmark, so it's nice to get it optimized.
public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
  if (collector.competitiveIterator() != null) {
    return new Weight.DefaultBulkScorer(scorer).score(collector, acceptDocs, min, max);
  }
I wonder if this should be an implementation detail of DefaultBulkScorer instead of a different class. Doing something like
if (scoreMode == TOP_SCORES && competitiveIterator == null) {
  // new optimization
} else {
  // existing DefaultBulkScorer code
}
Thanks for the feedback! I moved the implementation into DefaultBulkScorer.
if (scoreMode == TOP_SCORES && competitiveIterator == null)
As the description shows, exhaustive execution gets optimized as well, so I use scoreMode.needsScores instead.
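To make the difference between the two conditions concrete, here is a minimal sketch. It assumes a toy stand-in enum, ScoreModeSketch, and a hypothetical helper class, PathChoiceSketch; neither is Lucene's actual code, and the real check would live wherever the bulk scorer is chosen.

```java
// Toy stand-in for illustration only; this is NOT Lucene's ScoreMode.
enum ScoreModeSketch {
  COMPLETE(true),            // exhaustive evaluation, scores needed
  COMPLETE_NO_SCORES(false), // exhaustive evaluation, scores not needed
  TOP_SCORES(true);          // top-N evaluation by score

  private final boolean needsScores;

  ScoreModeSketch(boolean needsScores) {
    this.needsScores = needsScores;
  }

  boolean needsScores() {
    return needsScores;
  }
}

// Hypothetical helper that only decides which code path would be taken.
final class PathChoiceSketch {

  // Condition as first suggested: only top-N scoring takes the new path.
  static boolean topScoresOnly(ScoreModeSketch mode, boolean hasCompetitiveIterator) {
    return mode == ScoreModeSketch.TOP_SCORES && !hasCompetitiveIterator;
  }

  // Condition based on needsScores: exhaustive scoring takes the new path too.
  static boolean anyScoringMode(ScoreModeSketch mode, boolean hasCompetitiveIterator) {
    return mode.needsScores() && !hasCompetitiveIterator;
  }
}
```

Under these assumptions, an exhaustive scoring mode is rejected by the first condition but accepted by the second, while a mode that does not need scores, or a collector with a competitive iterator, falls back to the existing path in both cases.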
jpountz
left a comment
Sorry about my last suggestion. I had missed that DefaultBulkScorer does not yet have a way to know whether scores are needed, so I think I like your previous approach a bit better, since it keeps DefaultBulkScorer clean.
if (impactsDisi != null) {
  impactsDisi.ensureCompetitive();
}
I wonder if we should rather put it at the beginning of the for loop below. For instance, imagine that the first block of docs returned contains only deleted docs; the loop will then fetch a new block. It would be good to check whether that block is competitive before fetching it as well.
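A rough sketch of the loop shape this suggests, using a toy model rather than Lucene's classes: BlockLoopSketch, Block, and the fixed minCompetitiveScore threshold are all hypothetical stand-ins (in the real code the threshold changes as hits are collected, and the check would be the ensureCompetitive() call).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; none of these names exist in Lucene.
final class BlockLoopSketch {

  /** One block of postings: doc IDs plus the best score any doc in the block can reach. */
  static final class Block {
    final int[] docs;
    final float maxScore;

    Block(int[] docs, float maxScore) {
      this.docs = docs;
      this.maxScore = maxScore;
    }
  }

  /**
   * Walks the blocks in order and returns the live docs found. The competitiveness
   * check sits at the top of the loop, so it runs before every block is read,
   * including the block that follows one whose docs were all deleted.
   * Indexes of the blocks actually read are appended to blocksRead.
   */
  static List<Integer> collect(
      List<Block> blocks, boolean[] liveDocs, float minCompetitiveScore, List<Integer> blocksRead) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < blocks.size(); i++) {
      Block block = blocks.get(i);
      // Stand-in for the competitiveness check: skip the block before reading its docs.
      if (block.maxScore < minCompetitiveScore) {
        continue;
      }
      blocksRead.add(i);
      for (int doc : block.docs) {
        if (liveDocs[doc]) {
          hits.add(doc);
        }
      }
    }
    return hits;
  }
}
```

With the check inside the loop, a first block that yields no live docs does not cause the next block to be read unconditionally; a check placed once before the loop would cover only the first block.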
// nightly-benchmarks-results-changed //
* main: (32 commits)
  update os.makedirs with pathlib mkdir (apache#14710)
  Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674)
  Implement #docIDRunEnd() on PostingsEnum. (apache#14693)
  Speed up TermQuery (apache#14709)
  Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701)
  Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706)
  Use a temporary repository location to download certain ecj versions ("drops") (apache#14703)
  Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696)
  Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700)
  Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695)
  Speed up exhaustive evaluation. (apache#14679)
  Specify and test that IOContext is immutable (apache#14686)
  deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691)
  deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692)
  Clean up how the test framework creates asserting scorables. (apache#14452)
  Make competitive iterators more robust. (apache#14532)
  Remove DISIDocIdStream. (apache#14550)
  Implement AssertingPostingsEnum#intoBitSet. (apache#14675)
  Fix patience knn queries to work with seeded knn queries (apache#14688)
  Added toString() method to BytesRefBuilder (apache#14676)
  ...
This change yielded a good speedup on the nightly benchmarks; I pushed an annotation. https://benchmarks.mikemccandless.com/Term.html
This tries to speed up TermQuery with the new API Scorer#nextDocsAndScores
TopN
Exhaustive