Updates {hnsw and hnsw-sqv} x {onnx and cached} for both BEIR and MS MARCO: code updated, parameters re-tuned.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98022af436
    public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
      return new DelegatingKnnVectorsFormat(new Lucene99HnswVectorsFormat(args.M, args.efC), 4096);
    }
    });
    }

    this.writer = new IndexWriter(dir, config);
Reapply writer settings before creating HNSW IndexWriter
The constructor now instantiates IndexWriter immediately after codec selection, but no longer sets OpenMode.CREATE or setRAMBufferSizeMB(args.memoryBuffer) on IndexWriterConfig. As a result, rerunning indexing against an existing -index path can append to prior data instead of recreating the index, and the -memoryBuffer CLI argument is silently ignored. This changes indexing semantics and can invalidate reproducibility for dense-vector experiments.
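A minimal sketch of the change this comment asks for, assuming the `IndexWriterConfig` produced during codec selection is still in scope. `HnswWriterSettings`, `applyWriterSettings`, and `ramBufferMb` (standing in for `args.memoryBuffer`) are hypothetical names for illustration, not identifiers from the PR; `setOpenMode` and `setRAMBufferSizeMB` are existing Lucene `IndexWriterConfig` methods.

```java
import org.apache.lucene.index.IndexWriterConfig;

final class HnswWriterSettings {
  private HnswWriterSettings() {}

  /**
   * Reapplies the two settings the review flags as missing.
   * ramBufferMb is a hypothetical stand-in for args.memoryBuffer.
   */
  static IndexWriterConfig applyWriterSettings(IndexWriterConfig config, double ramBufferMb) {
    // Recreate the index on every run instead of appending to an existing one.
    config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    // Honor the memory buffer size requested on the command line.
    config.setRAMBufferSizeMB(ramBufferMb);
    return config;
  }
}
```

Under these assumptions the constructor line would become something like `this.writer = new IndexWriter(dir, HnswWriterSettings.applyWriterSettings(config, args.memoryBuffer));`, or the two setter calls could simply be inlined before the existing `new IndexWriter(dir, config)`.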
    rs = searcher.search(query, useRM3 ? searchArgs.rerankcutoff : k, BREAK_SCORE_TIES_BY_DOCID, true);
    }
    context = new RerankerContext<>(searcher, null, query, null,
    TopDocs rs = searcher.search(query, useRM3 ? searchArgs.rerankcutoff : k, BREAK_SCORE_TIES_BY_DOCID, true);
Keep Lucene8 compatibility branch in feedback-term search
_search still uses backwardsCompatibilityLucene8 to avoid sorted tie-breaking on old indexes, but _get_feedback_terms was changed to always call search(..., BREAK_SCORE_TIES_BY_DOCID, true). On Lucene 8 indexes, this reintroduces the docvalues-based incompatibility that the compatibility flag is meant to bypass, so feedback-term extraction can fail even when normal search() still works.
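The branching this comment wants restored can be sketched in isolation. The example below is a simplified model, not the PR's code: the `Searcher` interface, its two methods, and `searchForFeedback` are hypothetical stand-ins for the plain and the tie-breaking `search` overloads, and it assumes the Lucene 8 path is the plain, unsorted call.

```java
import java.util.List;

final class FeedbackSearchSketch {
  private FeedbackSearchSketch() {}

  /** Hypothetical stand-in for the two search overloads discussed in the review. */
  interface Searcher {
    // Relevance-ordered search with no sort, safe for indexes without docvalues.
    List<String> search(String query, int hits);

    // Search that breaks score ties by docid (models BREAK_SCORE_TIES_BY_DOCID).
    List<String> searchWithTieBreak(String query, int hits);
  }

  /** Picks the overload from the compatibility flag, mirroring what _search is described as doing. */
  static List<String> searchForFeedback(
      Searcher searcher, String query, int hits, boolean backwardsCompatibilityLucene8) {
    if (backwardsCompatibilityLucene8) {
      // Old indexes: skip the sorted tie-breaking path entirely.
      return searcher.search(query, hits);
    }
    return searcher.searchWithTieBreak(query, hits);
  }
}
```

Routing both `_search` and `_get_feedback_terms` through one helper like this would keep the two code paths from drifting apart again, though whether that refactoring fits the codebase is a judgment call for the authors.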
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage diff:

| Metric | master | #3151 | +/- |
|---|---|---|---|
| Coverage | 74.14% | 73.39% | -0.75% |
| Complexity | 1653 | 1702 | +49 |
| Files | 196 | 199 | +3 |
| Lines | 13117 | 13403 | +286 |
| Branches | 1708 | 1765 | +57 |
| Hits | 9725 | 9837 | +112 |
| Misses | 2691 | 2845 | +154 |
| Partials | 701 | 721 | +20 |
The branch `lucene10` is a long-lived feature branch for upgrading to Lucene 10. The idea is that all other Lucene 10 features would be PRed against this branch, and when we're ready, we can merge this to master in one go.

That means we should keep this branch always mergeable to master.
WIP, will keep in draft form until we're ready to merge.