Added batch equivalent of computeQueryDocumentScore #1882
HAKSOAT wants to merge 4 commits into castorini:master
Hi @lintool, I'd appreciate a review here when you get a chance.
Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid)));
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(filterQuery, BooleanClause.Occur.MUST);
The docids need to move here. In the non-batch implementation, the filter clause restricts the search to a single docid. In the batch implementation, you want to restrict it to a set of docids, i.e., add multiple sub-clauses to the filter query.
Hi @lintool, I took a look at this and tested it with a set of documents from robust04. However, I ran into this error: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024.
This resulted from doing:
for (String docid : docids) {
  // Setting default result value for all docids.
  results.put(docid, 0.0f);
  Query filterQuery = new ConstantScoreQuery(new TermQuery(new Term(IndexArgs.ID, docid)));
  builder.add(filterQuery, BooleanClause.Occur.SHOULD);
}
What are your thoughts on this? Am I doing the right thing? In my tests, it works as long as the clause count stays below 1024.
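One possible way around the clause limit described above, offered as a sketch and not as the approach this PR settled on, is to split the docids into batches no larger than the limit and build one filter query per batch. The class and method names below (`DocidBatches`, `partition`, `defaultScores`) are hypothetical and are not part of Anserini or Lucene; the value 1024 is taken from the error message. The Lucene query construction is left out on purpose, so the example only shows the batching logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Anserini or Lucene.
final class DocidBatches {
  // Assumption: matches the maxClauseCount value reported in the error message.
  static final int DEFAULT_MAX_CLAUSES = 1024;

  // Splits docids into consecutive batches of at most maxPerBatch entries.
  static List<List<String>> partition(List<String> docids, int maxPerBatch) {
    if (maxPerBatch <= 0) {
      throw new IllegalArgumentException("maxPerBatch must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < docids.size(); start += maxPerBatch) {
      int end = Math.min(start + maxPerBatch, docids.size());
      // Copy the view so each batch is independent of the input list.
      batches.add(new ArrayList<>(docids.subList(start, end)));
    }
    return batches;
  }

  // Gives every docid a default score of 0.0f, as the loop in the comment does.
  static Map<String, Float> defaultScores(List<String> docids) {
    Map<String, Float> results = new LinkedHashMap<>();
    for (String docid : docids) {
      results.put(docid, 0.0f);
    }
    return results;
  }

  public static void main(String[] args) {
    List<String> docids = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      docids.add("doc" + i);
    }
    Map<String, Float> results = defaultScores(docids);
    for (List<String> batch : partition(docids, DEFAULT_MAX_CLAUSES)) {
      // Here one filter query would be built from the docids in this batch,
      // the search would be run, and the scores merged into results.
      System.out.println("batch size: " + batch.size());
    }
    System.out.println("docids with default score: " + results.size());
  }
}
```

Each batch would then get its own filter query, with the scores merged into a single results map. The batch size may need to be somewhat lower than the limit if the query's own clauses count toward it as well.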
import io.anserini.search.query.BagOfWordsQueryGenerator;
import io.anserini.search.query.PhraseQueryGenerator;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
Note to @HAKSOAT: Remove these logger imports if not needed in the final implementation.
Addresses #1484