Faster indexing for learned sparse retrieval #2080
thongnt99 wants to merge 5 commits into castorini:master
Hi @thongnt99, very interesting, and thanks for the PR! Can you provide a sense of the performance improvement?
Hi @lintool, these are some comparison points I collected from our recent reproduction attempt with LSR methods.
@thongnt99 this is cool!
src/main/java/io/anserini/collection/JsonTermWeightCollection.java
src/main/java/io/anserini/index/generator/TermWeightDocumentGenerator.java
Instead of
Yes, I also think that TermWeightDocument isn't an ideal name. Probably
I like
@lintool |
@lintool I am gonna add the tests after ECIR. |
Related to #1890
Ongoing work: using FeatureField to directly index terms and weights
Indexing works and returns the same metrics as the token-repeating method, but three tests (written for the repeating method) are currently failing. Please let me know how to fix those tests, or whether I should create new ones.
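To illustrate why the two approaches can agree on metrics, here is a minimal, Lucene-free sketch. It assumes integer term weights and a dot-product (impact) score; the class and method names (`TermWeightSketch`, `expandByRepetition`, `countTerms`, `impactScore`) are hypothetical and are not part of this PR or of Anserini. It only shows that counting repeated tokens recovers the same term-to-weight map that direct indexing would store, so the work of materializing and re-tokenizing the repeated text can be skipped. It does not model how FeatureField actually encodes weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermWeightSketch {

  // Token repeating: emit each term `weight` times so that a plain
  // whitespace tokenizer recovers the weight as the term frequency.
  static String expandByRepetition(Map<String, Integer> weights) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Integer> e : weights.entrySet()) {
      for (int i = 0; i < e.getValue(); i++) {
        if (sb.length() > 0) {
          sb.append(' ');
        }
        sb.append(e.getKey());
      }
    }
    return sb.toString();
  }

  // What an indexer sees after tokenizing the expanded text:
  // one count per distinct term.
  static Map<String, Integer> countTerms(String text) {
    Map<String, Integer> tf = new LinkedHashMap<>();
    if (text.isEmpty()) {
      return tf;
    }
    for (String token : text.split(" ")) {
      tf.merge(token, 1, Integer::sum);
    }
    return tf;
  }

  // Impact-style score: sum of query weight times document weight
  // over the terms the two maps share.
  static long impactScore(Map<String, Integer> query, Map<String, Integer> doc) {
    long score = 0;
    for (Map.Entry<String, Integer> e : query.entrySet()) {
      Integer docWeight = doc.get(e.getKey());
      if (docWeight != null) {
        score += (long) e.getValue() * docWeight;
      }
    }
    return score;
  }

  public static void main(String[] args) {
    Map<String, Integer> doc = new LinkedHashMap<>();
    doc.put("sparse", 3);
    doc.put("retrieval", 2);
    doc.put("index", 1);

    Map<String, Integer> query = new LinkedHashMap<>();
    query.put("sparse", 2);
    query.put("index", 4);

    // Path 1: repeat tokens, then count them back.
    Map<String, Integer> viaRepetition = countTerms(expandByRepetition(doc));

    // Path 2: use the term-to-weight map directly, with no expansion.
    System.out.println(viaRepetition.equals(doc)); // prints "true"
    System.out.println(impactScore(query, viaRepetition) + " " + impactScore(query, doc)); // prints "10 10"
  }
}
```

In this sketch the repeated text grows with the sum of the weights, while the direct path touches each distinct term once. A term with weight zero would disappear under repetition, so the equivalence shown here assumes strictly positive weights.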
Indexing:
Retrieval: