
Add reproduction log for MS MARCO data preparation (Windows) #3153

Open
mahdijaf wants to merge 2 commits into castorini:master from mahdijaf:master


mahdijaf commented Mar 10, 2026

This PR adds my reproduction log entries for the Foundations of Retrieval onboarding steps in Anserini.

Environment:

  • OS: Windows 10
  • Python: 3.14.3
  • Java: OpenJDK 21
  • Maven: installed and used to build Anserini

Steps completed:

  • Downloaded MS MARCO collection
  • Converted collection.tsv to JSONL (9 files generated)
  • Filtered queries.dev.small.tsv (6980 queries)
  • Verified query → qrels → document mapping
  • Indexed the MS MARCO passage collection with Anserini
  • Reproduced the BM25 baseline on the dev set
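For reference, the TSV-to-JSONL conversion step can be sketched as below. This is a minimal illustration, not Anserini's own conversion script: the function name `convert_collection_to_jsonl`, the `docsNN.json` file naming, and the one-million-documents-per-file cap are assumptions. Each output line uses the `{"id": ..., "contents": ...}` shape that Anserini's JsonCollection reads. With the roughly 8.8 million passages in the MS MARCO passage collection, a one-million cap yields 9 shards, consistent with the count reported above. On Windows, passing `encoding="utf-8"` explicitly matters, since the default text encoding there is usually not UTF-8.

```python
import json
import os


def convert_collection_to_jsonl(tsv_path, output_dir, max_docs_per_file=1_000_000):
    """Convert a `docid<TAB>passage` TSV file into JSONL shards.

    Hypothetical helper: the shard size and file naming are assumptions.
    Returns the number of shard files written.
    """
    os.makedirs(output_dir, exist_ok=True)
    out = None
    n_files = 0
    # Explicit UTF-8: Windows defaults to a legacy code page otherwise.
    with open(tsv_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            # Start a new shard every max_docs_per_file documents.
            if i % max_docs_per_file == 0:
                if out is not None:
                    out.close()
                shard = os.path.join(output_dir, f"docs{n_files:02d}.json")
                # newline="\n" keeps Unix line endings even on Windows.
                out = open(shard, "w", encoding="utf-8", newline="\n")
                n_files += 1
            doc_id, text = line.rstrip("\n").split("\t", 1)
            out.write(json.dumps({"id": doc_id, "contents": text}) + "\n")
    if out is not None:
        out.close()
    return n_files
```

Writing several moderately sized shards rather than one large file lets the indexer read them in parallel threads.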

Result:
MRR@10 = 0.18741227770955546
QueriesRanked: 6980
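To make the reported metric concrete, MRR@10 can be sketched as follows. This is a minimal illustration, not the official MS MARCO evaluation script. It assumes the run and qrels have already been loaded into plain dictionaries (a hypothetical in-memory format): each query scores the reciprocal rank of its first relevant passage within the top 10, or 0 if none appears, and the scores are averaged over every query that has relevance judgments.

```python
def mrr_at_10(run, qrels):
    """Mean Reciprocal Rank with a cutoff of 10.

    run:   dict mapping qid -> list of docids, best-ranked first (assumed format)
    qrels: dict mapping qid -> set of relevant docids (assumed format)
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        # Only the first relevant hit in the top 10 counts.
        for rank, docid in enumerate(run.get(qid, [])[:10], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels)


run = {"q1": ["d3", "d1"], "q2": ["d9"]}
qrels = {"q1": {"d1"}, "q2": {"d5"}}
print(mrr_at_10(run, qrels))  # 0.25: q1 scores 1/2, q2 scores 0
```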


