1 change: 1 addition & 0 deletions bin/run.sh
@@ -7,3 +7,4 @@ java -cp `ls target/*-fatjar.jar` -Xms512M -Xmx192G -Dslf4j.internal.verbosity=W
# - "WARNING: Using incubator modules: jdk.incubator.vector" cannot be suppressed, so just grep -v it.
# - the issue with the grep solution is that it interferes with downloading progress bars
# - "SLF4J(I): Connected with provider of type [org.apache.logging.slf4j.SLF4JServiceProvider]": suppress using -Dslf4j
# - "WARNING: A restricted method in java.lang.foreign.Linker has been called" fixed by --enable-native-access=ALL-UNNAMED
105 changes: 105 additions & 0 deletions docs/automated-regressions.md
@@ -0,0 +1,105 @@
# Automated Regressions from Prebuilt Indexes

This page describes the high-level pipeline implemented by `io.anserini.reproduce.RunRegressionFromPrebuiltIndexes`.

Unlike `run_regression.py` workflows that build indexes from raw corpora, this pipeline assumes indexes already exist, either as **prebuilt indexes** (resolved by alias) or as local index paths, and focuses on:

- Running retrieval commands for each configured condition/topic pair.
- Evaluating outputs with `trec_eval`.
- Comparing measured metrics against expected regression targets.

## Entry Point

Run via the fatjar wrapper:

```bash
bin/run.sh io.anserini.reproduce.RunRegressionFromPrebuiltIndexes \
--regression msmarco-v1-passage.core
```

Key flags:

- `--regression [config]` (required): name of a YAML config in `src/main/resources/reproduce/from-prebuilt-indexes/configs/`.
- `--print-commands`: print retrieval and eval commands.
- `--dry-run`: skip command execution.
- `--compute-index-size`: pre-scan unique `-index` references and print disk/download size summary.

## Config Model

The YAML config contains:

- `conditions[]`: each with a `name`, display metadata, and a retrieval `command` template.
- `topics[]`: under each condition, each entry with:
  - `topic_key`: topic-set identifier passed to the search command.
  - `eval_key`: qrels key for `trec_eval`.
  - `expected_scores`: expected metric values.
  - `metric_definitions`: `trec_eval` flag string for each metric.

Command templates use placeholders:

- `$fatjar`: resolved from the runtime location of the current jar.
- `$threads`: currently fixed at `16`.
- `$topics`: topic key from the config.
- `$output`: run-file path under `runs/`.
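Putting the pieces together, a minimal config following this schema might look like the sketch below. All names, the command, and the scores here are invented for illustration; they are not copied from a shipped config.

```yaml
# Hypothetical sketch only -- field names follow the schema above,
# but the condition, command, and expected scores are made up.
conditions:
  - name: bm25-example
    display: "BM25 (example)"
    command: >
      java -cp $fatjar io.anserini.search.SearchCollection
      -index msmarco-v1-passage -topics $topics -threads $threads
      -output $output -bm25
    topics:
      - topic_key: msmarco-v1-passage.dev
        eval_key: msmarco-v1-passage.dev
        expected_scores:
          MRR@10: 0.1840
        metric_definitions:
          MRR@10: "-c -M 10 -m recip_rank"
```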

## Pipeline

```mermaid
flowchart TD
A[Start: parse CLI args] --> B[Load YAML config from reproduce/from-prebuilt-indexes/configs]
B --> C[Ensure runs/ directory exists]
C --> D[Resolve fatjar path]
D --> E[Pre-scan all command templates and extract -index values]
E --> F{--compute-index-size?}
F -- Yes --> G[Resolve each index alias/path and print size summary table]
F -- No --> H[Skip size summary]
G --> I[For each condition]
H --> I
I --> J[For each topic in condition]
J --> K[Render retrieval command with placeholders]
K --> L{--print-commands?}
L -- Yes --> M[Print retrieval command]
L -- No --> N[Continue]
M --> O{--dry-run?}
N --> O
O -- No --> P[Execute retrieval command and write run file]
O -- Yes --> Q[Skip execution]
P --> R[Build eval commands per metric]
Q --> R
R --> S{--print-commands?}
S -- Yes --> T[Print eval commands]
S -- No --> U[Continue]
T --> V{--dry-run?}
U --> V
V -- No --> W[Execute trec_eval command for each metric]
V -- Yes --> X[Skip metric execution]
W --> Y[Compare observed vs expected score]
Y --> Z[Emit OK / OKISH / FAIL]
X --> AA[Next topic]
Z --> AA
AA --> AB{More topics/conditions?}
AB -- Yes --> J
AB -- No --> AC[Print total elapsed time]
AC --> AD[End]
```

## Evaluation Semantics

For each metric in `expected_scores`:

- Build the eval command: `java -cp <fatjar> trec_eval <metric_definition> <eval_key> <output>`.
- Parse the returned score.
- Compare against the expected value:
  - Match within `1e-5`: `OK`.
  - Higher than expected: `OKISH`.
  - Small deviation (within `2e-4`): `OKISH`.
  - Otherwise: `FAIL`.
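The tolerance check above can be sketched as follows. This is a minimal illustration of the documented thresholds, not the actual implementation in `RunRegressionFromPrebuiltIndexes`, whose check order and class names may differ.

```java
// Sketch of the score classification described above; thresholds match
// the documented tolerances, but the real implementation may differ.
public class ScoreCheck {
  static String classify(double observed, double expected) {
    if (Math.abs(observed - expected) <= 1e-5) return "OK";    // effectively exact
    if (observed > expected) return "OKISH";                   // better than expected
    if (Math.abs(observed - expected) <= 2e-4) return "OKISH"; // small deviation
    return "FAIL";
  }

  public static void main(String[] args) {
    System.out.println(classify(0.6375, 0.6375)); // OK
    System.out.println(classify(0.6374, 0.6375)); // OKISH
    System.out.println(classify(0.6300, 0.6375)); // FAIL
  }
}
```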

## Outputs

- Run files: `runs/run.<regression>.<condition>.<topic>.txt`
- Console logs with condition/topic progress.
- Optional index-size table.
- Optional command echo.
- Metric-by-metric regression checks.
- Final total runtime.
@@ -8,13 +8,13 @@ This page describes regression experiments, integrated into Anserini's regressio

In these experiments, we are using cached queries (i.e., cached results of query encoding).

The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.cached.yaml).
Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.cached.yaml).
Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.cached.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.cached
python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.cached
```

All the BEIR corpora, encoded by the BGE-base-en-v1.5 model and stored in Parquet format, are available for download:
@@ -33,12 +33,12 @@ Sample indexing command, building quantized flat indexes:

```
bin/run.sh io.anserini.index.IndexFlatDenseVectors \
-threads 16 \
-threads 4 \
-collection ParquetDenseVectorCollection \
-input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-quantize.int8 \
-index indexes/lucene-flat-sqv.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-quantize.sqv \
>& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
```

@@ -52,19 +52,19 @@ After indexing has completed, you should be able to perform retrieval as follows

```
bin/run.sh io.anserini.search.SearchFlatDenseVectors \
-index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-index indexes/lucene-flat-sqv.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
-topicReader JsonStringVector \
-output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-hits 1000 -removeQuery -threads 16 &
```

Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -m ndcg_cut.10 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m ndcg_cut.10 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt
```

## Effectiveness
@@ -8,13 +8,13 @@ This page describes regression experiments, integrated into Anserini's regressio

In these experiments, we are using ONNX to perform query encoding on the fly.

The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.onnx.yaml).
Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.
The exact configurations for these regressions are stored in [this YAML file](../../src/main/resources/regression/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.onnx.yaml).
Note that this page is automatically generated from [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.onnx.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead and then run `bin/build.sh` to rebuild the documentation.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-int8.onnx
python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana.bge-base-en-v1.5.parquet.flat-sqv.onnx
```

All the BEIR corpora, encoded by the BGE-base-en-v1.5 model and stored in Parquet format, are available for download:
@@ -33,12 +33,12 @@ Sample indexing command, building quantized flat indexes:

```
bin/run.sh io.anserini.index.IndexFlatDenseVectors \
-threads 16 \
-threads 4 \
-collection ParquetDenseVectorCollection \
-input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
-index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-quantize.int8 \
-index indexes/lucene-flat-sqv.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-quantize.sqv \
>& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
```

@@ -52,19 +52,19 @@ After indexing has completed, you should be able to perform retrieval as follows

```
bin/run.sh io.anserini.search.SearchFlatDenseVectors \
-index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-index indexes/lucene-flat-sqv.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
-topicReader TsvString \
-output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-arguana.test.txt \
-output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-onnx.topics.beir-v1.0.0-arguana.test.txt \
-encoder BgeBaseEn15 -hits 1000 -removeQuery -threads 16 &
```

Evaluation can be performed using `trec_eval`:

```
bin/trec_eval -c -m ndcg_cut.10 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-arguana.test.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-arguana.test.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-arguana.test.txt
bin/trec_eval -c -m ndcg_cut.10 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-onnx.topics.beir-v1.0.0-arguana.test.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-onnx.topics.beir-v1.0.0-arguana.test.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.beir-v1.0.0-arguana.test.txt runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-sqv-onnx.topics.beir-v1.0.0-arguana.test.txt
```

## Effectiveness
@@ -73,9 +73,9 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|---------------------|
| BEIR (v1.0.0): ArguAna | 0.6361 |
| BEIR (v1.0.0): ArguAna | 0.6375 |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): ArguAna | 0.9915 |
| BEIR (v1.0.0): ArguAna | 0.9929 |
| **R@1000** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): ArguAna | 0.9964 |

@@ -33,7 +33,7 @@ Sample indexing command, building flat indexes:

```
bin/run.sh io.anserini.index.IndexFlatDenseVectors \
-threads 16 \
-threads 4 \
-collection ParquetDenseVectorCollection \
-input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
@@ -33,7 +33,7 @@ Sample indexing command, building flat indexes:

```
bin/run.sh io.anserini.index.IndexFlatDenseVectors \
-threads 16 \
-threads 4 \
-collection ParquetDenseVectorCollection \
-input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
-generator DenseVectorDocumentGenerator \
@@ -72,9 +72,9 @@ With the above commands, you should be able to reproduce the following results:

| **nDCG@10** | **BGE-base-en-v1.5**|
|:-------------------------------------------------------------------------------------------------------------|---------------------|
| BEIR (v1.0.0): ArguAna | 0.6361 |
| BEIR (v1.0.0): ArguAna | 0.6375 |
| **R@100** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): ArguAna | 0.9915 |
| BEIR (v1.0.0): ArguAna | 0.9929 |
| **R@1000** | **BGE-base-en-v1.5**|
| BEIR (v1.0.0): ArguAna | 0.9964 |
