Metadata-Version: 2.4
Name: bm25-benchmarks
Version: 0.0.3
Summary: CLI tools for running BM25 benchmark evals.
License-Expression: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: bm25s
Requires-Dist: beir; extra == "bm25s"
Requires-Dist: PyStemmer; extra == "bm25s"
Requires-Dist: ujson; extra == "bm25s"
Requires-Dist: bm25s[core]>=0.2.0rc8; extra == "bm25s"
Requires-Dist: numba; extra == "bm25s"
Provides-Extra: rank
Requires-Dist: beir; extra == "rank"
Requires-Dist: PyStemmer; extra == "rank"
Requires-Dist: ujson; extra == "rank"
Requires-Dist: numpy; extra == "rank"
Provides-Extra: rank-bm25
Requires-Dist: beir; extra == "rank-bm25"
Requires-Dist: PyStemmer; extra == "rank-bm25"
Requires-Dist: ujson; extra == "rank-bm25"
Requires-Dist: numpy; extra == "rank-bm25"
Provides-Extra: bm25-pt
Requires-Dist: beir; extra == "bm25-pt"
Requires-Dist: PyStemmer; extra == "bm25-pt"
Requires-Dist: ujson; extra == "bm25-pt"
Requires-Dist: bm25-pt; extra == "bm25-pt"
Requires-Dist: transformers; extra == "bm25-pt"
Provides-Extra: pyserini
Requires-Dist: beir; extra == "pyserini"
Requires-Dist: ujson; extra == "pyserini"
Requires-Dist: pyserini; extra == "pyserini"
Provides-Extra: elastic
Requires-Dist: beir; extra == "elastic"
Requires-Dist: ujson; extra == "elastic"
Requires-Dist: elasticsearch; extra == "elastic"
Provides-Extra: pisa
Requires-Dist: beir; extra == "pisa"
Requires-Dist: ujson; extra == "pisa"
Requires-Dist: pandas; extra == "pisa"
Requires-Dist: pyterrier_pisa>=0.3.0; extra == "pisa"
Provides-Extra: all
Requires-Dist: beir; extra == "all"
Requires-Dist: PyStemmer; extra == "all"
Requires-Dist: ujson; extra == "all"
Requires-Dist: bm25s[core]>=0.2.0rc8; extra == "all"
Requires-Dist: numba; extra == "all"
Requires-Dist: numpy; extra == "all"
Requires-Dist: bm25-pt; extra == "all"
Requires-Dist: transformers; extra == "all"
Requires-Dist: pyserini; extra == "all"
Requires-Dist: elasticsearch; extra == "all"
Requires-Dist: pandas; extra == "all"
Requires-Dist: pyterrier_pisa>=0.3.0; extra == "all"
Dynamic: license-file

# BM25 Benchmarks

## CLI

### Installation

From PyPI with `pip`:

```bash
pip install bm25-benchmarks
bm25-benchmark install bm25s
```

From GitHub with `pip`:

```bash
pip install "bm25-benchmarks @ git+https://github.com/xhluca/bm25-benchmarks.git"
bm25-benchmark install bm25s
```

With `uv` as a globally available tool:

```bash
uv tool install bm25-benchmarks
bm25-benchmark install bm25s --installer uv
```

With `uv` into the current virtual environment:

```bash
uv pip install "bm25-benchmarks @ git+https://github.com/xhluca/bm25-benchmarks.git"
bm25-benchmark install bm25s --installer uv
```

For local development:

```bash
pip install -e ".[bm25s]"
# or
uv pip install -e ".[bm25s]"
```

Install a different backend by replacing `bm25s` with `rank`, `bm25-pt`,
`pyserini`, `elastic`, `pisa`, or `all`.
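
For example, to set up the Elasticsearch backend you can use the CLI installer, or install the matching extra directly (each backend name above is also declared as an extra in this package's metadata):

```bash
bm25-benchmark install elastic
# or pull in the same dependencies via the extra
pip install "bm25-benchmarks[elastic]"
```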

For `rank-bm25`, use the CLI installer so the pinned Git dependency is installed
for you. PyPI rejects direct Git dependencies in package metadata, so the pin
lives in `requirements-rank-bm25.txt` and is applied by the CLI installer:

```bash
bm25-benchmark install rank
bm25-benchmark install all
bm25-benchmark install rank --installer uv
```

### Usage

The benchmark package exposes one CLI for running evals:

```bash
bm25-benchmark --help
bm25-benchmark models
bm25-benchmark datasets
```

Run an eval by choosing a backend and dataset:

```bash
bm25-benchmark eval bm25s -d fiqa
bm25-benchmark eval rank-bm25 -d fiqa --samples 1000
bm25-benchmark eval pyserini -d fiqa --threads 4
bm25-benchmark eval elastic -d fiqa --hostname localhost
bm25-benchmark eval pisa -d fiqa
bm25-benchmark eval bm25-pt -d fiqa --batch-size 32
```

Common eval options include:

```bash
bm25-benchmark eval bm25s -d fiqa -d scifact --result-dir results --save-dir datasets
bm25-benchmark eval bm25s -d fiqa,scifact --num-runs 3
bm25-benchmark eval bm25s -d fiqa --dry-run
```

Use `bm25-benchmark eval <backend> --help` to see backend-specific options.
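
For instance, to see the options specific to the Elasticsearch backend (such as `--hostname`, used above):

```bash
bm25-benchmark eval elastic --help
```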

The module form is also available:

```bash
python -m benchmark eval bm25s -d fiqa
```

## Benchmarking

The legacy module entry points are still available:

```bash
# For bm25_pt
python -m benchmark.on_bm25_pt -d "<dataset>"

# For rank-bm25
python -m benchmark.on_rank_bm25 -d "<dataset>"

# For Pyserini
python -m benchmark.on_pyserini -d "<dataset>"

# For Elasticsearch (start the server first; see below)
python -m benchmark.on_elastic -d "<dataset>"

# For PISA
python -m benchmark.on_pisa -d "<dataset>"

# For bm25s
python -m benchmark.on_bm25s -d "<dataset>"
```

where `<dataset>` is the name of the BEIR dataset to evaluate (see the list of available datasets below).
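
For example, to run the `bm25s` entry point on SciFact:

```bash
python -m benchmark.on_bm25s -d scifact
```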

### BM25S options

For `bm25s`, you can specify which scoring methods and retrieval backends to benchmark:

```bash
# Default: runs the jit scorer and numba backend
bm25-benchmark eval bm25s -d fiqa

# Specify scorers (uncompiled, legacy, jit)
bm25-benchmark eval bm25s -d fiqa --scorers legacy jit
bm25-benchmark eval bm25s -d fiqa --scorers jit
bm25-benchmark eval bm25s -d fiqa --scorers uncompiled legacy jit

# Specify backends (jax, numba, numpy)
bm25-benchmark eval bm25s -d fiqa --backends numba
bm25-benchmark eval bm25s -d fiqa --backends jax numba numpy

# Combine both
bm25-benchmark eval bm25s -d fiqa --scorers jit --backends numba
```

Scorer options:
- `uncompiled`: Default NumPy implementation (optimized with `np.add.at`)
- `legacy`: Legacy implementation (similar to uncompiled, kept for comparison)
- `jit`: Numba JIT-compiled version (fastest after warmup)

Backend options (a combined example follows the list):
- `jax`: JAX-based retrieval
- `numba`: Numba JIT-compiled retrieval (default)
- `numpy`: Pure NumPy retrieval
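
A sweep of the full scorer-by-backend matrix can be previewed with `--dry-run` (shown earlier) before committing to a long run:

```bash
# enumerate every scorer/backend combination without executing it
bm25-benchmark eval bm25s -d fiqa \
    --scorers uncompiled legacy jit \
    --backends jax numba numpy \
    --dry-run
```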


### Available datasets

The available datasets are public BEIR datasets: `trec-covid`, `nfcorpus`, `fiqa`, `arguana`, `webis-touche2020`, `quora`, `scidocs`, `scifact`, `cqadupstack`, `nq`, `msmarco`, `hotpotqa`, `dbpedia-entity`, `fever`, and `climate-fever`.
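
Datasets are downloaded on first use; `--save-dir` (shown earlier) controls where they are cached:

```bash
# cache the BEIR download under datasets/ and write metrics to results/
bm25-benchmark eval bm25s -d msmarco --save-dir datasets --result-dir results
```
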
### Sampling during benchmarking

Because `rank-bm25` is slow on large datasets, you can sample a subset of queries:
```bash
bm25-benchmark eval rank-bm25 -d "<dataset>" --samples <num_samples>
```

### Rank-bm25 variants

For `rank-bm25`, you can also select the scoring variant with `--method` (see the example after this list):
- `rank` (default)
- `bm25l`
- `bm25+`
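
For example, assuming `--method` accepts the variant names listed above:

```bash
# benchmark the BM25L variant instead of the default
bm25-benchmark eval rank-bm25 -d nfcorpus --method bm25l
```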

Results are saved in the `results/` directory (override with `--result-dir`).

### Elasticsearch server

If you want to use Elasticsearch, you need to start the server first.

First, download Elasticsearch from [here](https://www.elastic.co/downloads/past-releases/elasticsearch-8-14-0). You will get an archive, e.g. `elasticsearch-8.14.0-linux-x86_64.tar.gz`. Extract it so that the `elasticsearch-8.14.0` directory sits alongside the `bm25-benchmarks` directory.

```bash
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.14.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.14.0-linux-x86_64.tar.gz
# remove the tar file
rm elasticsearch-8.14.0-linux-x86_64.tar.gz
```

Then, start the server with the following command:

```bash
./elasticsearch-8.14.0/bin/elasticsearch -E xpack.security.enabled=false -E thread_pool.search.size=1 -E thread_pool.write.size=1
```
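
To sanity-check that the server is up before running the eval (assuming Elasticsearch's default port 9200, reachable over plain HTTP since security is disabled above):

```bash
# should return a JSON blob with the cluster name and version
curl http://localhost:9200

# then point the eval at the local server
bm25-benchmark eval elastic -d fiqa --hostname localhost
```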

## Results

The results are benchmarked using Kaggle notebooks to ensure reproducibility. Each benchmark runs on a single core of an Intel Xeon CPU @ 2.20GHz with 30GB of RAM.

The shorthands used are:
- `BM25PT` for `bm25_pt`
- `PSRN` for `pyserini`
- `R-BM25` for `rank-bm25`
- `BM25S` for `bm25s`, and `BM25S+J` for the Numba JIT version of `bm25s` (v0.2.0+)
- `ES` for `elasticsearch`
- `PISA` for the [Pisa Engine](https://github.com/pisa-engine/pisa) (via the [`pyterrier_pisa`](https://github.com/terrierteam/pyterrier_pisa) Python bindings)
- `OOM` for out-of-memory error
- `DNT` for did not terminate (i.e. went over 12 hours)

### Queries per second

| dataset          |   PISA  | BM25S+J |   BM25S |    ES |   PSRN |     PT | R-BM25 |
|:-----------------|--------:|--------:|--------:|------:|-------:|-------:|-------:|
| arguana          |  270.53 |  869.95 |  573.91 | 13.67 |  11.95 | 110.51 |   2    |
| climate-fever    |   35.95 |   38.49 |   13.09 |  4.02 |   8.06 | OOM    |   0.03 |
| cqadupstack      |  362.39 |  396.5  |  170.91 | 13.38 | DNT    | OOM    |   0.77 |
| dbpedia-entity   |  197.45 |   71.8  |   13.44 | 10.68 |  12.69 | OOM    |   0.11 |
| fever            |   81.42 |   53.84 |   20.19 |  7.45 |  10.52 | OOM    |   0.06 |
| fiqa             |  714.35 | 1237.39 |  717.78 | 16.96 |  12.51 |  20.52 |   4.46 |
| hotpotqa         |   54.98 |   47.16 |   20.88 |  7.11 |  10.41 | OOM    |   0.04 |
| msmarco          |  178.65 |   39.18 |   12.2  | 11.88 |  11.01 | OOM    |   0.07 |
| nfcorpus         | 5111.72 | 5696.21 | 1196.16 | 45.84 |  32.94 | 256.67 | 224.66 |
| nq               |  168.12 |  109.47 |   41.85 | 12.16 |  11.04 | OOM    |   0.1  |
| quora            |  735.20 |  479.71 |  272.04 | 21.8  |  15.58 |   6.49 |   1.18 |
| scidocs          |  818.97 | 1448.32 |  767.05 | 17.93 |  14.1  |  41.34 |   9.01 |
| scifact          | 1463.73 | 2787.84 | 1317.12 | 20.81 |  15.02 | 184.3  |  47.6  |
| trec-covid       |  282.94 |  483.84 |   85.64 |  7.34 |   8.53 |   3.73 |   1.48 |
| webis-touche2020 |  431.12 |  390.03 |   60.59 | 13.53 |  12.36 | OOM    |   1.1  |


Notes:
* For Rank-BM25, larger datasets are run with 1,000 sampled queries rather than the full query set to ensure each run finishes within 12h (the limit for Kaggle notebooks).
* For ES and BM25S, you can set the number of threads to use. However, you might not see an improvement; in the case of BM25S, throughput may even decrease due to how multi-threading is implemented. Click below to see the results.

<details>
<summary>Show PISA, BM25S & ES multi-threaded (4T) performance (Q/s)</summary>

| dataset          |    PISA |   BM25S |    ES |
|:-----------------|--------:|--------:|------:|
| arguana          |  590.93 |  211    | 33.37 |
| climate-fever    |   91.68 |   22.06 |  8.13 |
| cqadupstack      |  945.66 |  248.87 | 27.76 |
| dbpedia-entity   |  478.26 |   26.18 | 15.49 |
| fever            |  222.08 |   47.03 | 14.07 |
| fiqa             | 1382.32 |  449.82 | 36.33 |
| hotpotqa         |  134.60 |   45.02 | 10.35 |
| msmarco          |  393.16 |   21.64 | 18.19 |
| nfcorpus         | 6706.53 |  784.24 | 81.07 |
| nq               |  423.54 |   77.49 | 19.18 |
| quora            | 1892.98 |  308.58 | 43.02 |
| scidocs          | 1757.44 |  614.23 | 46.36 |
| scifact          | 2480.86 |  645.88 | 50.93 |
| trec-covid       |  676.40 |  100.88 | 13.5  |
| webis-touche2020 |  938.57 |  202.39 | 26.55 |


</details>

<details>
<summary>Show normalized table wrt Rank-BM25</summary>

| dataset          |    PISA |   BM25S |     ES |   PSRN |     PT |   Rank |
|:-----------------|--------:|--------:|-------:|-------:|-------:|-------:|
| arguana          |  135.27 |  286.96 |   6.84 |   5.98 |  55.26 |      1 |
| climate-fever    | 1198.33 |  436.33 | 134    | 268.67 | nan    |      1 |
| cqadupstack      |  470.64 |  221.96 |  17.38 | nan    | nan    |      1 |
| dbpedia-entity   | 1795.00 |  122.18 |  97.09 | 115.36 | nan    |      1 |
| fever            | 1357.00 |  336.5  | 124.17 | 175.33 | nan    |      1 |
| fiqa             |  160.17 |  160.94 |   3.8  |   2.8  |   4.6  |      1 |
| hotpotqa         | 1374.50 |  522    | 177.75 | 260.25 | nan    |      1 |
| msmarco          | 2552.14 |  174.29 | 169.71 | 157.29 | nan    |      1 |
| nfcorpus         |   22.75 |    5.32 |   0.2  |   0.15 |   1.14 |      1 |
| nq               | 1681.20 |  418.5  | 121.6  | 110.4  | nan    |      1 |
| quora            |  623.05 |  230.54 |  18.47 |  13.2  |   5.5  |      1 |
| scidocs          |   90.90 |   85.13 |   1.99 |   1.56 |   4.59 |      1 |
| scifact          |   30.75 |   27.67 |   0.44 |   0.32 |   3.87 |      1 |
| trec-covid       |  191.18 |   57.86 |   4.96 |   5.76 |   2.52 |      1 |
| webis-touche2020 |  391.93 |   55.08 |  12.3  |  11.24 | nan    |      1 |

</details>


#### Stats

|                  | # Docs   | # Queries   | # Tokens   |
|:-----------------|:-----------|:--------------|:-------------|
| msmarco          | 8,841,823  | 6,980         | 340,859,891  |
| hotpotqa         | 5,233,329  | 7,405         | 169,530,287  |
| trec-covid       | 171,332    | 50            | 20,231,412   |
| webis-touche2020 | 382,545    | 49            | 74,180,340   |
| arguana          | 8,674      | 1,406         | 947,470      |
| fiqa             | 57,638     | 648           | 5,189,035    |
| nfcorpus         | 3,633      | 323           | 614,081      |
| climate-fever    | 5,416,593  | 1,535         | 318,190,120  |
| nq               | 2,681,468  | 3,452         | 148,249,808  |
| scidocs          | 25,657     | 1,000         | 3,211,248    |
| quora            | 522,931    | 10,000        | 4,202,123    |
| dbpedia-entity   | 4,635,922  | 400           | 162,336,256  |
| cqadupstack      | 457,199    | 13,145        | 44,857,487   |
| fever            | 5,416,568  | 6,666         | 318,184,321  |
| scifact          | 5,183      | 300           | 812,074      |

#### Indexing throughput (docs/s)

The following results follow the same setup as the queries/s benchmarks described above (single-core).

| dataset          |     PISA |    BM25S |      ES |     PSRN |      PT |     Rank |
|:-----------------|---------:|---------:|--------:|---------:|--------:|---------:|
| arguana          |   3432.50|  4314.79 | 3591.63 |  1225.18 |  638.1  |  5021.3  |
| climate-fever    |   5462.73|  4364.43 | 3825.89 |  6880.42 |  nan    |  7085.51 |
| cqadupstack      |   3963.76|  4800.89 | 3725.43 |   nan    |  nan    |  5370.32 |
| dbpedia-entity   |   9019.62|  7576.28 | 6333.82 |  8501.7  |  nan    |  9110.36 |
| fever            |   4903.06|  4921.88 | 3879.63 |  7007.5  |  nan    |  5482.64 |
| fiqa             |   4426.92|  5959.25 | 4035.11 |  3735.38 |  421.51 |  6455.53 |
| hotpotqa         |   9883.85|  7420.39 | 5455.6  | 10342.5  |  nan    |  9407.9  |
| msmarco          |  10205.53|  7480.71 | 5391.29 |  9686.07 |  nan    | 12455.9  |
| nfcorpus         |   2381.11|  3169.4  | 1688.15 |   692.05 |  442.2  |  3579.47 |
| nq               |   7122.05|  6083.86 | 5742.13 |  6652.33 |  nan    |  6048.85 |
| quora            |  38512.02| 28002.4  | 8189.75 | 22818.5  | 6251.26 | 47609.2  |
| scidocs          |   3085.13|  4107.46 | 3008.45 |  2137.64 |  312.72 |  4232.15 |
| scifact          |   2449.91|  3253.63 | 2649.57 |   880.53 |  442.61 |  3792.84 |
| trec-covid       |   4642.59|  4600.14 | 2966.98 |  3768.1  |  406.37 |  4672.62 |
| webis-touche2020 |   2228.10|  2971.96 | 2484.87 |  2718.41 |  nan    |  3115.96 |

#### NDCG@10

We use abbreviations for the BEIR datasets.

<details>
<summary>Click to show dataset abbreviations</summary>

- `AG` for arguana
- `CD` for cqadupstack
- `CF` for climate-fever
- `DB` for dbpedia-entity
- `FQ` for fiqa
- `FV` for fever
- `HP` for hotpotqa
- `MS` for msmarco
- `NF` for nfcorpus
- `NQ` for nq
- `QR` for quora
- `SD` for scidocs
- `SF` for scifact
- `TC` for trec-covid
- `WT` for webis-touche2020

</details>


|   k1 |   b | method    |   Avg. |   AG | CD   | CF   | DB   |   FQ | FV   | HP   | MS   |   NF | NQ   |   QR |   SD |   SF |   TC | WT   |
|-----:|----:|:----------|-------:|-----:|:-----|:-----|:-----|-----:|:-----|:-----|:-----|-----:|:-----|-----:|-----:|-----:|-----:|:-----|
|  0.9 | 0.4 | Lucene    |   41.1 | 40.8 | 28.2 | 16.2 | 31.9 | 23.8 | 63.8 | 62.9 | 22.8 | 31.8 | 30.5 | 78.7 | 15.0 | 67.6 | 58.9 | 44.2 |
|  1.2 | 0.75 | ATIRE     |   39.9 | 48.7 | 30.1 | 13.7 | 30.3 | 25.3 | 50.3 | 58.5 | 22.6 | 31.8 | 29.1 | 80.5 | 15.6 | 68.1 | 61.0 | 33.2 |
|  1.2 | 0.75 | BM25+     |   39.9 | 48.7 | 30.1 | 13.7 | 30.3 | 25.3 | 50.3 | 58.5 | 22.6 | 31.8 | 29.1 | 80.5 | 15.6 | 68.1 | 61.0 | 33.2 |
|  1.2 | 0.75 | BM25L     |   39.5 | 49.6 | 29.8 | 13.5 | 29.4 | 25.0 | 46.6 | 55.9 | 21.4 | 32.2 | 28.1 | 80.3 | 15.8 | 68.7 | 62.9 | 33.0 |
|  1.2 | 0.75 | Lucene    |   39.9 | 48.7 | 30.1 | 13.7 | 30.3 | 25.3 | 50.3 | 58.5 | 22.6 | 31.8 | 29.1 | 80.5 | 15.6 | 68.0 | 61.0 | 33.2 |
|  1.2 | 0.75 | Robertson |   39.9 | 49.2 | 29.9 | 13.7 | 30.3 | 25.4 | 50.3 | 58.5 | 22.6 | 31.9 | 29.2 | 80.4 | 15.5 | 68.3 | 59.0 | 33.8 |
|  1.5 | 0.75 | ES        |   42.0 | 47.7 | 29.8 | 17.8 | 31.1 | 25.3 | 62.0 | 58.6 | 22.1 | 34.4 | 31.6 | 80.6 | 16.3 | 69.0 | 68.0 | 35.4 |
|  1.5 | 0.75 | Lucene    |   39.7 | 49.3 | 29.9 | 13.6 | 29.9 | 25.1 | 48.1 | 56.9 | 21.9 | 32.1 | 28.5 | 80.4 | 15.8 | 68.7 | 62.3 | 33.1 |
|  1.5 | 0.75 | PSRN      |   40.0 | 48.4 | 29.8 | 14.2 | 30.0 | 25.3 | 50.0 | 57.6 | 22.1 | 32.6 | 28.6 | 80.6 | 15.6 | 68.8 | 63.4 | 33.5 |
|  1.5 | 0.75 | PT        |   45.0 | 44.9 | --   | --   | --   | 22.5 | --   | --   | --   | 31.9 | --   | 75.1 | 14.7 | 67.8 | 58.0 | --   |
|  1.5 | 0.75 | Rank      |   39.6 | 49.5 | 29.6 | 13.6 | 29.9 | 25.3 | 49.3 | 58.1 | 21.1 | 32.1 | 28.5 | 80.3 | 15.8 | 68.5 | 60.1 | 32.9 |
|  1.2 | 0.75 | PISA      |   38.8 | 41.1 | 27.8 | 13.9 | 30.5 | 24.5 | 49.2 | 58.2 | 22.8 | 34.3 | 28.2 | 72.0 | 15.7 | 68.9 | 64.2 | 30.9 |



#### Recall@1000

|   k1 |   b | method    |   Avg. |   AG | CD   | CF   | DB   |   FQ | FV   | HP   | MS   |   NF | NQ   |   QR |   SD |   SF |   TC | WT   |
|-----:|----:|:----------|-------:|-----:|:-----|:-----|:-----|-----:|:-----|:-----|:-----|-----:|:-----|-----:|-----:|-----:|-----:|:-----|
|  0.9 | 0.4 | Lucene    |   77.3 | 98.8 | 71.1 | 63.3 | 67.5 | 74.3 | 95.7 | 88.0 | 85.3 | 47.7 | 89.6 | 99.5 | 56.5 | 97.0 | 39.2 | 86.0 |
|  1.2 | 0.75 | ATIRE     |   77.4 | 99.3 | 73.0 | 59.0 | 67.0 | 76.5 | 94.2 | 86.8 | 85.7 | 47.8 | 89.8 | 99.5 | 57.3 | 97.0 | 40.3 | 87.2 |
|  1.2 | 0.75 | BM25+     |   77.4 | 99.3 | 73.0 | 59.0 | 67.0 | 76.5 | 94.2 | 86.8 | 85.7 | 47.8 | 89.8 | 99.5 | 57.3 | 97.0 | 40.3 | 87.2 |
|  1.2 | 0.75 | BM25L     |   77.2 | 99.4 | 73.4 | 57.3 | 66.1 | 77.3 | 93.7 | 85.7 | 85.0 | 47.7 | 89.3 | 99.5 | 57.7 | 97.0 | 40.8 | 87.5 |
|  1.2 | 0.75 | Lucene    |   77.4 | 99.3 | 73.0 | 59.0 | 67.0 | 76.5 | 94.2 | 86.8 | 85.6 | 47.8 | 89.8 | 99.5 | 57.3 | 97.0 | 40.3 | 87.2 |
|  1.2 | 0.75 | Robertson |   77.4 | 99.3 | 73.2 | 59.1 | 66.7 | 76.8 | 94.2 | 86.8 | 85.9 | 47.5 | 89.8 | 99.5 | 57.3 | 96.7 | 40.2 | 87.4 |
|  1.5 | 0.75 | ES        |   76.9 | 99.2 | 74.2 | 58.8 | 63.6 | 76.7 | 95.9 | 85.2 | 85.1 | 39.0 | 90.8 | 99.6 | 57.9 | 98.0 | 41.3 | 88.0 |
|  1.5 | 0.75 | Lucene    |   77.2 | 99.3 | 73.3 | 57.8 | 66.3 | 77.2 | 93.8 | 86.1 | 85.2 | 47.7 | 89.5 | 99.6 | 57.5 | 97.0 | 40.6 | 87.4 |
|  1.5 | 0.75 | PSRN      |   76.7 | 99.2 | 74.2 | 58.7 | 66.2 | 76.7 | 94.2 | 86.4 | 85.1 | 37.1 | 89.4 | 99.6 | 57.4 | 97.7 | 41.1 | 87.2 |
|  1.5 | 0.75 | PT        |   73.0 | 98.3 | --   | --   | --   | 72.5 | --   | --   | --   | 51.0 | --   | 98.9 | 56.0 | 97.8 | 36.3 | --   |
|  1.5 | 0.75 | Rank      |   77.1 | 99.4 | 73.4 | 57.5 | 66.4 | 77.4 | 93.6 | 87.7 | 82.6 | 47.6 | 89.5 | 99.5 | 57.4 | 96.7 | 40.5 | 87.5 |
|  1.2 | 0.75 | PISA      |   77.1 | 98.7 | 72.2 | 60.2 | 67.7 | 76.5 | 93.7 | 86.8 | 86.9 | 38.4 | 89.1 | 98.9 | 56.9 | 97.0 | 45.9 | 87.4 |


#### Links

* [BM25+](https://www.kaggle.com/code/xhlulu/comparing-bm25s-bm25plus)
* [BM25L](https://www.kaggle.com/code/xhlulu/comparing-bm25s-bm25l)
* [ATIRE](https://www.kaggle.com/code/xhlulu/comparing-bm25s-atire)
* [Robertson](https://www.kaggle.com/code/xhlulu/comparing-bm25s-robertson)
* [Lucene (k1=1.2, b=0.75)](https://www.kaggle.com/code/xhlulu/comparing-bm25s-lucene)
* [Lucene (k1=0.9, b=0.4)](https://www.kaggle.com/code/xhlulu/comparing-bm25s-lucene-k1-0-9-b-0-4)
* Lucene (k1=1.5, b=0.75): [DB](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-dbpedia-entity), [MS](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-msmarco), [FV](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-fever), [CF](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-climate-fever), [NQ](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-nq), [HP](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-hotpotqa), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-sub-1m)
* [ES](https://www.kaggle.com/code/xhlulu/run-elasticsearch-k1-1-5-b-0-75): [FV, CF](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-fever), [NQ](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-nq), [MS](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-msmarco), [HP](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-hotpotqa), [DB](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-dbpedia-entity), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-elasticsearch-sub-1m)
* PT: [MS](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-msmarco), [CF](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-climate-fever), [FV](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-fever), [DB](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-dbpedia-entity), [HP](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-hotpotqa), [NQ](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-nq), [WT](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-webis-touche2020), [CD](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-cqadupstack), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-bm25-pt-sub-1m)
* Rank: [DB](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-dbpedia-entity), [HP](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-hotpotqa), [CF](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-climate-fever), [FV](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-fever), [MS](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-msmarco), [NQ](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-nq), [CD](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-cqadupstack), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-rank-bm25-sub-1m)
* PSRN: [CD](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-cqadupstack), [FV](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-fever), [HP](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-hotpotqa), [MS](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-msmarco), [DB](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-dbpedia-entity), [NQ](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-nq), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-pyserini-sub-1m)
* PISA: [NQ](https://www.kaggle.com/smac2048/pisa-nq), [DB](https://www.kaggle.com/code/smac2048/pisa-dbpedia-entity), [CF](https://www.kaggle.com/code/smac2048/pisa-climate-fever), [HP](https://www.kaggle.com/code/smac2048/pisa-hotpotqa), [FV](https://www.kaggle.com/code/smac2048/pisa-fever), [MS](https://www.kaggle.com/code/smac2048/pisa-msmarco), [CD](https://www.kaggle.com/code/smac2048/pisa-cqadupstack), [Remaining](https://www.kaggle.com/code/smac2048/pisa-rest)
* BM25S+J: [Sub-1M](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-numba-sub-1m), [Remaining](https://www.kaggle.com/code/xhlulu/benchmark-bm25s-numba-rest)
