Metadata-Version: 2.4
Name: quackir
Version: 0.0.1
Summary: A Python toolkit for retrieval with relational database management systems.
Author-email: Lily Ge <l2ge@uwaterloo.ca>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/castorini/quackir
Project-URL: Issues, https://github.com/castorini/quackir/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: tqdm
Requires-Dist: pyserini>=1.0.0
Requires-Dist: duckdb>=1.1.1
Requires-Dist: psycopg2-binary>=2.9.10
Requires-Dist: pandas
Requires-Dist: python-dotenv
Dynamic: license-file

# QuackIR

[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0)

QuackIR is a toolkit for reproducible information retrieval research with relational database management systems. 
Sparse retrieval is available with DuckDB, SQLite, and PostgreSQL. 
Dense and hybrid retrieval are available with DuckDB and PostgreSQL.
Analysis with the porter tokenizer is provided via wrapping Pyserini's Lucene analyzer.

## Installation

### Clone Repository

```bash
git clone https://github.com/castorini/quackir.git --recurse-submodules
```

### Install Dependencies

```bash
conda create -n quackir python=3.10
conda activate quackir
conda install -c conda-forge postgresql pgvector openjdk=21 maven -y
pip install -r requirements.txt
```

### Initialize PostgreSQL 

```bash
initdb -D mydb
pg_ctl -D mydb -l logfile start &
createdb quackir
psql quackir
create user postgres superuser;
create extension vector;
\q
```

## Quick Start

To create a sparse index with DuckDB:

```python
from quackir.index import DuckDBIndexer
from quackir import IndexType

table_name = "corpus"
index_type = IndexType.SPARSE

indexer = DuckDBIndexer()
indexer.init_table(table_name, index_type)
indexer.load_table(table_name, corpus_file)
indexer.fts_index(table_name)

indexer.close()
```

To perform sparse retrieval: 

```python
from quackir.search import DuckDBSearcher
from quackir import SearchType

table_name = "corpus"
query = "what is a lobster roll"
search_type = SearchType.SPARSE

searcher = DuckDBSearcher()
results = searcher.search(
    search_type, query_string=query, table_names=[table_name]
)
print(results)

searcher.close()
```

For using commands, see the [documentation](./docs/).

## Reproduce

For step-by-step reproduction of BEIR experiments, see these [docs](./docs/beir/).

To reproduce all BEIR experiments, run the following command and find the results in [logs](./logs/):

```bash
bash ./scripts/beir/run.sh
```
