Metadata-Version: 2.4
Name: labretriever
Version: 1.1.0
Summary: A collection of classes and functions to facilitate interaction with django.tfbindingandmodeling.com
License: GPL-3.0
License-File: LICENSE
Author: chasem
Author-email: chasem@wustl.edu
Requires-Python: >=3.11,<4.0
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: duckdb (>=1.3.2,<2.0.0)
Requires-Dist: huggingface-hub (>=0.34.4,<0.35.0)
Requires-Dist: mcp[cli] (>=1.27.1,<2.0.0)
Requires-Dist: pandas (>=2.3.1,<3.0.0)
Requires-Dist: pydantic (>=2.11.9,<3.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Description-Content-Type: text/markdown

# labretriever

A Python package for querying and managing genomic and transcriptomic datasets
hosted on [HuggingFace Hub](https://huggingface.co). It provides a unified SQL
interface (via DuckDB) across heterogeneous datasets, with local caching and
structured metadata exploration.

See the [documentation](https://cmatKhan.github.io/labretriever) for full usage
guides and API reference. The [BrentLab yeast resources
collection](https://huggingface.co/collections/BrentLab/yeastresources) is an
example of datasets designed to work with this package.

## Installation

Install the latest release from PyPI:

```bash
pip install labretriever
```

To get the most recent changes ahead of a PyPI release, install directly from
the main branch on GitHub:

```bash
pip install git+https://github.com/cmatKhan/labretriever.git@main
```

To access private datasets, set your HuggingFace token in the environment:

```bash
export HF_TOKEN=your_token_here
```
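
Before running queries against a private dataset, it can help to confirm that the variable is visible to the Python process. The following is a minimal sketch using only the standard library; `hf_token_status` is a hypothetical helper for illustration, not part of labretriever, and it only checks that the variable is present, not that the token is valid.

```python
import os


def hf_token_status(env=os.environ):
    """Report whether HF_TOKEN is set, without printing the token itself.

    Hypothetical helper for illustration; not part of labretriever.
    """
    token = env.get("HF_TOKEN", "")
    if not token.strip():
        return "HF_TOKEN is not set"
    return f"HF_TOKEN is set ({len(token)} characters)"


print(hf_token_status())
```

If this reports that the variable is not set, the `export` was most likely run in a different shell session from the one that launched Python.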

## Usage

```python
from labretriever import VirtualDB

vdb = VirtualDB("config.yaml")

# Discover available views
vdb.tables()
vdb.describe("harbison")

# Query with SQL
df = vdb.query("SELECT * FROM harbison_meta WHERE carbon_source = $cs", cs="glucose")
```

`VirtualDB` downloads datasets from HuggingFace and caches them locally, builds
DuckDB views over the Parquet files, and exposes both metadata views and
full-data views for SQL querying. See the documentation for how to write a
`config.yaml` and structure your HuggingFace dataset cards.

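## Development

The project uses [Poetry](https://python-poetry.org) for dependency management.
Clone the repository, install the dependencies, set up the pre-commit hooks, and
run the test suite: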

```bash
git clone https://github.com/cmatKhan/labretriever
cd labretriever
poetry install
poetry run pre-commit install
poetry run pytest
```

