Metadata-Version: 2.4
Name: intermine314
Version: 0.1.7
Summary: Modernized InterMine WebService client (Python 3.14+)
Author: Kris Kari, Dr. Maria Ermakova, Plant Energy and Biotechnology Lab, Monash University
Maintainer-email: Kris Kari <toffe.kari@gmail.com>
License: MIT
Project-URL: Repository, https://github.com/karikris/intermine314
Project-URL: Issues, https://github.com/karikris/intermine314/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.14
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.14
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.32.4
Requires-Dist: polars==1.6.0
Requires-Dist: duckdb==1.0.0
Provides-Extra: speed
Requires-Dist: orjson>=3.10.0; extra == "speed"
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Provides-Extra: benchmark
Requires-Dist: intermine; extra == "benchmark"
Requires-Dist: pandas>=2.2.0; extra == "benchmark"
Provides-Extra: proxy
Requires-Dist: PySocks>=1.7.1; extra == "proxy"
Dynamic: license-file

# intermine314

[![CI](https://github.com/karikris/intermine314/actions/workflows/im-build.yml/badge.svg?branch=master)](https://github.com/karikris/intermine314/actions/workflows/im-build.yml)
[![PyPI version](https://img.shields.io/pypi/v/intermine314.svg)](https://pypi.org/project/intermine314/)
[![Python versions supported](https://img.shields.io/pypi/pyversions/intermine314.svg)](https://pypi.org/project/intermine314/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/karikris/intermine314/blob/master/LICENSE)

Modern InterMine client for Python 3.14+ with:

- query execution (`Service` + `Query`)
- parallel export with bounded memory (`ParallelOptions`)
- ELT workflows to Parquet, DuckDB, and Polars (`fetch_from_mine`)
- Tor-safe transport defaults (`socks5h://` policy in strict Tor mode)

Repository: https://github.com/karikris/intermine314

## Install

```bash
pip install intermine314
```

Optional extras:

```bash
pip install "intermine314[speed]"   # orjson
pip install "intermine314[proxy]"   # PySocks
```

## Quick Start

```python
from intermine314.service import Service

service = Service("https://maizemine.rnet.missouri.edu/maizemine/service")
query = service.select("Gene.primaryIdentifier", "Gene.symbol")

for row in query.rows(size=5):
    print(row)
```

Parallel export is configured only through `ParallelOptions` (the example reuses `query` from the snippet above):

```python
from intermine314.query.builder import ParallelOptions

query.to_parquet(
    "/tmp/genes_parts",
    batch_size=5000,
    parallel_options=ParallelOptions(
        max_workers=8,
        profile="large_query",
        ordered="unordered",
        inflight_limit=8,
        max_inflight_bytes_estimate=64 * 1024 * 1024,
    ),
)
```
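
To illustrate what an in-flight limit bounds, here is a generic standard-library sketch. It is not intermine314's implementation: `bounded_unordered_map` and `fetch_page` are hypothetical names, and the sketch only assumes that `inflight_limit` caps the number of page requests pending at once while results are yielded in completion order.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def bounded_unordered_map(fn, items, max_workers=8, inflight_limit=8):
    """Yield fn(item) results as they complete, keeping at most
    `inflight_limit` tasks submitted but not yet consumed."""
    iterator = iter(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Prime the window with the first `inflight_limit` tasks.
        pending = {pool.submit(fn, item) for item in islice(iterator, inflight_limit)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                yield future.result()
                # Refill one slot per finished task, if any input remains.
                for item in islice(iterator, 1):
                    pending.add(pool.submit(fn, item))


def fetch_page(offset):
    # Hypothetical stand-in for a network request returning one page of rows.
    return [offset, offset + 1]


pages = bounded_unordered_map(fetch_page, range(0, 10, 2), inflight_limit=2)
# Completion order is not guaranteed, so sort before comparing.
rows = sorted(row for page in pages for row in page)
```

Because the input iterator is only advanced when a slot frees up, memory use stays proportional to the window size rather than to the total number of pages.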

## API Migration Notes

Compatibility aliases were removed to keep the runtime API minimal and explicit:

- `service.query(...)` -> use `service.select(...)`
- `Service.get_mine_info(...)` -> use `Registry(...).info(...)`
- `Service.get_all_mines(...)` -> use `Registry(...).all_mines(...)`
- Query aliases removed: `filter`, `add_column*`, `add_views`, `order_by`, `all`, `size`, `summarize`, `c`
  Use canonical `Query` methods (`where`, `add_view`, `add_sort_order`, `count`, `column`).

Minimal high-level ELT workflow:

```python
from intermine314 import fetch_from_mine

result = fetch_from_mine(
    mine_url="https://maizemine.rnet.missouri.edu/maizemine/service",
    root_class="Gene",
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    parquet_path="/tmp/genes.parquet",
    page_size=2_000,
    max_workers=8,
    inflight_limit=8,
    max_inflight_bytes_estimate=64 * 1024 * 1024,
)

# managed=True: the result exposes a DuckDB connection and table name (used below).
managed_result = fetch_from_mine(
    mine_url="https://maizemine.rnet.missouri.edu/maizemine/service",
    root_class="Gene",
    views=["Gene.primaryIdentifier", "Gene.symbol"],
    parquet_path="/tmp/genes.parquet",
    max_workers=8,
    managed=True,
)
with managed_result["duckdb_connection"] as con:
    count = con.execute(
        f'SELECT COUNT(*) FROM "{managed_result["duckdb_table"]}"'
    ).fetchone()[0]
    print(count)
```
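
The query above interpolates a table name into a SQL string, which is only safe if the name is a trusted identifier. The sketch below shows one conservative way to check a name before quoting it. It is an assumption for illustration, not the library's validation rule: `quote_identifier` is a hypothetical helper, and the allowed character set is deliberately narrower than what DuckDB itself accepts.

```python
import re

# Assumed policy: ASCII letters, digits, and underscores; no leading digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Return `name` wrapped in double quotes, or raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f'"{name}"'


sql = f"SELECT COUNT(*) FROM {quote_identifier('genes')}"
```

Rejecting anything outside the allow-list means a name containing a quote or semicolon raises before it can reach the database.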

## Development

```bash
make lint
make test
make docs
```

Repository-only support directories:
- `docs/`, `samples/`, and `scripts/` are for development and examples.
- They are intentionally excluded from published package artifacts.

### Test Modes

Default `pytest` runs the offline invariant suite (fast and deterministic):
- Tor strict DNS-safe proxy enforcement (`socks5h://` requirement).
- Streaming response closure on early iterator termination.
- Session ownership lifecycle (`close()` closes only owned resources).
- Executor lifecycle closure under early parallel termination.
- Runtime defaults validation for parallel/query behavior.
- Storage policy single-source checks (Parquet compression + DuckDB identifier validation).
- DuckDB managed connection lifecycle closure.
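
As background for the first invariant: with a `socks5://` proxy URL the client resolves hostnames locally, whereas `socks5h://` delegates DNS resolution to the proxy, which is what keeps lookups inside Tor. A minimal sketch of such a scheme check follows. `is_dns_safe_socks_proxy` is a hypothetical name, not the function the test suite exercises, and the address is only an example (9050 is Tor's default SOCKS port).

```python
from urllib.parse import urlsplit


def is_dns_safe_socks_proxy(proxy_url: str) -> bool:
    """True only for socks5h:// URLs, where the proxy resolves hostnames."""
    return urlsplit(proxy_url).scheme.lower() == "socks5h"


checks = {
    url: is_dns_safe_socks_proxy(url)
    for url in ("socks5h://127.0.0.1:9050", "socks5://127.0.0.1:9050")
}
```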

Live network smoke tests are opt-in: set `INTERMINE314_RUN_LIVE_TESTS=1` and select them by filename (`live_*.py`):

```bash
INTERMINE314_RUN_LIVE_TESTS=1 pytest -q tests/live_*.py
```

Benchmark commands and benchmark-specific docs live in
[`benchmarks/README.md`](benchmarks/README.md).
Benchmarks are runner-script based (`python -m benchmarks...`); benchmark pytest globs are not part of CI or the default test workflow.

## License

MIT (see `LICENSE`).
