Metadata-Version: 2.4
Name: epistabase
Version: 0.1.1
Summary: EpistaBase Python SDK and CLI for notebook and platform data access
Project-URL: Homepage, https://epistabase.com
Project-URL: Documentation, https://epistabase.com/docs/sdk
Project-URL: Changelog, https://epistabase.com/docs/sdk/changelog
Author: EpistaBase
License: Proprietary
Keywords: bioinformatics,biolake,epistabase,lakehouse,sdk
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: httpx>=0.26.0
Provides-Extra: cli
Requires-Dist: typer>=0.12.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: numpy>=1.26.0; extra == 'dev'
Requires-Dist: pandas>=2.0.0; extra == 'dev'
Requires-Dist: pillow>=10.0.0; extra == 'dev'
Requires-Dist: pyarrow>=15.0.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff==0.5.7; extra == 'dev'
Requires-Dist: typer>=0.12.0; extra == 'dev'
Provides-Extra: fs
Requires-Dist: fsspec>=2024.2.0; extra == 'fs'
Provides-Extra: image
Requires-Dist: numpy>=1.26.0; extra == 'image'
Requires-Dist: pillow>=10.0.0; extra == 'image'
Provides-Extra: table
Requires-Dist: pandas>=2.0.0; extra == 'table'
Requires-Dist: pyarrow>=15.0.0; extra == 'table'
Description-Content-Type: text/markdown

# EpistaBase Python SDK

`epistabase` is the governed Python client for the EpistaBase platform. It gives
notebooks, scripts, and the `epistabase` CLI typed, authenticated access to the
same projects, experiments, catalog, queries, sequences, and imaging you use in
the browser — without ever handling raw storage credentials.

It is a thin client: it wraps the EpistaBase API and nothing more. Data engines,
parsers, query planners, and statistics stay on the server or in your own
environment.

> **Naming:** you install the distribution `epistabase`, but the import package
> is currently `biolake` (`import biolake as bl`) and the environment variables
> are `BIOLAKE_*`. A full internal rename to `epistabase` is in progress.

## Install

```bash
pip install epistabase
```

Optional extras pull in heavier dependencies only when you need them:

```bash
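pip install "epistabase[image]"   # numpy/pillow for tiled image reads
pip install "epistabase[table]"   # pandas/pyarrow for tabular data
pip install "epistabase[fs]"      # fsspec filesystem integration
pip install "epistabase[cli]"     # the `epistabase` command-line interface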
```

> The SDK requires Python 3.12 or later.

## Authenticate

The SDK authenticates with a **scoped bearer token** and an **active workspace**.
There are two ways to supply them.

### Sign in (CLI)

```bash
# Mint a Personal Access Token in the app (Settings → Developer), then:
epistabase auth login --token <PAT> --api-url https://api.example --workspace <id>
epistabase auth status             # shows the resolved API URL, workspace, and token source
```

Credentials are written to `~/.epistabase/credentials.json` and are picked up by
the SDK automatically. Access is governed entirely by your EpistaBase account —
installing the SDK grants nothing on its own. (Browser/device login is on the
roadmap; for now, sign in with a token.)

### Token / environment

For headless use (CI, servers, notebook kernels), provide a Personal Access Token
and workspace through the environment:

```bash
export BIOLAKE_API_URL="https://api.biolake.example"
export BIOLAKE_TOKEN="your-personal-access-token"   # or BIOLAKE_PAT
export BIOLAKE_WORKSPACE_ID="workspace-id"
export BIOLAKE_EXPERIMENT_ID="experiment-id"        # optional context
export BIOLAKE_NOTEBOOK_ID="notebook-id"            # optional context
```

The SDK never reads AWS, S3, or MinIO credentials. All data access flows through
governed BioLake API/data-plane services.

## Command-line data access

The same operations are available from the `epistabase` CLI, sharing the SDK's
credentials and governance. Once you have authenticated (above), the verbs run
against your active workspace:

```bash
epistabase ls --experiment EXP-12        # discover catalog assets   (--json to script)
epistabase get EXP-12/cells.fcs          # show an asset's metadata + lineage
epistabase pull EXP-12/cells.fcs         # download the data         (defaults to ./cells.fcs)
epistabase query "SELECT * FROM counts"  # run governed SQL          (--out result.parquet to save)
```

Every command takes an opaque asset id **or** a readable `[experiment/]name` path
(ambiguous names report the candidates). `get` shows metadata; `pull` downloads
the bytes, or materializes a table to `.csv` / `.parquet`.

## Quickstart

```python
import biolake as bl

# Query governed lakehouse tables
rows = bl.query("select * from current_table limit 10")

# Discover catalog assets
images = bl.assets(experiment="EXP-2026-0001", kind="IMAGE")
asset = bl.get(images[0].id)

# Promote a notebook output back into the experiment
# (`fig` is a matplotlib or plotly figure you have already created)
result = bl.publish_figure(fig, name="Dose response", format="svg")
print(result.download_url)
```

The implicit session reads the environment above on first use. Pass an explicit
`Session` when you need more than one context in the same process.

## Publishing figures and tables

Notebook outputs can be promoted back into the current experiment. The SDK renders
the local object, sends bytes to the API, and the API stores the artifact in
governed storage:

```python
result = bl.publish_figure(fig, name="Dose response", format="svg")
result.catalog_asset_id   # id of the published catalog asset
result.download_url       # URL for downloading the stored artifact

table = bl.publish_table(df, name="Summary table")  # CSV by default
```

`publish_figure()` accepts raw bytes, matplotlib figures with `savefig()`, and
plotly figures with `to_image()`. `publish_table()` accepts raw bytes, CSV text,
or dataframe-like objects with `to_csv()` / `to_parquet()`. Published artifacts use
`BIOLAKE_EXPERIMENT_ID` unless `experiment_id=` is given, and record the source
notebook id plus the query log from prior `bl.query()` calls.
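
To make the accepted figure inputs concrete, here is a rough sketch of duck-typed rendering to bytes. It is an illustration of the idea, not the SDK's implementation: `render_figure_bytes` is a hypothetical helper, and the only library calls it relies on are matplotlib's `Figure.savefig(file, format=...)` and plotly's `Figure.to_image(format=...)`.

```python
import io
from typing import Any


def render_figure_bytes(fig: Any, fmt: str = "svg") -> bytes:
    """Render a figure-like object to bytes by duck typing (illustrative only)."""
    if isinstance(fig, (bytes, bytearray)):
        return bytes(fig)              # raw bytes pass through unchanged
    if hasattr(fig, "savefig"):        # matplotlib-style figure
        buf = io.BytesIO()
        fig.savefig(buf, format=fmt)
        return buf.getvalue()
    if hasattr(fig, "to_image"):       # plotly-style figure
        return fig.to_image(format=fmt)
    raise TypeError(f"cannot render object of type {type(fig).__name__}")
```

Rendering locally like this lets you inspect or cache the exact bytes before passing them to `publish_figure()`, which accepts raw bytes directly.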

## Catalog discovery

`bl.assets(...)` lists catalog assets visible to your token. Filters are
server-side and governed by the same read authorization as the web app:

```python
assets = bl.assets(experiment="EXP-2026-0001", kind="FLOW", tags=["sort-1"])
asset = bl.get(assets[0].id)

asset.kind              # "FLOW", "IMAGE", "TABLE", "VOLUME", ...
asset.format            # "fcs", "tiff", ...
asset.size_bytes
asset.experiment_number
asset.lineage
```

`AssetRef` and `Asset` are descriptors; they never expose standing storage
credentials. Type-specific accessors such as `biolake.image` layer on top.
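
Because the descriptors are plain metadata, they are easy to summarize before you pull anything. The sketch below totals `size_bytes` per `kind`; `size_by_kind` is a hypothetical helper, and it assumes each object exposes both attributes (shown above for `bl.get()` results; whether list results carry `size_bytes` is not confirmed here).

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Any


def size_by_kind(assets: Iterable[Any]) -> dict[str, int]:
    """Sum size_bytes per asset kind (illustrative helper, not an SDK call)."""
    totals: dict[str, int] = defaultdict(int)
    for asset in assets:
        # Treat a missing or None size as zero rather than failing.
        totals[asset.kind] += getattr(asset, "size_bytes", None) or 0
    return dict(totals)
```

A call such as `size_by_kind(bl.get(ref.id) for ref in bl.assets(experiment="EXP-2026-0001"))` would then give a rough footprint per asset kind, at the cost of one `get` per asset.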

## Image reads

```bash
pip install "epistabase[image]"
```

`biolake.image` resolves image assets through the catalog, mints short-lived
whole-slide image (WSI) tile sessions, and reads only bounded regions or
thumbnails through the tile service:

```python
img = bl.assets(experiment="EXP-2026-0001", kind="IMAGE")[0]

info = bl.image.info(img)
region = bl.image.read_region(img, 0, 0, 2048, 2048, level=2, channels=["DAPI"])
overview = bl.image.thumbnail(img, max_px=1024)
rois = bl.image.annotations(img)            # read-only GeoJSON ROIs
```
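
Since reads are bounded, a large area is usually processed as a grid of smaller windows. The following is a minimal sketch of that tiling; `tile_windows` is a hypothetical helper, and it assumes the four positional numbers passed to `read_region` above are `x, y, width, height`, which this README does not spell out.

```python
from collections.abc import Iterator


def tile_windows(
    width: int, height: int, tile: int = 2048
) -> Iterator[tuple[int, int, int, int]]:
    """Yield (x, y, w, h) windows covering a width x height area, row by row."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Edge windows are clipped so none extends past the area.
            yield x, y, min(tile, width - x), min(tile, height - y)
```

Under that assumption, each window could be read with `bl.image.read_region(img, x, y, w, h, level=2)` and processed one at a time, keeping memory use proportional to a single tile.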

## Whole-blob reads

`bl.open(asset_id)` streams whole raw blobs (FCS, vendor exports, gel TIFFs,
CSV/XLSX) via a short-lived presigned GET URL:

```python
with bl.open("asset-id") as f:
    header = f.read(64)

with bl.open("asset-id", byte_range=(0, 1023)) as f:
    first_kb = f.read()

with bl.open("asset-id", download=True) as path:   # for libraries needing a path
    parse_vendor_file(path)                        # placeholder for your own parser
```

`open()` is whole-blob only; tiled/pyramidal images go through `biolake.image`.
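
The `byte_range=(0, 1023)` example reads as an inclusive range, which matches how HTTP `Range` headers count bytes. As a sketch under that assumption (the SDK's actual request handling is not documented here, and both helper names are hypothetical), the mapping looks like this:

```python
def range_header(byte_range: tuple[int, int]) -> dict[str, str]:
    """Build an HTTP Range header for an inclusive (start, end) byte range."""
    start, end = byte_range
    if start < 0 or end < start:
        raise ValueError(f"invalid byte range: {byte_range!r}")
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={start}-{end}"}


def range_length(byte_range: tuple[int, int]) -> int:
    """Number of bytes covered by an inclusive (start, end) range."""
    start, end = byte_range
    return end - start + 1
```

Under this reading, `(0, 1023)` covers exactly 1024 bytes, which is why the example above names the result `first_kb`.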

## Notebook kernels

Inside an EpistaBase notebook kernel the environment is injected for you (a
short-lived scoped token plus a refresh URL), so `import biolake as bl` works with
no setup. The same code runs locally once you set the environment above or run
`epistabase auth login`.

## Development

This package is developed inside the EpistaBase monorepo under `sdk/` but is
released independently. From a checkout:

```bash
cd sdk
uv run --extra dev pytest -q
uv run --extra dev ruff check .
uv run --extra dev mypy src --strict
```

See [`AGENTS.md`](./AGENTS.md) for the thin-client scope rules and
[`docs/adr/ADR-012`](../docs/adr/ADR-012-sdk-independent-packaging-and-public-distribution.md)
for the packaging and distribution decision.

## License

Proprietary. © EpistaBase. Use of the SDK is governed by your EpistaBase agreement.
