Metadata-Version: 2.4
Name: andlake
Version: 0.1.2
Summary: Andlake platform SDK — pre-configured connections to Trino, Nessie, MLflow, S3, and Iceberg.
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: boto3<2.0,>=1.35
Requires-Dist: mlflow<3.0,>=2.17
Requires-Dist: pyarrow>=17.0
Requires-Dist: pyiceberg<1.0,>=0.8
Requires-Dist: pynessie<1.0,>=0.60
Requires-Dist: trino[sqlalchemy]<1.0,>=0.329
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: lineage
Requires-Dist: openlineage-python<2.0,>=1.0; extra == 'lineage'
Description-Content-Type: text/markdown

# andlake-sdk

Pre-configured Python SDK for the Andlake data platform. Provides zero-config
access to Trino, Nessie, MLflow, S3, and Iceberg from JupyterHub notebooks.

## Quick Start

```python
from andlake import get_trino_engine, configure_mlflow
import pandas as pd

# Query Trino via the Andlake gateway. pd.read_sql() expects a SQLAlchemy
# engine and warns when given a raw DBAPI connection.
engine = get_trino_engine()
df = pd.read_sql("SELECT * FROM lake.silver.transactions LIMIT 1000", engine)

# Set up MLflow experiment tracking
configure_mlflow(experiment_name="fraud-detection")
```

## Available Functions

| Function | Description |
|---|---|
| `get_trino_connection()` | Trino DBAPI connection via the gateway |
| `get_trino_engine()` | SQLAlchemy engine for `pd.read_sql()` |
| `get_nessie_client()` | Nessie catalog client for branch management |
| `configure_mlflow()` | Set MLflow tracking URI and experiment |
| `get_mlflow_client()` | Pre-configured `MlflowClient` |
| `get_s3_client()` | boto3 S3 client (uses IRSA) |
| `get_s3_resource()` | boto3 S3 resource (uses IRSA) |
| `get_iceberg_catalog()` | PyIceberg REST catalog via Nessie |
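
As an illustration of working with the returned objects, the sketch below lists object keys using a boto3 S3 client such as the one `get_s3_client()` returns. The helper name `list_keys` is hypothetical and not part of the SDK; it relies only on boto3's standard `list_objects_v2` paginator.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Return every object key under `prefix` in `bucket`.

    `s3_client` is any boto3 S3 client; this helper is illustrative
    and is not provided by the andlake package.
    """
    # list_objects_v2 returns at most 1000 keys per call, so the
    # paginator is used to follow continuation tokens across pages.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

In a notebook this could be called as `list_keys(get_s3_client(), "andlake-app")`, where `andlake-app` is the documented default for `ANDLAKE_S3_BUCKET`.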

## Environment Variables

Static service settings are set through JupyterHub `extraEnv`. Per-user values
have no default and are injected by the `pre_spawn_hook` from the Keycloak
`auth_state`.

| Variable | Default | Source |
|---|---|---|
| `ANDLAKE_GATEWAY_URL` | `http://notebook-service:8082` | extraEnv |
| `TRINO_HOST` | `notebook-service` | extraEnv |
| `TRINO_PORT` | `8082` | extraEnv |
| `NESSIE_URI` | `http://nessie:19120/api/v2` | extraEnv |
| `MLFLOW_TRACKING_URI` | `http://mlflow:5000` | extraEnv |
| `ANDLAKE_DEFAULT_CATALOG` | `lake` | extraEnv |
| `ANDLAKE_S3_BUCKET` | `andlake-app` | extraEnv |
| `ANDLAKE_TENANT_ID` | *(required)* | pre_spawn_hook |
| `ANDLAKE_ACCESS_TOKEN` | *(required)* | pre_spawn_hook |
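
The table above can be read as a resolution rule: use the environment value when present, fall back to the listed default, and fail when a required per-user value is missing. The following is a minimal sketch of that rule using only the standard library. The names `DEFAULTS`, `REQUIRED`, and `resolve_settings` are hypothetical, and the SDK's actual lookup logic may differ.

```python
import os

# Defaults copied from the table above (assumed to match the SDK).
DEFAULTS = {
    "ANDLAKE_GATEWAY_URL": "http://notebook-service:8082",
    "TRINO_HOST": "notebook-service",
    "TRINO_PORT": "8082",
    "NESSIE_URI": "http://nessie:19120/api/v2",
    "MLFLOW_TRACKING_URI": "http://mlflow:5000",
    "ANDLAKE_DEFAULT_CATALOG": "lake",
    "ANDLAKE_S3_BUCKET": "andlake-app",
}

# Per-user values with no default; normally injected by pre_spawn_hook.
REQUIRED = ("ANDLAKE_TENANT_ID", "ANDLAKE_ACCESS_TOKEN")


def resolve_settings(environ=None):
    """Return a dict of all settings, raising if a required one is unset."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    # Environment values win over the documented defaults.
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env[name] for name in REQUIRED})
    return settings
```

Running `resolve_settings()` at the top of a notebook is one way to confirm that the spawner injected the tenant ID and access token before any SDK function is called.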

## Development

```bash
pip install -e ".[dev]"
pytest
```
