Metadata-Version: 2.4
Name: onehouse-python-sdk
Version: 0.2.0
Summary: Onehouse Python SDK — modular data-plane clients for LakeBase and beyond
Project-URL: Homepage, https://github.com/onehouseinc/onehouse-python-sdk
Project-URL: Repository, https://github.com/onehouseinc/onehouse-python-sdk
Project-URL: Issues, https://github.com/onehouseinc/onehouse-python-sdk/issues
Author-email: Onehouse Team <eng@onehouse.ai>
License: Proprietary
Keywords: auth,azure,lakebase,oauth2,oidc,onehouse,postgresql,saml
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: lakebase
Requires-Dist: certifi>=2024.1; extra == 'lakebase'
Requires-Dist: psycopg2-binary>=2.9; extra == 'lakebase'
Provides-Extra: resources
Requires-Dist: requests>=2.28; extra == 'resources'
Description-Content-Type: text/markdown

# onehouse-python-sdk

Python SDK for connecting to [Onehouse](https://www.onehouse.ai) data-plane and control-plane services.

The base package has **zero required dependencies**. Drivers ship as optional extras so you only install what you need.

```bash
pip install onehouse-python-sdk[lakebase]    # PostgreSQL/LakeBase SQL client
pip install onehouse-python-sdk[resources]   # Control-plane REST client
```

## LakeBase

LakeBase is Onehouse's PostgreSQL-compatible managed lakehouse engine. This SDK provides a `psycopg2`-backed client that handles the browser-based and federated authentication flows LakeBase clusters require, on top of standard username/password auth.

### Installation

```bash
pip install onehouse-python-sdk[lakebase]
```

Requires Python 3.9+.

### Quickstart

```python
from onehouse_python_sdk import LakebaseClient

# Username / password
with LakebaseClient().connect(
    host="<cluster-host>",
    port=5432,
    dbname="mydb",
    user="admin",
    password="secret",
) as client:
    rows = client.fetchall("SELECT * FROM mytable WHERE id = %s", (42,))
```

### Authentication flows

| Flow | Parameters |
|------|------------|
| Username / password | `user`, `password` |
| OIDC Device Flow (Okta / Auth0) | `browser_auth=true`, `oidc_client_id`, `oidc_issuer_url`, `oidc_iam_role` |
| Azure AD OAuth2 | `browser_auth=true`, `azure_oauth_tenant_id`, `azure_oauth_client_id`, `azure_oauth_client_secret` |
| Azure Entra ID SAML | `browser_auth=true`, `azure_tenant_id`, `azure_entity_id` |
| Built-in login form | `browser_auth=true` _(default when no IdP params are set)_ |
| External redirect | `browser_auth=true`, `auth_redirect_url` |

```python
# OIDC Device Flow
client = LakebaseClient().connect(
    host="<cluster-host>",
    port=5432,
    dbname="mydb",
    browser_auth="true",
    oidc_client_id="0oaXXXXX",
    oidc_issuer_url="https://myorg.okta.com",
    oidc_iam_role="arn:aws:iam::123456789012:role/LakeBaseRole",
)

# Azure AD OAuth2
client = LakebaseClient().connect(
    host="<cluster-host>",
    port=5432,
    dbname="mydb",
    browser_auth="true",
    azure_oauth_tenant_id="your-tenant-id",
    azure_oauth_client_id="your-client-id",
    azure_oauth_client_secret="your-client-secret",
)
```

DSN string form is also supported:

```python
client = LakebaseClient().connect(
    "postgresql://<cluster-host>:5432/mydb"
    "?browser_auth=true"
    "&oidc_client_id=0oaXXXX"
    "&oidc_issuer_url=https://myorg.okta.com"
    "&oidc_iam_role=arn:aws:iam::123456789012:role/LakeBaseRole"
)
```
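Query values such as the issuer URL or an IAM role ARN contain `:` and `/`, which are safer to percent-encode when a DSN is assembled programmatically. The following is a minimal sketch using only the standard library. `build_lakebase_dsn` is a hypothetical helper, not part of the SDK, and it assumes the client accepts percent-encoded query values the way a standard PostgreSQL URI does.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_lakebase_dsn(host, port, dbname, **params):
    """Assemble a postgresql:// DSN with percent-encoded query parameters.

    Hypothetical helper for illustration; not part of the SDK.
    """
    query = urlencode(params)  # escapes ':' and '/' inside values
    return f"postgresql://{host}:{port}/{dbname}?{query}"


dsn = build_lakebase_dsn(
    "cluster.example.com",  # placeholder host
    5432,
    "mydb",
    browser_auth="true",
    oidc_client_id="0oaXXXX",
    oidc_issuer_url="https://myorg.okta.com",
    oidc_iam_role="arn:aws:iam::123456789012:role/LakeBaseRole",
)

# Round-trip check: decoding the query string yields the original values.
decoded = parse_qs(urlsplit(dsn).query)
```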

### Client API

All clients implement the `SqlClient` interface:

| Method | Description |
|--------|-------------|
| `connect(dsn=None, **kwargs) → self` | Establish connection, returns self for chaining |
| `execute(sql, params)` | Run a statement, return rowcount |
| `fetchall(sql, params)` | Run a query, return `list[tuple]` |
| `fetchone(sql, params)` | Run a query, return first row or `None` |
| `cursor()` | Raw cursor for advanced use |
| `raw_connection` | Underlying `psycopg2` connection |
| `close()` | Close the connection |
| `__enter__` / `__exit__` | Context manager — closes on exit |

### Notes

- **Credential caching** — auth tokens are cached for 4 minutes, keyed by connection parameters, to avoid repeated browser prompts within the same process.
- **Callback port** — the local auth callback server defaults to port `8888` at path `/lakebase`. Override with `auth_callback_port` and `auth_callback_path`.
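
To make the caching note concrete, here is a minimal sketch of a time-limited, in-process cache keyed by connection parameters. It illustrates the idea only and is not the SDK's implementation: `TokenCache`, its methods, and the injectable `clock` are hypothetical names.

```python
import time

CACHE_TTL_SECONDS = 4 * 60  # the 4-minute lifetime mentioned in the note


class TokenCache:
    """In-process cache of auth tokens, keyed by connection parameters."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (token, expiry time)

    @staticmethod
    def _key(params):
        # Sort items so the same parameters always map to the same key.
        return tuple(sorted(params.items()))

    def get(self, params):
        key = self._key(params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-authenticates.
            del self._entries[key]
            return None
        return token

    def put(self, params, token):
        self._entries[self._key(params)] = (token, self._clock() + self._ttl)
```

Because the cache lives in a plain dictionary, it is lost when the process exits, which matches the "within the same process" scope described above.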

## Onehouse Resources (Control-Plane API)

`OnehouseResources` wraps the Onehouse REST API for managing platform resources — clusters, lakes, flows, jobs, table services, and more. It posts SQL statements to `https://api.onehouse.ai/v1/resource/` and polls `/v1/status/{requestId}` for the result.

### Installation

```bash
pip install onehouse-python-sdk[resources]
```

### Quickstart

```python
from onehouse_python_sdk import OnehouseResources

client = OnehouseResources(
    account_uid="...", project_uid="...",
    api_key="...", api_secret="...",
    link_uid="...", region="us-west-2", user_uid="...",
)

# Typed helper — blocks until the operation reaches a terminal status.
result = client.create_cluster(
    "prod",
    type="Managed",
    max_ocu=10,
    min_ocu=1,
    options={"worker.type": "oh-general-4"},
)
print(result.api_status)        # ApiStatus.SUCCESS
print(result.api_response)      # raw API payload
```

### Credentials

Credentials resolve from three sources, listed from highest to lowest precedence:

1. **Explicit constructor arguments** (shown above).
2. **Environment variables** — `ONEHOUSE_ACCOUNT_UID`, `ONEHOUSE_PROJECT_UID`, `ONEHOUSE_API_KEY`, `ONEHOUSE_API_SECRET`, `ONEHOUSE_LINK_UID`, `ONEHOUSE_REGION`, `ONEHOUSE_USER_UID`. Optional: `ONEHOUSE_BASE_URL`, `ONEHOUSE_PROFILE`, `ONEHOUSE_CREDENTIALS_FILE`.
3. **INI credentials file** at `~/.onehouse/credentials` (override with `ONEHOUSE_CREDENTIALS_FILE`).

```ini
# ~/.onehouse/credentials
[default]
account_uid = 92e5f1ab-...
project_uid = 3afe72cd-...
api_key     = j+m8wRhgpKYFTLxCHNDzQA==
api_secret  = tXpzrqfUBNK9yhS5+FmLM37xwfhVeZygJntCzHG4Dpq=
link_uid    = da56fe8b-...
region      = us-west-2
user_uid    = ...

[staging]
account_uid = ...
```

```python
# Read from environment / default profile.
client = OnehouseResources()

# Pick a named profile.
client = OnehouseResources(profile="staging")
```

If required fields are missing, the client raises an `AuthError` that lists the unset fields and how to supply them. A world-readable credentials file triggers a warning; restrict it with `chmod 600 ~/.onehouse/credentials`.
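
The precedence rules can be sketched with the standard library alone. This is an illustration, not the SDK's code: `resolve_credentials` and `is_world_readable` are hypothetical names, and the sketch assumes each field is resolved independently (explicit argument, then environment variable, then INI profile).

```python
import configparser
import os
import stat

FIELDS = ("account_uid", "project_uid", "api_key", "api_secret",
          "link_uid", "region", "user_uid")


def resolve_credentials(explicit=None, env=None, path=None, profile="default"):
    """Merge credentials: explicit args > environment variables > INI file."""
    explicit = explicit or {}
    env = os.environ if env is None else env
    path = path or os.path.expanduser("~/.onehouse/credentials")

    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped without error
    from_file = dict(parser[profile]) if parser.has_section(profile) else {}

    resolved = {}
    for field in FIELDS:
        value = (explicit.get(field)
                 or env.get(f"ONEHOUSE_{field.upper()}")
                 or from_file.get(field))
        if value:
            resolved[field] = value
    missing = [field for field in FIELDS if field not in resolved]
    return resolved, missing


def is_world_readable(path):
    """True if users other than the owner and group can read the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)
```

Returning the list of missing fields alongside the resolved values is one way to build an error message that names exactly what is unset.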

### Three ways to run a command

```python
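import time

# ApiStatus is the SDK's status enum; import it from wherever your SDK version exposes it.

# (1) Blocking — submit, poll, return the terminal status. Most common.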
result = client.execute("SHOW CLUSTERS")

# (2) Non-blocking — submit now, poll later. Good for long-running ops or
# parallel orchestration where you don't want to hold a thread.
submitted = client.submit("CREATE CLUSTER `prod` TYPE = 'Managed' MAX_OCU = 10 MIN_OCU = 1")
# ... do other work, persist submitted.request_id ...
status = client.get_status(submitted.request_id)
while status.api_status == ApiStatus.PENDING:
    time.sleep(5)
    status = client.get_status(submitted.request_id)

# (3) Typed helpers — same blocking semantics as execute(), but build the SQL for you.
client.create_cluster("prod", type="Managed", max_ocu=10, min_ocu=1)
client.show_clusters()
client.delete_cluster("prod")
```

### Typed helpers

`OnehouseResources` exposes one method per Phase 1 SQL command (~50 methods across 13 resource families): Clusters, Lakes, Databases, Tables, Catalogs, Sources, Flows, Transformations, Validations, Table Services, Jobs, Service Principals, API Tokens. Every typed helper accepts the same trailing kwargs: `unsafe_raw`, `timeout`, `poll_interval`.

```python
from onehouse_python_sdk.resources.sql.commands import PartitionKeyField

client.create_lake(
    "analytics",
    lake_type="MANAGED",
    bucket_path="s3://my-bucket/lake",
    default_services_cluster="services",
)

client.create_flow(
    "events_pipeline",
    source="my_kafka_source",
    lake="analytics",
    database="events",
    table_name="page_views",
    write_mode="MUTABLE",
    cluster="ingest_cluster",
    catalogs=["my_glue_catalog"],
    record_key_fields=["id"],
    partition_key_fields=[
        PartitionKeyField("date", partition_type="DATE_STRING",
                          input_format="yyyy-mm-dd", output_format="yyyy-mm-dd"),
    ],
    min_sync_frequency_mins=5,
    options={"kafka.topic.name": "page_views"},
)
```

ACL / privilege / role / group commands aren't exposed as typed helpers yet — use `client.execute("GRANT ...")` until they land in a future release.

#### `unsafe_raw` escape hatch

The builder validates resource names against `^[A-Za-z][A-Za-z0-9_-]*$` and rejects unknown enum values to catch obvious typos. If a name doesn't match (e.g. legacy resources with dots in the name) or you're using a SQL feature the SDK hasn't been updated for, pass `unsafe_raw=True` on the call to skip validation:

```python
client.create_cluster("legacy.name", type="CustomType", unsafe_raw=True)
```

The argument is deliberately named so that its uses are easy to find with `grep`.
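
To check ahead of time whether a name will need `unsafe_raw=True`, you can test it against the same pattern. This is a minimal sketch: `is_valid_resource_name` is a hypothetical helper, not an SDK function, and it covers only the name pattern quoted above, not the enum checks.

```python
import re

# Pattern quoted above: a letter, then letters, digits, underscores, or hyphens.
RESOURCE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")


def is_valid_resource_name(name):
    """Return True if the whole name matches the pattern."""
    return RESOURCE_NAME.fullmatch(name) is not None


checks = {
    name: is_valid_resource_name(name)
    for name in ("prod", "ingest_cluster", "legacy.name", "9lives")
}
```

Here `legacy.name` fails because of the dot and `9lives` fails because it starts with a digit, so both would need the escape hatch.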

### Error handling

```
OnehouseSdkError
└── ResourcesError                     # raised by the resources/ subpackage
    ├── AuthError                      # missing/invalid credentials
    ├── SqlParseError                  # HTTP 400 + grpc-message — server rejected the SQL
    ├── OperationFailedError           # terminal status FAILED or INVALID
    └── OperationTimeoutError          # polling exceeded the configured timeout
```

`OperationTimeoutError` carries the `request_id` of the in-flight operation — you can resume polling with `get_status(request_id)` rather than re-submitting (which would create a duplicate resource).
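
The resume pattern can be sketched as follows. To stay runnable without network access, the example uses stand-ins: `PollTimeout`, `poll_until_terminal`, `FakeClient`, and the string statuses are hypothetical and not SDK names. Only the shape of `get_status(request_id)` mirrors the method documented here.

```python
import time


class PollTimeout(Exception):
    """Stand-in for OperationTimeoutError; keeps the in-flight request_id."""

    def __init__(self, request_id):
        super().__init__(f"timed out waiting for {request_id}")
        self.request_id = request_id


def poll_until_terminal(get_status, request_id, timeout, poll_interval,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(request_id) until it stops returning 'PENDING'."""
    deadline = clock() + timeout
    while True:
        status = get_status(request_id)
        if status != "PENDING":
            return status
        if clock() >= deadline:
            raise PollTimeout(request_id)
        sleep(poll_interval)


class FakeClient:
    """Reports PENDING a fixed number of times, then SUCCESS."""

    def __init__(self, pending_polls):
        self.remaining = pending_polls

    def get_status(self, request_id):
        if self.remaining > 0:
            self.remaining -= 1
            return "PENDING"
        return "SUCCESS"


client = FakeClient(pending_polls=3)
try:
    # A zero timeout makes the first attempt give up after one poll.
    result = poll_until_terminal(client.get_status, "req-123",
                                 timeout=0, poll_interval=0)
except PollTimeout as exc:
    # Resume with the same request_id instead of submitting again.
    result = poll_until_terminal(client.get_status, exc.request_id,
                                 timeout=60, poll_interval=0)
```

The key point is that the timeout path never calls a submit function again: the operation is still in flight on the server, so only the polling restarts.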

### Client API

| Method | Description |
|--------|-------------|
| `submit(statement) → SubmitResponse` | POST `/v1/resource/`, return the `requestId`. |
| `get_status(request_id) → StatusResponse` | GET `/v1/status/{id}`, return the parsed status. |
| `execute(statement, timeout=, poll_interval=) → StatusResponse` | Submit + poll until terminal. Blocks. |
| `<verb>_<resource>(...)` (~50 methods) | Typed wrappers around `execute()` that build SQL for you. |

### Notes

- **Lazy import** — `from onehouse_python_sdk import OnehouseResources` works even when the `[resources]` extra isn't installed; the first network call raises a clear "install `[resources]` extra" error.
- **Rate limit** — projects are capped at 10 QPS. The transport retries 429 responses with bounded exponential backoff.
- **Not a SqlClient** — `OnehouseResources` is a control-plane HTTP client and intentionally does not implement the `SqlClient` interface (no `cursor`, `fetchall`, etc.).
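
For readers who call the REST endpoints directly, the retry behavior in the rate-limit note can be approximated as below. This is a sketch of bounded exponential backoff in general, not the SDK's transport: `call_with_backoff`, its defaults, and the added jitter are assumptions.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() and retry while it returns HTTP 429.

    send is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each attempt, cap it, and add jitter.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

Capping both the delay and the number of retries keeps the worst-case wait bounded, so a persistent rate limit surfaces as an error rather than an indefinite hang.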
