Metadata-Version: 2.4
Name: h2o-connector-service
Version: 0.1.0.dev7001
Summary: Python client SDK for the H2O Connector Service — create connectors, open connections, and stream extracted data
Author-email: "H2O.ai, Inc." <support@h2o.ai>
Project-URL: Source, https://github.com/h2oai/connector-service
Project-URL: Issues, https://github.com/h2oai/connector-service/issues
Keywords: connector,data-extraction,data-ingestion,gRPC,postgresql,snowflake,hive,delta-lake,blob-storage,h2o
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: Other/Proprietary License
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27
Requires-Dist: grpcio>=1.64
Requires-Dist: protobuf>=4.25
Requires-Dist: googleapis-common-protos>=1.72
Requires-Dist: h2o-cloud-discovery>=3.3.0
Requires-Dist: h2o-authn>=3.1.0
Provides-Extra: pandas
Requires-Dist: pandas>=1.5; extra == "pandas"
Provides-Extra: parquet
Requires-Dist: pyarrow>=14; extra == "parquet"
Provides-Extra: datatable
Requires-Dist: datatable>=1.0; extra == "datatable"
Provides-Extra: h2o
Requires-Dist: pandas>=1.5; extra == "h2o"
Requires-Dist: h2o>=3.44; extra == "h2o"
Requires-Dist: pyarrow>=14; extra == "h2o"
Provides-Extra: test
Requires-Dist: pytest<9,>=8.0; extra == "test"
Requires-Dist: requests>=2.31; extra == "test"
Provides-Extra: dev
Requires-Dist: pytest<9,>=8.0; extra == "dev"
Requires-Dist: requests>=2.31; extra == "dev"
Requires-Dist: grpcio-tools>=1.64; extra == "dev"

# h2o-connector-service

- Python client: https://pypi.org/project/h2o-connector-service/
- Source: https://github.com/h2oai/connector-service

Python client SDK for the H2O Connector Service. It provides a high-level API to create connectors, open connections, and stream extracted data from supported data sources: PostgreSQL, Snowflake, Hive, Delta Lake, and blob storage (S3, Azure Blob, GCS).

```bash
pip install h2o-connector-service
```

## Quick Start (H2O Cloud Discovery)

The recommended way to connect when running on H2O AI Cloud:

```py
from h2o_connector_service import ConnectorService

with ConnectorService.from_discovery("https://cloud.h2o.ai", "my-workspace") as svc:
    with svc.open_session("CONNECTOR_TYPE_POSTGRESQL", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }) as session:
        # Stream rows one-by-one (constant memory)
        for row in session.stream_records():
            print(row)
```
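
The example above hard-codes credentials for brevity. A minimal sketch of building the same config dict from environment variables follows. The helper name `postgres_config_from_env` and the `PG*` variable names are assumptions made for illustration and are not part of the SDK; the keys and the string-typed values mirror the Quick Start.

```python
import os

# Variables that must be set; PGPORT is optional and defaults to 5432.
REQUIRED_VARS = ("PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD")


def postgres_config_from_env(env=None):
    """Build the PostgreSQL config dict used in the Quick Start.

    Hypothetical helper, not part of h2o_connector_service. The PG* names
    are an assumption borrowed from common PostgreSQL tooling conventions.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["PGHOST"],
        # The Quick Start passes the port as a string, so keep it a string.
        "port": str(env.get("PGPORT", "5432")),
        "database": env["PGDATABASE"],
        "username": env["PGUSER"],
        "password": env["PGPASSWORD"],
    }
```

The returned dict can be passed as the second argument to `svc.open_session("CONNECTOR_TYPE_POSTGRESQL", ...)`.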

## Quick Start (Manual / Legacy)

To connect directly, without H2O Cloud Discovery, pass the service URL, an OIDC token, and the workspace ID to the constructor. This path is deprecated:

```py
from h2o_connector_service import ConnectorService

with ConnectorService("http://localhost:8080", "<your-oidc-token>", "my-workspace") as svc:
    with svc.open_session("CONNECTOR_TYPE_POSTGRESQL", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }) as session:
        for row in session.stream_records():
            print(row)
```

## Output Formats

Once you have a session, stream data into various formats:

```py
# CSV file (memory-safe: rows are written as they arrive)
session.stream_to_csv("output.csv")

# pandas DataFrame (the whole result is held in memory)
# requires: pip install "h2o-connector-service[pandas]"
df = session.stream_to_pandas()

# Parquet file (memory-safe, chunked row groups)
# requires: pip install "h2o-connector-service[parquet]"
session.stream_to_parquet("output.parquet")

# datatable Frame (memory-safe, chunked rbind)
# requires: pip install "h2o-connector-service[datatable]"
frame = session.stream_to_data_table()

# H2O Frame (requires a running H2O cluster and a prior h2o.init())
# requires: pip install "h2o-connector-service[h2o]"
h2o_frame = session.stream_to_h2o_frame()

# Collect all rows into an in-memory list of dicts
records = session.stream_to_records()
```
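
When none of the built-in formats fit, rows from `session.stream_records()` can be processed in fixed-size batches so memory stays bounded. The sketch below is a generic helper that works on any iterable; `batched` and `fake_rows` are illustrative names, and the dict-shaped rows are an assumption based on `stream_to_records()` returning a list of dicts.

```python
from itertools import islice


def batched(rows, size):
    """Lazily yield lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_rows():
    """Stand-in for session.stream_records(); yields rows one at a time."""
    for i in range(5):
        yield {"id": i}


# Only one batch is held in memory at a time.
batch_sizes = [len(batch) for batch in batched(fake_rows(), 2)]
```

With a real session, `batched(session.stream_records(), 1000)` would replace the stand-in generator.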

## Advanced Usage

For full control over the connector lifecycle, use the individual service clients:

```py
from h2o_connector_service import (
    Client,
    ConnectorServiceClient,
    ConnectionServiceClient,
    ConnectorSession,
)

with Client.from_discovery("https://cloud.h2o.ai", "my-workspace") as client:
    connector_svc = ConnectorServiceClient(client)
    conn_svc = ConnectionServiceClient(client)

    # 1. Create a connector
    connector_svc.create_connector("my-workspace", {
        "metadata": {"name": "my-pg", "workspace_id": "my-workspace"},
        "spec": {
            "connector_type": "CONNECTOR_TYPE_POSTGRESQL",
            "config": {"host": "db.example.com", "port": "5432", "database": "mydb"},
        },
    })

    # 2. Create a connection
    connection = conn_svc.create_connection("my-workspace", {
        "metadata": {"workspace_id": "my-workspace"},
        "spec": {"connector_name": "workspaces/my-workspace/connectors/my-pg"},
    })

    # 3. Wait for the worker pod and stream data
    session = ConnectorSession(client, "my-workspace", connection["connection_id"])
    session.wait_for_worker_ready(timeout=300)
    session.stream_to_csv("output.csv")
```
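
The request payloads above repeat the workspace ID and embed the connector's resource name. A small sketch of helpers that build those dicts is shown below. The function names are hypothetical, and the `workspaces/<workspace>/connectors/<name>` pattern is inferred from the single example above rather than from a documented naming rule.

```python
def connector_resource_name(workspace_id, connector_name):
    """Resource name in the form used by the example above (inferred pattern)."""
    return f"workspaces/{workspace_id}/connectors/{connector_name}"


def build_connector(workspace_id, name, connector_type, config):
    """Payload shaped like the create_connector() argument in the example."""
    return {
        "metadata": {"name": name, "workspace_id": workspace_id},
        # Copy the config so later changes by the caller do not leak in.
        "spec": {"connector_type": connector_type, "config": dict(config)},
    }


def build_connection(workspace_id, connector_name):
    """Payload shaped like the create_connection() argument in the example."""
    return {
        "metadata": {"workspace_id": workspace_id},
        "spec": {
            "connector_name": connector_resource_name(workspace_id, connector_name)
        },
    }
```

These only assemble plain dicts; passing them to `create_connector` and `create_connection` works the same way as with the inline literals.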

## Optional Dependencies

Install extras for additional output format support:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets.
pip install "h2o-connector-service[pandas]"       # pandas DataFrames
pip install "h2o-connector-service[parquet]"      # Parquet files (pyarrow)
pip install "h2o-connector-service[datatable]"    # datatable Frames
pip install "h2o-connector-service[h2o]"          # H2O Frames (pandas + pyarrow + h2o)
```
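
To find out which output methods are usable in the current environment before calling them, the presence of each extra's module can be probed. This is a sketch using only the standard library; the mapping and function names are illustrative, and it checks only that a module can be found, not that its version meets the minimum.

```python
from importlib.util import find_spec

# Session method -> top-level module it relies on, per the extras above.
# The h2o extra also installs pandas and pyarrow; only h2o itself is probed.
REQUIRED_MODULE = {
    "stream_to_pandas": "pandas",
    "stream_to_parquet": "pyarrow",
    "stream_to_data_table": "datatable",
    "stream_to_h2o_frame": "h2o",
}


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def available_outputs():
    """Map each optional session method to whether its module is present."""
    return {
        method: is_installed(module)
        for method, module in REQUIRED_MODULE.items()
    }
```

`stream_to_csv` and `stream_to_records` are left out of the mapping because the document lists no extra for them.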

## Supported Connector Types

| Connector Type | Worker |
|---|---|
| `CONNECTOR_TYPE_POSTGRESQL` | worker-postgresql (Java/JDBC) |
| `CONNECTOR_TYPE_SNOWFLAKE` | worker-snowflake (Go) |
| `CONNECTOR_TYPE_HIVE` | worker-hive (Java/JDBC) |
| `CONNECTOR_TYPE_DELTA_LAKE` | worker-delta (Rust) |
| `CONNECTOR_TYPE_S3` | worker-blob (Go) |
| `CONNECTOR_TYPE_AZURE_BLOB` | worker-blob (Go) |
| `CONNECTOR_TYPE_GCS` | worker-blob (Go) |
