Metadata-Version: 2.4
Name: jdbc4py
Version: 0.1.0
Summary: A JDBC-first Python database bridge powered by JPype
Project-URL: Homepage, https://github.com/ryuk-me/jdbc4py
Project-URL: Repository, https://github.com/ryuk-me/jdbc4py
Project-URL: Documentation, https://github.com/ryuk-me/jdbc4py/tree/main/docs
Project-URL: Issues, https://github.com/ryuk-me/jdbc4py/issues
Project-URL: Changelog, https://github.com/ryuk-me/jdbc4py/blob/main/CHANGELOG.md
Author: Neeraj Kumar
License: MIT
License-File: LICENSE
Keywords: database,dbapi,jdbc,jpype,postgresql
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Requires-Python: <3.15,>=3.11
Requires-Dist: jpype1>=1.7
Provides-Extra: arrow
Requires-Dist: pyarrow>=24.0; extra == 'arrow'
Provides-Extra: benchmark
Requires-Dist: celery[redis]>=5.6; extra == 'benchmark'
Requires-Dist: clickhouse-connect>=1.3; (python_version < '3.15') and extra == 'benchmark'
Requires-Dist: databricks-sql-connector>=4.0; extra == 'benchmark'
Requires-Dist: db-dtypes>=1.7; extra == 'benchmark'
Requires-Dist: google-cloud-bigquery>=3.42; extra == 'benchmark'
Requires-Dist: google-cloud-spanner>=3.68; extra == 'benchmark'
Requires-Dist: hdbcli>=2.29; extra == 'benchmark'
Requires-Dist: mssql-python>=1.9; extra == 'benchmark'
Requires-Dist: mysql-connector-python>=9.7; extra == 'benchmark'
Requires-Dist: oracledb>=4.0; extra == 'benchmark'
Requires-Dist: pandas>=3.0; extra == 'benchmark'
Requires-Dist: presto-python-client>=0.8; extra == 'benchmark'
Requires-Dist: psycopg>=3.3; extra == 'benchmark'
Requires-Dist: pyarrow>=24.0; extra == 'benchmark'
Requires-Dist: pyhive[hive-pure-sasl]>=0.7.0; extra == 'benchmark'
Requires-Dist: pyodbc>=5.3; extra == 'benchmark'
Requires-Dist: snowflake-connector-python>=4.6; extra == 'benchmark'
Requires-Dist: sqlalchemy>=2.0; extra == 'benchmark'
Requires-Dist: thrift-sasl>=0.4.3; extra == 'benchmark'
Requires-Dist: thrift<0.23,>=0.22; extra == 'benchmark'
Requires-Dist: trino>=0.337; extra == 'benchmark'
Provides-Extra: dataframe
Requires-Dist: duckdb>=1.5; extra == 'dataframe'
Requires-Dist: pandas>=3.0; extra == 'dataframe'
Requires-Dist: polars>=1.41; extra == 'dataframe'
Requires-Dist: pyarrow>=24.0; extra == 'dataframe'
Provides-Extra: dev
Requires-Dist: duckdb>=1.5; extra == 'dev'
Requires-Dist: pandas>=3.0; extra == 'dev'
Requires-Dist: polars>=1.41; extra == 'dev'
Requires-Dist: pre-commit>=3.5; extra == 'dev'
Requires-Dist: pyarrow>=24.0; extra == 'dev'
Requires-Dist: pytest>=9.1; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Provides-Extra: sqlalchemy
Requires-Dist: sqlalchemy>=2.0; extra == 'sqlalchemy'
Description-Content-Type: text/markdown

# jdbc4py

Python access to databases whose native drivers are weak, commercial, or missing — and a fast Arrow bulk-read path across any database with a JDBC driver.

jdbc4py embeds a JVM in-process via JPype, loads your JDBC driver JAR, and hands back a standard DBAPI 2.0 connection — plus near-zero-copy Arrow reads into pandas, Polars, and DuckDB.

---

## When to use this

**Reach for jdbc4py when:**

- Your database's native Python driver is weak, commercial, or nonexistent — **SAP HANA, Sybase ASE/IQ**, or any JDBC-only source.
- You need to **join or query across two different database systems** in one SQL statement (e.g. PostgreSQL ⋈ MySQL) — see [cross-database queries](#cross-database-queries).
- You query **many different databases** from one codebase and want one API (and one fast Arrow path) instead of a different driver per engine.
- You need **fast bulk reads into pandas / Arrow / Polars** from a JDBC database.

**Use something else when:**

- You're on **PostgreSQL, MySQL, or SQLite** — the native drivers (psycopg, etc.) are faster and need no JVM. Use them.
- You need **sub-second connect latency** (serverless, high-frequency OLTP) — jdbc4py runs a JVM in-process, so the first connection pays ~0.5–1 s of JVM startup (use the pool to amortize it).
- You can't run a JVM at all.

### vs. the alternatives

- **Native driver** (psycopg, hdbcli, …): faster for single-database, row-by-row access and connect latency. jdbc4py wins on bulk Arrow reads and on databases with no good native driver.
- **JayDeBeApi**: the other JPype-based JDBC bridge. jdbc4py adds the Arrow bulk path, a connection pool, richer type mapping, a SQLAlchemy dialect, and an asyncio facade.
- **ADBC** (Arrow Database Connectivity): the emerging Arrow-native standard, no JVM — the better choice if your database has an ADBC driver. jdbc4py covers the long tail of databases that only ship JDBC.

---

## Features

- **Arrow bulk reads** — results serialize on the JVM side and arrive as a near-zero-copy Arrow table; pandas/Polars/DuckDB reads beat native driver + SQLAlchemy + pandas by **1.8–4.8×** on large analytical workloads (see [BENCHMARKS.md](BENCHMARKS.md)).
- **Broad database reach** — one API over any JDBC driver, including enterprise databases with thin Python support. See [Supported Databases](#supported-databases).
- **Cross-database queries** — register several connections and run a single SQL query that joins tables across *different* database systems; jdbc4py fetches each source via Arrow and executes the join in an in-memory DuckDB.
- **DBAPI 2.0 + SQLAlchemy** — a standard connection/cursor, plus a `postgresql+jdbc4py://…` dialect so SQLAlchemy, pandas, and ORMs work unchanged.
- **Write paths** — insert Arrow, pandas, Polars, or DuckDB data directly into any JDBC table.
- **Server-side streaming** — `stream=True` for huge results: instant time-to-first-batch, bounded memory.
- **Connection pool** — idle eviction, broken-connection replacement, configurable timeouts.
- **Rich type mapping** — numeric/decimal, temporal (nanosecond on the Arrow path), JSON, UUID, arrays, ranges, intervals, **vectors/embeddings**, and full Unicode.
- **asyncio** — `jdbc4py.aio` runs blocking JDBC calls off the event loop.
- **Operational niceties** — cancel a running query from any thread, uniform metadata helpers, and credential redaction in exceptions and logs.

---

## Requirements

- Python 3.11+
- Java 11+ (JRE is enough; JDK only needed if building the helper JAR)

---

## Installation

```bash
pip install jdbc4py                 # core DBAPI 2.0 (row-by-row fetch)
pip install "jdbc4py[dataframe]"    # + Arrow bulk reads into pandas / Polars / DuckDB
```

Requires a **Java 11+ runtime** (a JRE is enough — no JDK needed). The wheel
bundles the prebuilt Arrow helper JAR, so there's no build step at install time.
You still bring your own JDBC driver JAR for each database — see
[docs/installation.md](docs/installation.md) for driver placement and
environment configuration, and [Building from Source](#building-from-source) to
work on jdbc4py itself.

---

## Quick Start

```python
from jdbc4py import connect

with connect(
    engine="postgres",
    host="localhost",
    database="mydb",
    user="me",
    password="secret",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE active = ?", [True])
        rows = cur.fetchall()
```

Bulk read into a DataFrame:

```python
df = conn.read_sql("SELECT * FROM events", format="pandas")
# or "polars", "arrow", "duckdb"
```

Write a DataFrame back:

```python
conn.write_polars(df, "staging", if_exists="replace")
```

Stream a very large result without buffering it client-side:

```python
df = conn.read_sql("SELECT * FROM huge_table", format="pandas", stream=True)
```

Use it through SQLAlchemy (and therefore pandas, ORMs, Alembic):

```python
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("postgresql+jdbc4py://me:secret@localhost:5432/mydb")
df = pd.read_sql("SELECT * FROM events", engine)
```

### Cross-database queries

Join tables that live in **different database systems** in a single query.
Register each connection under an alias, then reference it as `alias.table`;
jdbc4py pulls each source over Arrow and runs the join in an in-memory DuckDB:

```python
from jdbc4py import connect, FederatedSession

with FederatedSession() as fed:
    fed.add_connection("pg", connect(engine="postgres", host="pg-host",
                                     database="analytics", user="me", password="secret"))
    fed.add_connection("mysql", connect(engine="mysql", host="mysql-host",
                                        database="users", user="me", password="secret"))

    df = fed.execute(
        "SELECT c.name, SUM(o.total) AS spend "
        "FROM pg.customers c "
        "JOIN mysql.orders o ON c.id = o.customer_id "
        "GROUP BY c.name",
        format="pandas",
    )
```

---

## Supported Databases

| Database | Engine key | Notes |
| --- | --- | --- |
| PostgreSQL | `postgres` | Full type coverage including arrays, ranges, hstore |
| MySQL | `mysql` | |
| MariaDB | `mariadb` | |
| SQL Server | `mssql` | |
| Oracle | `oracle` | |
| ClickHouse | `clickhouse` | |
| Trino | `trino` | |
| Presto | `presto` | |
| Snowflake | `snowflake` | `connect_snowflake()` helper |
| Databricks | `databricks` | `connect_databricks()` helper |
| Google Cloud Spanner | `spanner` | `connect_spanner()` helper |
| Google BigQuery | `bigquery` | `connect_bigquery()` helper |
| Sybase ASE / SAP IQ | `sybase_ase` / `sybase_iq` | jConnect or jTDS driver |
| SAP HANA | `hana` | on-prem & HANA Cloud; `ngdbc.jar` driver |
| SQLite | `sqlite` | Useful for testing |

Any database with a JDBC driver works via the `jdbc_url=` + `driver=` + `jar=` path.

---

## Connection Pool

```python
from jdbc4py import ConnectionPool

pool = ConnectionPool(
    engine="postgres",
    host="localhost",
    database="mydb",
    user="me",
    password="secret",
    min_size=2,
    max_connections=10,
)

with pool.connect() as conn:
    df = conn.read_sql("SELECT * FROM orders", format="pandas")
```

---

## Benchmarks

On a 50K-row × 50-col read, jdbc4py's Arrow path is the fastest pandas loader on
every local engine tested, beating the native driver + SQLAlchemy + pandas by
**~1.8×** on PostgreSQL, **2.3×** MySQL, **3.2×** MariaDB, **3.3×** Oracle, and
**4.8×** on SQL Server. On Databricks it's **2.8×** faster than
databricks-sql-connector; on Snowflake the native connector wins (it reads
Snowflake's native Arrow wire format directly).

The win is the **Arrow bulk path**, not row access: row-by-row `fetchall()` goes through the JVM and is **slower** than a native C driver (it creates Python objects one cell at a time), and the first connection pays JVM startup. Rule of thumb — use `read_sql(format="pandas"/"arrow")` for large reads, pool your connections, and reach for a native driver only if you need fast per-row tuple access.

Full numbers across PostgreSQL/MySQL/MariaDB/SQL Server/ClickHouse/Oracle plus deployed SAP HANA, Snowflake, and Databricks: [BENCHMARKS.md](BENCHMARKS.md)

---

## Documentation

| Document | Contents |
| --- | --- |
| [docs/installation.md](docs/installation.md) | Java setup, driver JARs, environment variables |
| [docs/architecture.md](docs/architecture.md) | How jdbc4py works internally |
| [docs/performance.md](docs/performance.md) | Arrow path, batching, pool tuning |
| [docs/driver-management.md](docs/driver-management.md) | Driver discovery, custom JARs, `JDBC4PY_DRIVER_PATHS` |
| [docs/jdbc-internals.md](docs/jdbc-internals.md) | Cursor, type system, cancellation, pool internals |
| [docs/design-decisions.md](docs/design-decisions.md) | Why JPype, why Arrow IPC, tradeoffs |
| [docs/troubleshooting.md](docs/troubleshooting.md) | Common errors and fixes |
| [docs/faq.md](docs/faq.md) | Frequently asked questions |
| [docs/operator-guide.md](docs/operator-guide.md) | Production deployment and security |

---

## Building from Source

```bash
git clone https://github.com/ryuk-me/jdbc4py
cd jdbc4py
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,dataframe]"

# Build the Java helper JAR (enables Arrow bulk reads)
./java/jdbc4py-helper/build.sh

# Unit tests
pytest -q

# Integration tests (requires Docker)
docker compose --profile core up -d
JDBC4PY_RUN_INTEGRATION=1 pytest tests/integration/ -q
```

---

## Contributing

Bug reports and pull requests are welcome. For large changes, open an issue first to discuss the approach. Tests are required for new features.

---

## License

MIT
