Metadata-Version: 2.4
Name: JayDeBeApiArrow
Version: 2.1.3
Summary: Use JDBC database drivers from Python 3 with a DB-API, accelerated with Apache Arrow.
Author-email: HenryNebula <henrynebula0710@gmail.com>
License: LGPL-3.0-or-later
Project-URL: Homepage, https://github.com/HenryNebula/jaydebeapiarrow
Keywords: db,api,java,jdbc,bridge,connect,sql,jpype,apache-arrow
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Programming Language :: Java
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Java Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: COPYING
License-File: COPYING.LESSER
Requires-Dist: JPype1>=1.0.0
Requires-Dist: pyarrow>=16.0.0
Requires-Dist: numpy
Requires-Dist: cffi
Requires-Dist: bump-my-version
Dynamic: license-file

# JayDeBeApiArrow - High-Performance JDBC to Python DB-API Bridge

[![Test Status](https://github.com/HenryNebula/jaydebeapiarrow/actions/workflows/tests.yml/badge.svg)](https://github.com/HenryNebula/jaydebeapiarrow/actions/workflows/tests.yml)
[![PyPI version](https://img.shields.io/pypi/v/JayDeBeApiArrow.svg)](https://pypi.python.org/pypi/JayDeBeApiArrow/)

The **JayDeBeApiArrow** module lets you connect from Python code to databases through Java [JDBC](http://java.sun.com/products/jdbc/overview.html). It exposes the database through a Python [DB-API v2.0](http://www.python.org/dev/peps/pep-0249/) interface.

> **Note:** This is a fork of the original [JayDeBeApi](https://github.com/baztian/jaydebeapi) project.

## Key Differences in this Fork

1.  **High Performance with Apache Arrow:**
    The primary goal of this fork is to significantly improve data fetch performance. Instead of iterating through JDBC ResultSets row-by-row in Python (which has high overhead), this library uses a custom Java extension (`arrow-jdbc-extension`) to convert JDBC data into **Apache Arrow** record batches directly within the JVM. These batches are then efficiently transferred to Python.

2.  **Modernization:**
    *   **Python 3 Only:** Support for Python 2 has been removed.
    *   **JPype Only:** Support for Jython has been removed to focus on the CPython + JPype architecture.
    *   **Strict Typing:** Enforces stricter typing for Decimal and temporal types.

It works on ordinary Python (CPython) using the [JPype](https://pypi.python.org/pypi/JPype1/) Java integration.

## Install

You can get and install JayDeBeApiArrow with pip:

```bash
pip install JayDeBeApiArrow
```

Or you can get a copy of the source by cloning the [JayDeBeApiArrow GitHub project](https://github.com/HenryNebula/jaydebeapiArrow) and installing it with:

```bash
uv sync
```

Ensure that you have installed [JPype](https://pypi.python.org/pypi/JPype1/) properly (it will be installed automatically by `uv sync`).

## Usage

Import the `jaydebeapiarrow` Python module and call its `connect` function. This returns a DB-API-compliant connection to the database.

The first argument to `connect` is the name of the Java driver class. The second argument is a string with the JDBC connection URL. The optional third argument is either a sequence consisting of user and password, or a dictionary whose entries are passed internally as properties to the Java `DriverManager.getConnection` method. See the Javadoc of the `DriverManager` class for details.

The next parameter to `connect` is optional as well and specifies the JAR files of the driver, in case your classpath does not already include them. The classpath set in the `CLASSPATH` environment variable is honored.

Here is an example:

```python
import jaydebeapiarrow
conn = jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    ["SA", ""],
    "/path/to/hsqldb.jar"
)
curs = conn.cursor()
curs.execute('create table CUSTOMER'
             '("CUST_ID" INTEGER not null,'
             ' "NAME" VARCHAR(50) not null,'
             ' primary key ("CUST_ID"))')
curs.execute("insert into CUSTOMER values (?, ?)", (1, 'John'))
curs.execute("select * from CUSTOMER")
print(curs.fetchall())
# Output: [(1, 'John')]
curs.close()
conn.close()
```

If you have trouble getting this to work, check that your `JAVA_HOME` environment variable is set correctly. For example:

```bash
JAVA_HOME=/usr/lib/jvm/java-8-openjdk python
```
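
Before starting Python, it can help to check what `JAVA_HOME` actually points to. The following is a minimal sketch of such a check; the helper name `java_home_problem` is hypothetical and not part of JayDeBeApiArrow, and a missing `JAVA_HOME` is not necessarily fatal, since JPype may still locate a JVM by other means.

```python
import os
from pathlib import Path


def java_home_problem(environ=os.environ):
    """Return a short description of what looks wrong with JAVA_HOME, or None.

    Hypothetical troubleshooting helper; it only inspects the filesystem
    and does not start a JVM.
    """
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    home = Path(java_home)
    if not home.is_dir():
        return f"JAVA_HOME points to a missing directory: {home}"
    # The launcher is named java.exe on Windows and java elsewhere.
    launcher = "java.exe" if os.name == "nt" else "java"
    if not (home / "bin" / launcher).is_file():
        return f"no {launcher} launcher found under {home / 'bin'}"
    return None


problem = java_home_problem()
print(problem or "JAVA_HOME looks plausible")
```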

An alternative way to establish a connection is to pass connection properties:

```python
conn = jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    {
        'user': "SA", 'password': "",
        'other_property': "foobar"
    },
    "/path/to/hsqldb.jar"
)
```

The `with` statement can also be handy for connections and cursors:

```python
with jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    ["SA", ""],
    "/path/to/hsqldb.jar"
) as conn:
    with conn.cursor() as curs:
        curs.execute("select count(*) from CUSTOMER")
        print(curs.fetchall())
        # Output: [(1,)]
```
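
Because the connection follows DB-API v2.0, code written against the standard cursor methods should carry over. The sketch below streams a result set in chunks with `fetchmany` instead of loading everything with `fetchall`. It uses the standard-library `sqlite3` module as a stand-in so that it runs without a JVM; the helper name `iter_rows` is hypothetical, and the assumption is that a `jaydebeapiarrow` cursor can be passed to it in the same way.

```python
import sqlite3  # stand-in DB-API driver so the sketch runs without a JVM


def iter_rows(cursor, chunk_size=1000):
    """Yield rows from any DB-API cursor, fetching chunk_size rows at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute(
    "create table CUSTOMER"
    " (CUST_ID integer primary key, NAME varchar(50) not null)"
)
# Qmark-style placeholders, as in the examples above.
curs.executemany(
    "insert into CUSTOMER values (?, ?)",
    [(1, "John"), (2, "Jane"), (3, "Ann")],
)
curs.execute("select * from CUSTOMER order by CUST_ID")
rows = list(iter_rows(curs, chunk_size=2))
print(rows)
# Output: [(1, 'John'), (2, 'Jane'), (3, 'Ann')]
curs.close()
conn.close()
```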

## Supported Databases

In theory *every database with a suitable JDBC driver should work*. It is confirmed to work with the following databases:

*   SQLite
*   Hypersonic SQL (HSQLDB)
*   IBM DB2
*   IBM DB2 for mainframes
*   Oracle
*   Teradata DB
*   Netezza
*   Mimer DB
*   Microsoft SQL Server
*   MySQL
*   PostgreSQL
*   ...and many more.

## Testing

Integration tests are located in `test/`. The test suite covers SQLite (in-memory), PostgreSQL, MySQL, and HSQLDB.

### Build JARs and download drivers

```bash
uv run bash test/build.sh                 # Build arrow-jdbc-extension and MockDriver JARs
uv run bash test/download_jdbc_drivers.sh # Download PostgreSQL, MySQL, SQLite, HSQLDB JDBC drivers
```

### Run tests

```bash
CLASSPATH="test/jars/*" uv run python -m unittest test.test_integration.HsqldbTest   # HSQLDB
CLASSPATH="test/jars/*" uv run python -m unittest test.test_integration.SqliteXerialTest  # SQLite
CLASSPATH="test/jars/*" uv run python -m unittest test.test_mock                       # Mock driver
```

### External database tests

PostgreSQL and MySQL tests require running database instances. Docker Compose configs and helper scripts are provided in `test/`:

```bash
# Start both databases
bash test/start.sh

# Check status
bash test/status.sh

# Stop databases
bash test/stop.sh
```

Database connection defaults (overridable via environment variables):

| Database | Host | Port | DB | User | Password | Env prefix |
|---|---|---|---|---|---|---|
| PostgreSQL | localhost | 5432 | test_db | user | password | `JY_PG_*` |
| MySQL | localhost | 3306 | test_db | user | password | `JY_MYSQL_*` |
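
As an illustration of how such overrides can be resolved, here is a minimal sketch that merges environment variables over the defaults from the table and builds a PostgreSQL JDBC URL. Only the `JY_PG_` prefix comes from the table; the suffixes (`HOST`, `PORT`, `DB`, `USER`, `PASSWORD`) and the helper name `load_db_config` are assumptions, so check the test sources for the variable names actually used.

```python
import os

# Defaults copied from the table above. The key names double as the
# assumed environment-variable suffixes (e.g. JY_PG_PORT); they are
# hypothetical and may differ from what the test suite reads.
PG_DEFAULTS = {
    "HOST": "localhost",
    "PORT": "5432",
    "DB": "test_db",
    "USER": "user",
    "PASSWORD": "password",
}


def load_db_config(prefix, defaults, environ=os.environ):
    """Return the defaults, overridden by any PREFIX+KEY variable that is set."""
    return {
        key.lower(): environ.get(f"{prefix}{key}", default)
        for key, default in defaults.items()
    }


# Simulate overriding only the port; everything else keeps its default.
pg = load_db_config("JY_PG_", PG_DEFAULTS, environ={"JY_PG_PORT": "15432"})
url = f"jdbc:postgresql://{pg['host']}:{pg['port']}/{pg['db']}"
print(url)
# Output: jdbc:postgresql://localhost:15432/test_db
```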

## Benchmarks

This approach was inspired by [Uwe Korn's work on pyarrow.jvm](https://uwekorn.com/2019/11/17/fast-jdbc-access-in-python-using-pyarrow-jvm.html) (Apache Drill) and [Razvi Noorul's Trino benchmarks](https://medium.com/@noorulrazvi/trino-jdbc-access-in-python-using-pyarrow-jvm-d1b75fe039ee), both demonstrating 100x+ speedups by using Arrow to bypass JPype's row-by-row serialization.

Our benchmarks (local PostgreSQL, 5M rows, 4 columns) show a **~20x speedup** over plain jaydebeapi. The difference in multiplier is due to methodology: both posts tested against distributed query engines (Drill, Trino) over network connections, which have much higher per-row JDBC overhead. PostgreSQL's JDBC driver is significantly faster at row retrieval, so the baseline is lower and there's less headroom for a multiplier. The absolute Arrow throughput is comparable across all three.

| Method | 5M rows | Throughput | vs jaydebeapi |
|---|---|---|---|
| jaydebeapi (baseline) | 198.66s | 25K rows/s | — |
| Drop-in replacement | 25.82s | 194K rows/s | 7.7x |
| Native Arrow API | 9.38s | 542K rows/s | **21.2x** |
| Psycopg2 (native driver) | 7.34s | 682K rows/s | 27x |

See `benchmark/` for scripts to reproduce these results.
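
The "vs jaydebeapi" column can be read as the ratio of the baseline wall-clock time to each method's time. The following sketch recomputes it from the timings in the table; it is only arithmetic on the published numbers and is not one of the benchmark scripts.

```python
# Wall-clock times in seconds for fetching 5M rows, copied from the table.
timings = {
    "jaydebeapi (baseline)": 198.66,
    "Drop-in replacement": 25.82,
    "Native Arrow API": 9.38,
    "Psycopg2 (native driver)": 7.34,
}

baseline = timings["jaydebeapi (baseline)"]

# Speedup = baseline time / method time, rounded to one decimal place.
speedups = {name: round(baseline / seconds, 1) for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
# Output:
# jaydebeapi (baseline): 1.0x
# Drop-in replacement: 7.7x
# Native Arrow API: 21.2x
# Psycopg2 (native driver): 27.1x
```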

## Contributing

Please submit bugs and patches to the [JayDeBeApiArrow issue tracker](https://github.com/HenryNebula/jaydebeapiArrow/issues). All contributors will be acknowledged. Thanks!

## License

JayDeBeApiArrow is released under the GNU Lesser General Public License (LGPL). See the files `COPYING` and `COPYING.LESSER` in the distribution for details.
