Metadata-Version: 2.4
Name: ns-sqlalchemy
Version: 0.0.2
Summary: SQLAlchemy dialects for multiple databases
Author-email: navysummer <navysummer@yeah.net>
License: MIT
Keywords: SQLAlchemy,database,dialect,DM,Dameng,ClickHouse
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: httpx>=0.25.0
Provides-Extra: dameng
Requires-Dist: dmpython; sys_platform != "darwin" and extra == "dameng"
Requires-Dist: ns-pydm; sys_platform == "darwin" and extra == "dameng"
Dynamic: license-file

# ns-sqlalchemy

SQLAlchemy dialects for multiple databases.

## Overview

ns-sqlalchemy is a Python package that provides SQLAlchemy dialect implementations for various databases. Currently, it supports:

### Cloud Warehouses
- **ClickHouse**: Full dialect support with HTTP, Native, and asynch drivers
- **Amazon Redshift**: Synchronous driver via `redshift_connector`
- **Google BigQuery**: Synchronous driver via `google.cloud.bigquery`
- **Snowflake**: Synchronous driver via `snowflake.connector`
- **Firebolt**: Synchronous driver via `firebolt.db`
- **Databricks**: Synchronous driver via `databricks.sql`
- **Amazon Athena**: Synchronous driver via `pyathena`

### MySQL-Compatible
- **TiDB**: MySQL protocol via `pymysql` (port 4000)
- **OceanBase**: MySQL protocol via `pymysql` (port 2883)

### PostgreSQL-Compatible
- **CockroachDB**: PostgreSQL protocol via `psycopg2` (port 26257)

### Big Data / Analytics
- **Apache Druid**: Synchronous driver via `pydruid`
- **Apache Hive**: Synchronous driver via `pyhive.hive`
- **Presto**: Synchronous driver via `pyhive.presto`

### Enterprise Relational
- **IBM DB2**: Synchronous driver via `ibm_db_dbi`
- **SAP Hana**: Synchronous driver via `hdbcli.dbapi`
- **Teradata**: Synchronous driver via `teradatasql`
- **Firebird**: Synchronous driver via `fdb`
- **Microsoft Access**: Synchronous driver via `pyodbc`
- **CrateDB**: Synchronous driver via `crate.client`

### NoSQL / Other
- **MongoDB**: Read-only dialect via `mongodb-sqlalchemy`
- **Elasticsearch**: Read-only dialect via `elasticsearch-dbapi`
- **Google Sheets**: Read-only dialect via `shillelagh`
- **Dameng (DM)**: Full dialect support with dmPython and dmAsync drivers

## Features

### ClickHouse
- **Multiple Drivers**: HTTP (`clickhouse+http://`), Native (`clickhouse+native://`), asynch (`clickhouse+asynch://`)
- **Full Type Support**: Int8–UInt256, Float32/64, Decimal, Date/Date32, DateTime/DateTime64, Enum8/16, Array, Nullable, LowCardinality, Tuple, Map, Nested, IPv4/IPv6, UUID, AggregateFunction, SimpleAggregateFunction
- **DDL Support**: MergeTree-family engines (PARTITION BY, ORDER BY, PRIMARY KEY, SAMPLE BY, TTL, SETTINGS), Replicated engines, Distributed, Buffer, MaterializedView, column CODEC/MATERIALIZED/ALIAS/AFTER
- **SQL Extensions**: FINAL, SAMPLE, LIMIT BY, ARRAY JOIN, WITH CUBE/ROLLUP/TOTALS, JOIN strictness/distribution, ALTER TABLE DELETE/UPDATE/ADD/DROP/MODIFY/RENAME
- **Alembic Integration**: Full migration support including Materialized Views
- **ORM Support**: Custom Query class with ClickHouse-specific methods, declarative base integration
- **Reflection**: Inspect tables, views, materialized views

### Dameng
- **Synchronous Driver**: `dmPython` driver for synchronous operations
- **Asynchronous Driver**: `dmAsync` driver for async operations
- **Advanced Type Support**: NUMBER, VARCHAR2, NVARCHAR2, CHAR, DATE, DATETIME, TIMESTAMP, INTERVAL, BLOB, CLOB, NCLOB, BFILE
- **Vector Search**: Built-in support for vector similarity search with IVF and HNSW indexes
- **JSON Support**: JSON data type with path indexing
- **Compatibility Modes**: DM, MySQL, TSQL, and Oracle compatibility modes

## Installation

```bash
# Base install (supports ClickHouse HTTP driver)
pip install ns-sqlalchemy

# With Dameng support
pip install ns-sqlalchemy[dameng]

# Install optional database drivers
pip install pymysql          # TiDB, OceanBase
pip install psycopg2-binary  # CockroachDB
pip install redshift-connector  # Redshift
pip install google-cloud-bigquery  # BigQuery
pip install snowflake-connector-python  # Snowflake
pip install firebolt-sdk     # Firebolt
pip install databricks-sql-connector  # Databricks
pip install pydruid          # Druid
pip install pyhive           # Hive, Presto
pip install ibm-db           # DB2
pip install hdbcli           # SAP Hana
pip install pyodbc           # Access
pip install teradatasql      # Teradata
pip install pyathena         # Athena
pip install crate            # CrateDB
pip install fdb              # Firebird
pip install shillelagh       # Google Sheets
pip install elasticsearch-dbapi  # Elasticsearch
```

## Usage

### Basic Usage

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session
from ns_sqlalchemy.dameng import dialect

# Create engine with Dameng dialect
engine = create_engine(
    "dm+dmPython://user:password@host:port/database",
    dialect_options={"driver": "dmPython"}
)

# Create declarative base
Base = declarative_base()

# Define a model
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))

# Create tables
Base.metadata.create_all(engine)

# Use with session
with Session(engine) as session:
    # Insert data
    new_user = User(name="John Doe", email="john@example.com")
    session.add(new_user)
    session.commit()

    # Query data
    users = session.query(User).filter(User.name == "John Doe").all()
```

### Vector Search

```python
from ns_sqlalchemy.dameng import VectorWordSeek

# Create vector search instance
vector_search = VectorWordSeek(
    connection_str="dm+dmPython://user:password@host:port/database",
    table_name="documents",
    vector_dim=128,
    drop_if_existing=False,
)

# Insert vectors
ids = vector_search.insert(
    texts=["Hello world", "Goodbye world"],
    metadatas=[{"source": "test"}, {"source": "test"}]
)

# Query similar vectors
results = vector_search.query(
    DistanceMetric="COSINE",
    query_vector=[0.1, 0.2, 0.3, ...],
    count=5,
    filter={"source": "test"}
)
```

### ClickHouse Usage

```python
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import Session
from ns_sqlalchemy.clickhouse import engines, types
from ns_sqlalchemy.clickhouse.sql.schema import Table
from ns_sqlalchemy.clickhouse.sql.ddl import CreateTable

# Create engine with ClickHouse HTTP driver
engine = create_engine("clickhouse+http://user:password@localhost:8123/default")

# Define a table with ClickHouse-specific engine
table = Table(
    "events",
    MetaData(),
    Column("event_id", String, primary_key=True),
    Column("timestamp", DateTime),
    Column("value", Float32),
    engine=engines.MergeTree(
        order_by=func.tuple("event_id"),
        partition_by="toYYYYMM(timestamp)",
    ),
)

# Create table
CreateTable(table).execute(engine)

# Insert data with engine
with engine.connect() as conn:
    conn.execute(
        table.insert(),
        {"event_id": "abc", "timestamp": datetime.now(), "value": 1.5},
    )

# Query with ClickHouse-specific features
from sqlalchemy import select, func
from ns_sqlalchemy.clickhouse.sql.selectable import Select

s = (
    Select(table)
    .final()
    .sample(0.1)
    .limit_by(10, table.event_id)
)
```

### Async ClickHouse

```python
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

engine = create_async_engine(
    "clickhouse+asynch://user:password@localhost:9000/default"
)
```

### Alembic Integration

```python
# In alembic/env.py:
from ns_sqlalchemy.clickhouse.alembic.dialect import (
    ClickHouseDialectImpl,
    patch_alembic_version,
    include_object,
)

# Register the ClickHouse dialect implementation
context.configure(
    connection=connection,
    target_metadata=target_metadata,
    include_object=include_object,
)

# For clustered ClickHouse:
# patch_alembic_version(context, cluster="my_cluster")
```

### Dameng Async Usage

```python
import asyncio
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import declarative_base

async def main():
    # Create async engine
    engine = create_async_engine(
        "dm+dmAsync://user:password@host:port/database",
        echo=True
    )

    # Create declarative base
    Base = declarative_base()

    # Define a model
    class Product(Base):
        __tablename__ = "products"
        id = Column(Integer, primary_key=True)
        name = Column(String(100))
        price = Column(Float)

    # Create tables
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

    # Use with async session
    async with AsyncSession(engine) as session:
        # Insert data
        new_product = Product(name="Laptop", price=999.99)
        session.add(new_product)
        await session.commit()

        # Query data
        result = await session.execute(
            select(Product).filter(Product.price > 500)
        )
        products = result.scalars().all()

if __name__ == "__main__":
    asyncio.run(main())
```

## Supported Databases

### ClickHouse

- **URL Schemes**: `clickhouse://`, `clickhouse+http://`, `clickhouse+native://`, `clickhouse+asynch://`
- **Drivers**: HTTP (`httpx`), Native (`clickhouse-driver`), asynch (`clickhouse-driver` async)
- **Features**: Full type support, DDL (MergeTree-family engines, column CODEC), SQL extensions (FINAL, SAMPLE, ARRAY JOIN), Alembic migrations, ORM support

### TiDB

- **URL Scheme**: `tidb+pymysql://user:pass@host:4000/db`
- **Protocol**: MySQL (inherits MySQL DDL/SQL compilers)
- **Default Port**: 4000

### OceanBase

- **URL Scheme**: `oceanbase+pymysql://user:pass@host:2883/db`
- **Protocol**: MySQL (inherits MySQL DDL/SQL compilers)
- **Default Port**: 2883

### CockroachDB

- **URL Scheme**: `cockroachdb+psycopg2://user:pass@host:26257/db`
- **Protocol**: PostgreSQL (inherits PostgreSQL DDL/SQL compilers)
- **Default Port**: 26257

### Amazon Redshift

- **URL Scheme**: `redshift+redshift_connector://user:pass@host:5439/db`
- **Driver**: `redshift_connector`
- **Default Port**: 5439

### Google BigQuery

- **URL Scheme**: `bigquery://project/dataset`
- **Driver**: `google.cloud.bigquery`
- **Auth**: Uses Application Default Credentials or service account JSON

### Snowflake

- **URL Scheme**: `snowflake://user:pass@account/db`
- **Driver**: `snowflake.connector`

### Firebolt

- **URL Scheme**: `firebolt://user:pass@engine/db`
- **Driver**: `firebolt.db`

### Databricks

- **URL Scheme**: `databricks://token@workspace/catalog`
- **Driver**: `databricks.sql`

### Apache Druid

- **URL Scheme**: `druid://host:8082/datasource`
- **Driver**: `pydruid`

### Apache Hive

- **URL Scheme**: `hive://host:10000/default`
- **Driver**: `pyhive.hive`

### Presto

- **URL Scheme**: `presto://host:8080/catalog`
- **Driver**: `pyhive.presto`

### IBM DB2

- **URL Scheme**: `db2://user:pass@host:50000/db`
- **Driver**: `ibm_db_dbi`

### SAP Hana

- **URL Scheme**: `hana://user:pass@host:30015/db`
- **Driver**: `hdbcli.dbapi`

### Teradata

- **URL Scheme**: `teradata://user:pass@host/db`
- **Driver**: `teradatasql`

### Microsoft Access

- **URL Scheme**: `access:///path/to/db.accdb`
- **Driver**: `pyodbc`

### CrateDB

- **URL Scheme**: `cratedb://host:4200`
- **Driver**: `crate.client`

### Firebird

- **URL Scheme**: `firebird://user:pass@host:3050/db`
- **Driver**: `fdb`

### Amazon Athena

- **URL Scheme**: `athena://key:secret@region/schema?s3_staging_dir=s3://bucket/`
- **Driver**: `pyathena`

### MongoDB

- **URL Scheme**: `mongodb://host:27017/db`
- **Read-only**: `supports_alter=False`, no DDL support
- **Driver**: `mongodb-sqlalchemy`

### Elasticsearch

- **URL Scheme**: `elasticsearch://host:9200`
- **Read-only**: `supports_alter=False`, no DDL support
- **Driver**: `elasticsearch-dbapi`

### Google Sheets

- **URL Scheme**: `gsheets://`
- **Read-only**: `supports_alter=False`, no DDL support
- **Driver**: `shillelagh`

### Dameng (DM)

- **URL Schemes**: `dm+dmPython://user:pass@host:port/db`, `dm+dmAsync://user:pass@host:port/db`
- **Drivers**: `dmPython` (synchronous), `dmAsync` (asynchronous)
- **Compatibility Modes**: DM, MySQL, TSQL, Oracle
- **Features**: Advanced type support, vector similarity search (IVF/HNSW indexes), JSON support

## Type Support

### ClickHouse Types

| SQLAlchemy Type | ClickHouse Type | Notes |
|----------------|-----------------|-------|
| `types.Int8` | `Int8` | Signed 8-bit |
| `types.Int16` | `Int16` | Signed 16-bit |
| `types.Int32` | `Int32` | Signed 32-bit |
| `types.Int64` | `Int64` | Signed 64-bit |
| `types.Int128` | `Int128` | Signed 128-bit |
| `types.Int256` | `Int256` | Signed 256-bit |
| `types.UInt8` | `UInt8` | Unsigned 8-bit |
| `types.UInt16` | `UInt16` | Unsigned 16-bit |
| `types.UInt32` | `UInt32` | Unsigned 32-bit |
| `types.UInt64` | `UInt64` | Unsigned 64-bit |
| `types.Float32` | `Float32` | 32-bit float |
| `types.Float64` | `Float64` | 64-bit float |
| `types.String` | `String` | Variable-length string |
| `types.FixedString(n)` | `FixedString(n)` | Fixed-length string |
| `types.Date` | `Date` | Date |
| `types.Date32` | `Date32` | Extended range date |
| `types.DateTime` | `DateTime` | DateTime |
| `types.DateTime64(p)` | `DateTime64(p)` | DateTime with precision |
| `types.Decimal(p,s)` | `Decimal(p,s)` | Fixed-point number |
| `types.Boolean` | `Boolean` | Boolean |
| `types.Enum8` | `Enum8` | 8-bit enum |
| `types.Enum16` | `Enum16` | 16-bit enum |
| `types.UUID` | `UUID` | UUID |
| `types.IPv4` | `IPv4` | IPv4 address |
| `types.IPv6` | `IPv6` | IPv6 address |
| `types.Array(t)` | `Array(t)` | Array of type |
| `types.Nullable(t)` | `Nullable(t)` | Nullable wrapper |
| `types.LowCardinality(t)` | `LowCardinality(t)` | Low cardinality wrapper |
| `types.Tuple(...)` | `Tuple(...)` | Named tuple |
| `types.Map(k,v)` | `Map(k,v)` | Key-value map |
| `types.Nested(...)` | `Nested(...)` | Nested structure |
| `types.AggregateFunction(f,t)` | `AggregateFunction(f,t)` | Aggregate function type |
| `types.SimpleAggregateFunction(f,t)` | `SimpleAggregateFunction(f,t)` | Simple aggregate type |

### Dameng Scalar Types

| SQLAlchemy Type | Dameng Type | Notes |
|----------------|-------------|-------|
| `NUMBER` | NUMBER | Supports precision and scale |
| `VARCHAR2` | VARCHAR2 | Variable character |
| `NVARCHAR2` | NVARCHAR2 | National character varying |
| `CHAR` | CHAR | Fixed character |
| `DATE` | DATE | Date only |
| `DATETIME` | DATETIME | Date and time |
| `TIMESTAMP` | TIMESTAMP | Timestamp with timezone support |
| `INTERVAL` | INTERVAL | Time interval |
| `BLOB` | BLOB | Binary large object |
| `CLOB` | CLOB | Character large object |
| `NCLOB` | NCLOB | National character large object |
| `BFILE` | BFILE | Binary file |

### Vector Types

- `VECTOR(dim=128, format="FLOAT32")`: Vector type for similarity search
- Supports L1, L2, cosine, dot product, and Hamming distances

### JSON Types

- `JSON`: JSON data type
- `JSONIndexType`: JSON path index type
- `JSONPathType`: JSON path type

## Configuration

### Connection Parameters

```python
engine = create_engine(
    "dm+dmPython://user:password@host:port/database?charset=utf8",
    dialect_options={
        "driver": "dmPython",
        "auto_convert_lobs": True,
        "coerce_to_decimal": True,
        "arraysize": 50,
        "connection_timeout": 30,
    }
)
```

### Compatibility Modes

```python
# MySQL compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database?parse_type=MYSQL",
    dialect_options={"compatible_mode": "MYSQL"}
)

# TSQL compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database?parse_type=TSQL",
    dialect_options={"compatible_mode": "TSQL"}
)

# Oracle compatibility
engine = create_engine(
    "dm+dmPython://user:password@host:port/database",
    dialect_options={"compatible_mode": "ORACLE"}
)
```

## Vector Index Operations

### Create IVF Index

```python
from ns_sqlalchemy.dameng import VectorAdaptor

vector_adaptor = VectorAdaptor(engine)

# Create IVF index
vector_adaptor.create_vector_ivf_index(
    column=Product.embedding,
    metric_name="COSINE",
    percentage_value=90,
    num_of_partitions=100,
)
```

### Create HNSW Index

```python
# Create HNSW index
vector_adaptor.create_vector_hnsw_index(
    column=Product.embedding,
    metric_name="COSINE",
    percentage_value=90,
    max_connection=16,
    ef_construction=200,
)
```

## Testing

Run the tests with pytest:

```bash
pytest tests/
```

Or run specific test suites:

```bash
# ClickHouse tests
pytest tests/test_clickhouse/
pytest tests/test_clickhouse/test_sql.py
pytest tests/test_clickhouse/test_ddl.py

# Dameng tests
pytest tests/test_dameng/
pytest tests/test_dameng/test_dialect.py
pytest tests/test_dameng/test_vector.py

# Database dialect registration tests
pytest tests/test_databases/
pytest tests/test_databases/test_dialects.py
```

## Development

### Project Structure

```
ns-sqlalchemy/
├── src/
│   └── ns_sqlalchemy/
│       ├── __init__.py
│       ├── clickhouse/
│       │   ├── __init__.py
│       │   ├── alembic/
│       │   │   ├── comparators.py
│       │   │   ├── dialect.py
│       │   │   ├── operations.py
│       │   │   ├── renderers.py
│       │   │   └── toimpl.py
│       │   ├── drivers/
│       │   │   ├── base.py
│       │   │   ├── compilers/
│       │   │   │   ├── ddlcompiler.py
│       │   │   │   ├── sqlcompiler.py
│       │   │   │   └── typecompiler.py
│       │   │   ├── http/
│       │   │   │   └── base.py
│       │   │   ├── native/
│       │   │   │   └── base.py
│       │   │   ├── asynch/
│       │   │   │   └── base.py
│       │   │   └── reflection.py
│       │   ├── engines/
│       │   │   ├── __init__.py
│       │   │   └── base.py
│       │   ├── ext/
│       │   │   ├── clauses.py
│       │   │   └── declarative.py
│       │   ├── orm/
│       │   │   ├── query.py
│       │   │   └── session.py
│       │   ├── sql/
│       │   │   ├── ddl.py
│       │   │   ├── functions.py
│       │   │   ├── schema.py
│       │   │   └── selectable.py
│       │   └── types.py
│       ├── databases/
│       │   ├── __init__.py
│       │   ├── access.py
│       │   ├── athena.py
│       │   ├── bigquery.py
│       │   ├── cockroachdb.py
│       │   ├── cratedb.py
│       │   ├── databricks.py
│       │   ├── db2.py
│       │   ├── druid.py
│       │   ├── elasticsearch.py
│       │   ├── firebird.py
│       │   ├── firebolt.py
│       │   ├── gsheets.py
│       │   ├── hana.py
│       │   ├── hive.py
│       │   ├── mongodb.py
│       │   ├── oceanbase.py
│       │   ├── presto.py
│       │   ├── redshift.py
│       │   ├── snowflake.py
│       │   ├── teradata.py
│       │   └── tidb.py
│       └── dameng/
│           ├── __init__.py
│           ├── _compat.py
│           ├── base.py
│           ├── dmPython.py
│           ├── dmAsync.py
│           ├── globalvars.py
│           ├── json.py
│           ├── types.py
│           └── vector.py
├── tests/
│   ├── test_clickhouse/
│   │   ├── test_clickhouse_dialect.py
│   │   ├── test_ddl.py
│   │   ├── test_engines.py
│   │   ├── test_orm.py
│   │   ├── test_sql.py
│   │   ├── test_type_compiler.py
│   │   └── test_types.py
│   ├── test_databases/
│   │   ├── __init__.py
│   │   └── test_dialects.py
│   └── test_dameng/
│       ├── test_basic.py
│       ├── test_dialect.py
│       └── test_vector.py
├── pyproject.toml
├── README.md
└── LICENSE
```

### Code Style

This project uses:
- **Ruff**: Linting and formatting
- **MyPy**: Type checking

Run code quality checks:

```bash
ruff check src/ tests/
ruff format --check src/ tests/
ruff check src/ns_sqlalchemy/clickhouse/  # ClickHouse should be clean
```

### Testing

The project uses pytest for testing. All tests are located in the `tests/` directory.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please follow the existing code style and conventions.

## Support

For issues and questions, please create an issue in the GitHub repository.

## Acknowledgments

- Based on the original ns-dm-sqlalchemy project
- Inspired by SQLAlchemy's dialect architecture
- Thanks to the open-source community for their contributions

---

*Created with ❤️ for high-performance database applications*
