Metadata-Version: 2.4
Name: dcs-sdk
Version: 1.8.5
Summary: SDK for DataChecks
Author: Waterdip Labs
Author-email: hello@waterdip.ai
Requires-Python: >=3.10,<3.13
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: all-dbs
Provides-Extra: bigquery
Provides-Extra: clickhouse
Provides-Extra: databricks
Provides-Extra: db2
Provides-Extra: elasticsearch
Provides-Extra: impyla
Provides-Extra: mssql
Provides-Extra: mysql
Provides-Extra: opensearch
Provides-Extra: oracle
Provides-Extra: postgresql
Provides-Extra: preql
Provides-Extra: presto
Provides-Extra: redshift
Provides-Extra: snowflake
Provides-Extra: spark
Provides-Extra: sybase
Provides-Extra: trino
Provides-Extra: vertica
Requires-Dist: attrs (>=23.1.0)
Requires-Dist: azure-core (>=1.38.0,<2.0.0)
Requires-Dist: azure-identity (>=1.25.1,<2.0.0)
Requires-Dist: azure-storage-blob (>=12.27.1,<13.0.0)
Requires-Dist: click (>=8.1)
Requires-Dist: clickhouse-driver (>=0.2.9) ; extra == "clickhouse" or extra == "all-dbs"
Requires-Dist: cryptography (>=46.0.5) ; extra == "snowflake" or extra == "all-dbs"
Requires-Dist: databricks-sql-connector (>=3.3.0,<4.0.0) ; extra == "databricks" or extra == "all-dbs"
Requires-Dist: dsnparse (<0.2.0)
Requires-Dist: duckdb (>=0.9.0)
Requires-Dist: elasticsearch (>=9.1.0,<10.0.0) ; extra == "elasticsearch" or extra == "all-dbs"
Requires-Dist: filelock (>=3.20.3,<4.0.0)
Requires-Dist: google-cloud-bigquery (>=3.31.0,<4.0.0) ; extra == "bigquery" or extra == "all-dbs"
Requires-Dist: h11 (>=0.16.0,<0.17.0)
Requires-Dist: ibm-db (>=3.2.3,<4.0.0) ; extra == "db2" or extra == "all-dbs"
Requires-Dist: ibm-db-sa (>=0.4.1,<0.5.0) ; extra == "db2" or extra == "all-dbs"
Requires-Dist: impyla (>=0.20.0,<0.21.0) ; extra == "impyla" or extra == "all-dbs"
Requires-Dist: jinja2 (>=3.1.6,<4.0.0)
Requires-Dist: keyring (>=25.3.0)
Requires-Dist: loguru (==0.7.2)
Requires-Dist: mashumaro[msgpack] (>=2.9,<3.11.0)
Requires-Dist: mysql-connector-python (>=9.0.1) ; extra == "mysql" or extra == "all-dbs"
Requires-Dist: nltk (>=3.9.3,<4.0.0)
Requires-Dist: numpy (==1.26.4)
Requires-Dist: opensearch-py (>=2.2.0,<3.0.0) ; extra == "opensearch" or extra == "all-dbs"
Requires-Dist: oracledb (>=2.4.1) ; extra == "oracle" or extra == "all-dbs"
Requires-Dist: orjson (>=3.11.7,<4.0.0)
Requires-Dist: packaging (>=24.1,<25.0)
Requires-Dist: preql (>=0.2.19) ; extra == "preql" or extra == "all-dbs"
Requires-Dist: presto-python-client (>=0.8.4) ; extra == "presto" or extra == "all-dbs"
Requires-Dist: protobuf (>=5.29.6,<6.0.0)
Requires-Dist: psycopg2-binary (>=2.9.9,<3.0.0) ; extra == "postgresql" or extra == "redshift" or extra == "all-dbs"
Requires-Dist: pyasn1 (>=0.6.2,<0.7.0)
Requires-Dist: pydantic (>=1.10.12)
Requires-Dist: pymysql[rsa] (>=1.1.0,<2.0.0) ; extra == "mysql" or extra == "all-dbs"
Requires-Dist: pyodbc (>=4.0.39) ; extra == "mssql" or extra == "sybase" or extra == "all-dbs"
Requires-Dist: pyparsing (>=3.1.1,<4.0.0)
Requires-Dist: pyspark (>=3.2.1,<4.0.0) ; extra == "spark" or extra == "all-dbs"
Requires-Dist: python-dateutil (>=2.8.2,<3.0.0)
Requires-Dist: python-dotenv (>=1.0.1,<2.0.0)
Requires-Dist: pytz (>=2024.1)
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: redis[hiredis] (>=5.2.1,<6.0.0)
Requires-Dist: requests (>=2.32.4,<3.0.0)
Requires-Dist: rich (>=13.8.0)
Requires-Dist: setuptools (>=78.1.1)
Requires-Dist: snowflake-connector-python (>=3.17.2) ; extra == "snowflake" or extra == "all-dbs"
Requires-Dist: snowflake-sqlalchemy (>=1.5.3,<2.0.0) ; extra == "snowflake" or extra == "all-dbs"
Requires-Dist: sqlalchemy (>=2.0.14,<2.1.0)
Requires-Dist: sqlalchemy-bigquery (>=1.8.0,<2.0.0) ; extra == "bigquery" or extra == "all-dbs"
Requires-Dist: sqlalchemy-sybase (>=2.0.0,<3.0.0) ; extra == "sybase" or extra == "all-dbs"
Requires-Dist: sqlglot (>=28.10.1,<29.0.0)
Requires-Dist: tabulate (>=0.9.0)
Requires-Dist: toml (>=0.10.2)
Requires-Dist: tornado (>=6.5,<7.0)
Requires-Dist: trino (>=0.314.0) ; extra == "trino" or extra == "all-dbs"
Requires-Dist: typing-extensions (>=4.0.1)
Requires-Dist: urllib3 (>=2.6.3,<3.0.0)
Requires-Dist: vertica-python (>=1.4.0) ; extra == "vertica" or extra == "all-dbs"
Requires-Dist: virtualenv (>=20.36.2,<21.0.0)
Description-Content-Type: text/markdown

<h1 align="center">
  DCS SDK v1.8.5
</h1>

> SDK for DataChecks

## Installation

> Python version `>=3.10,<3.13`

```bash
$ pip install "dcs-sdk[all-dbs]"
```

## Supported Databases

> Availability Status

| Database          | Code Name    | Supported |
| ----------------- | ------------ | --------- |
| PostgreSQL        | `postgres`   | ✅        |
| Snowflake         | `snowflake`  | ✅        |
| Trino             | `trino`      | ✅        |
| Databricks        | `databricks` | ✅        |
| Oracle            | `oracle`     | ✅        |
| MSSQL             | `mssql`      | ✅        |
| MySQL             | `mysql`      | ✅        |
| SAP Sybase IQ/ASE | `sybase`     | ✅        |
| File              | `file`       | ✅        |
| BigQuery          | `bigquery`   | ✅        |

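As a sketch of how one of the listed databases might be configured (the key names `host`, `port`, `username`, `password`, `database`, and `schema` are assumptions modeled on the Databricks example later in this document, not confirmed dcs-sdk configuration keys):

```yaml
# Hypothetical PostgreSQL data source: connection field names are
# assumptions by analogy with the Databricks example below.
data_sources:
  - name: source_postgres
    type: postgres
    connection:
      host: localhost
      port: 5432
      username: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
      database: demo_db
      schema: public
```

Check the project's own configuration reference for the exact keys before using this in anger.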
## Available Commands

|    Option     | Short Option | Required |     Default     |                     Description                      |                                                 Example                                                  |
| :-----------: | :----------: | :------: | :-------------: | :--------------------------------------------------: | :------------------------------------------------------------------------------------------------------: |
| --config-path |      -C      | **Yes**  |      None       |     Specify the file path for the configuration      |                        dcs-sdk run --config-path config.yaml --compare comp_name                         |
|   --compare   |              | **Yes**  |      None       |      Run only a specific comparison by its name      |                        dcs-sdk run --config-path config.yaml --compare comp_name                         |
|  --save-json  |      -j      |    No    |      False      |            Save the data into a JSON file            |                  dcs-sdk run --config-path config.yaml --compare comp_name --save-json                   |
|  --json-path  |     -jp      |    No    | dcs_report.json |       Specify the file path for the JSON file        |       dcs-sdk run --config-path config.yaml --compare comp_name --save-json --json-path output.json      |
|    --stats    |              |    No    |      False      |        Print statistics about the data diff          |                    dcs-sdk run --config-path config.yaml --compare comp_name --stats                     |
|     --url     |              |    No    |      None       |     Specify the URL to send data to the server       |        dcs-sdk run --config-path config.yaml --compare comp_name --url=https://compare/send/data         |
| --html-report |              |    No    |      False      |           Save the comparison table as HTML          |                 dcs-sdk run --config-path config.yaml --compare comp_name --html-report                  |
| --report-path |              |    No    | dcs_report.html |       Specify the file path for the HTML report      |     dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html     |
|    --table    |              |    No    |      False      |        Display the comparison in table format        | dcs-sdk run --config-path config.yaml --compare comp_name --html-report --report-path table.html --table |

### Example Command [CLI]

```sh
$ dcs-sdk --version

$ dcs-sdk --help

$ dcs-sdk run -C example.yaml --compare comparison_one --stats -j -jp output.json --html-report --report-path result.html --table --url=https://compare/send/data
```

## File Comparisons

`dcs-sdk` supports file-backed comparisons through DuckDB for:

- `.csv`
- `.parquet`
- mixed-format comparisons such as `csv ↔ parquet`

Supported file datasource types:

- `file`
- `azure_blob`

Notes:

- File paths must point to concrete `.csv` or `.parquet` files, or to globs matching such files.
- Query-backed file comparisons are supported. When `source.query` or `target.query` is provided, the SDK loads the file into DuckDB and compares against the filtered/projected query view.

### Local File Example

```yaml
data_sources:
  - name: source_file
    type: file
    file_path: sample_data/parquet/one_source.parquet

  - name: target_file
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  parquet_file_diff:
    source:
      data_source: source_file
      table: one_source
    target:
      data_source: target_file
      table: two_target
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```

Run it with:

```bash
dcs-sdk run -C parquet_file_comparison.yaml --compare parquet_file_diff --stats
```
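Building on the notes above, a query-backed, mixed-format comparison (`csv ↔ parquet`) might look like the sketch below. The `query` key follows the pattern of the Databricks example in the next section; the file names, column names, and the assumption that the query can reference the loaded file by its table name are all illustrative, not confirmed behavior:

```yaml
data_sources:
  - name: source_csv
    type: file
    file_path: sample_data/csv/one_source.csv

  - name: target_parquet
    type: file
    file_path: sample_data/parquet/two_target.parquet

comparisons:
  csv_vs_parquet_filtered:
    source:
      data_source: source_csv
      table: one_source
    target:
      data_source: target_parquet
      # Per the notes above, the SDK loads the file into DuckDB and
      # compares against this filtered/projected query view. Referencing
      # the file by the table name "two_target" is an assumption here.
      query: |
        SELECT id, customer_name, status, amount, region
        FROM two_target
        WHERE status IS NOT NULL
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```

Verify the exact query-view semantics against the SDK's documentation before relying on this shape.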

## Databricks Query-Backed Comparisons

Databricks comparisons can use either:

- a table name
- a SQL query

For Parquet files stored on Databricks, use a query with `read_files(...)`.

### Databricks Table vs Parquet Example

```yaml
data_sources:
  - name: databricks_demo
    type: databricks
    connection:
      host: your-workspace.cloud.databricks.com
      port: 443
      http_path: /sql/1.0/warehouses/your-warehouse
      access_token: ${DATABRICKS_TOKEN}
      catalog: dcs_demo_databricks
      schema: source
    temporary_schema: temp_schema

comparisons:
  databricks_table_vs_parquet:
    source:
      data_source: databricks_demo
      table: source_table
    target:
      data_source: databricks_demo
      query: |
        SELECT *
        FROM read_files(
          '/Volumes/dcs_demo_databricks/source/dcs-test-volumne/two_target.parquet',
          format => 'parquet'
        )
      view_name: datachecks_target_file
      materialization_type: table
    key_columns: [id]
    columns: [customer_name, status, amount, region]
```

Notes:

- Query-backed Databricks comparisons require `temporary_schema`.
- Generated temp views/tables use the `datachecks_` prefix.
- Prefer Unity Catalog volume paths such as `/Volumes/...` for Databricks file queries.
- Legacy DBFS root paths such as `dbfs:/raw/...` are not recommended for this flow.

