Metadata-Version: 2.4
Name: pyde-toolkit
Version: 1.0.0
Summary: Infer column names, data types, schema, and CREATE TABLE/VIEW DDL from a file or a pandas DataFrame — Pandas/ANSI SQL or PySpark/Spark SQL.
Author: Your Name
License: MIT
Project-URL: Homepage, https://github.com/your-org/pyde-toolkit
Project-URL: Issues, https://github.com/your-org/pyde-toolkit/issues
Keywords: pandas,pyspark,schema,ddl,data-engineering,delta-lake,databricks
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3
Requires-Dist: numpy>=1.21
Provides-Extra: excel
Requires-Dist: openpyxl>=3.0; extra == "excel"
Requires-Dist: xlrd>=2.0; extra == "excel"
Requires-Dist: odfpy>=1.4; extra == "excel"
Provides-Extra: memcheck
Requires-Dist: psutil>=5.9; extra == "memcheck"
Provides-Extra: all
Requires-Dist: openpyxl>=3.0; extra == "all"
Requires-Dist: xlrd>=2.0; extra == "all"
Requires-Dist: odfpy>=1.4; extra == "all"
Requires-Dist: psutil>=5.9; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Dynamic: license-file

# pyde_toolkit

Infer column names, data types, schema definitions, and `CREATE TABLE` / `CREATE VIEW` DDL from a CSV/TSV/Excel file — or directly from a **pandas DataFrame already in memory** (including a Spark DataFrame converted via `.toPandas()`). Outputs either Pandas/ANSI SQL or PySpark/Spark SQL, with optional Databricks medallion-layer (`bronze`/`silver`/`gold`) support.

## Installation

```bash
pip install pyde_toolkit
```

Reading Excel files requires the optional `excel` extra, which pulls in `openpyxl`, `xlrd`, and `odfpy`:

```bash
pip install "pyde_toolkit[excel]"
```

> Not yet on PyPI? See [Building & Publishing](#building--publishing) below to build and install it locally first.

## Quick Start — pass a DataFrame directly

This is the primary use case: no file I/O, just pass `infer_file` a DataFrame.

```python
import pandas as pd
from pyde_toolkit import infer_file

df = pd.DataFrame({
    "Plant Description": ["Mumbai Plant", "Pune Plant"],
    "ZODI/ZLDI":          ["ZODI", "ZLDI"],
    "Cost %":              [12.5, 8.0],
})

result = infer_file(df, pyspark=True, casing="snake", table_name="plant_master")

print(result["schema"])        # PySpark StructType, ready to paste
print(result["create_table"])  # CREATE TABLE ... USING DELTA
print(result["rename_code"])   # df.withColumnRenamed(...) snippet
```
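
To give a feel for what column-name normalisation and type inference involve, here is a minimal sketch using only pandas and the standard library. The helpers `to_snake` and `spark_type` are hypothetical and are not part of `pyde_toolkit`; the library's actual casing and type rules are described in `docs/USAGE.md` and may differ.

```python
import re

import pandas as pd
from pandas.api import types as ptypes


def to_snake(name: str) -> str:
    """Hypothetical helper: collapse non-alphanumeric runs to '_' and lowercase."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


def spark_type(dtype) -> str:
    """Hypothetical helper: map a pandas dtype to a Spark SQL type name."""
    if ptypes.is_bool_dtype(dtype):
        return "BOOLEAN"
    if ptypes.is_integer_dtype(dtype):
        return "BIGINT"
    if ptypes.is_float_dtype(dtype):
        return "DOUBLE"
    if ptypes.is_datetime64_any_dtype(dtype):
        return "TIMESTAMP"
    return "STRING"  # fallback for object/string columns


df = pd.DataFrame({
    "Plant Description": ["Mumbai Plant", "Pune Plant"],
    "ZODI/ZLDI": ["ZODI", "ZLDI"],
    "Cost %": [12.5, 8.0],
})

# Map each normalised column name to an assumed Spark SQL type.
columns = {to_snake(col): spark_type(dtype) for col, dtype in df.dtypes.items()}
print(columns)
```

Under these assumed rules, `"Cost %"` becomes `cost` with type `DOUBLE`, and the two text columns map to `STRING`.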

Works the same way from a Spark DataFrame in a Databricks notebook:

```python
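from pyde_toolkit import infer_file

# spark_df is an existing Spark DataFrame; .toPandas() collects all rows to the driver.
result = infer_file(spark_df.toPandas(), pyspark=True, casing="snake",
                    table_name="sales_fact", layer="silver", catalog="prod")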
```

## Quick Start — pass a file path

```python
from pyde_toolkit import infer_file

result = infer_file("Sales1.csv", casing="pascal")   # Pandas + ANSI SQL by default
```

## Command line

The same engine is also available as a CLI, installed as `pyde_toolkit`:

```bash
pyde_toolkit Sales1.csv
pyde_toolkit Sales1.csv --pyspark true --case pascal
pyde_toolkit Sales1.csv --pyspark true --layer all --catalog prod
pyde_toolkit --help
```

## Full documentation

See [`docs/USAGE.md`](docs/USAGE.md) for the complete reference: every flag/parameter, casing rules, type-inference behaviour, sampling, medallion layers, table types, and the full `infer_file()` return value.

## Building & Publishing

This repo is set up as a standard `pyproject.toml` package, so it can be built and installed without needing PyPI:

```bash
# Install locally, editable (changes to source take effect immediately)
pip install -e .

# Or build a wheel/sdist you can distribute internally
pip install build
python -m build              # creates dist/*.whl and dist/*.tar.gz
pip install dist/pyde_toolkit-1.0.0-py3-none-any.whl
```
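
After either install route, one way to confirm that the package is visible to the current interpreter is to query its installed metadata. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string once the package is installed, otherwise None.
print(installed_version("pyde-toolkit"))
```

For the wheel built above, the reported version should match the `1.0.0` in the wheel's filename.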

To publish to PyPI so that a plain `pip install pyde_toolkit` works, you need a PyPI account and an API token. Then upload the built artifacts:

```bash
pip install twine
twine upload dist/*
```

Before publishing, check that the name isn't already taken on PyPI; if it is, rename the project in `pyproject.toml`. PyPI treats `pyde_toolkit` and `pyde-toolkit` as the same name.
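
The name check can also be scripted against PyPI's JSON API, which returns HTTP 404 for a project that does not exist. The sketch below is only an illustration: the helpers `normalize`, `pypi_json_url`, and `name_taken` are hypothetical, `name_taken` needs network access, and a 404 does not guarantee that PyPI will accept the name.

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """Normalise a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_json_url(name: str) -> str:
    """Build the PyPI JSON API URL for a project name."""
    return f"https://pypi.org/pypi/{normalize(name)}/json"


def name_taken(name: str) -> bool:
    """Return True if PyPI already hosts a project under this name (needs network)."""
    try:
        with urllib.request.urlopen(pypi_json_url(name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # no project registered under this name
        raise  # any other HTTP error is not a clear answer


print(pypi_json_url("pyde_toolkit"))
```

Calling `name_taken("pyde_toolkit")` from a machine with internet access then answers the question directly.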

## License

MIT — see [`LICENSE`](LICENSE).
