Metadata-Version: 2.4
Name: datannurpy
Version: 0.26.5
Summary: Scan files and databases into ready-to-use datannur catalogs
Project-URL: Homepage, https://github.com/datannur/datannurpy
Project-URL: Repository, https://github.com/datannur/datannurpy
Project-URL: Documentation, https://github.com/datannur/datannurpy#readme
Project-URL: Issues, https://github.com/datannur/datannurpy/issues
Author: datannur
License: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: !=3.9.0,!=3.9.1,>=3.9
Requires-Dist: fsspec>=2024.0
Requires-Dist: ibis-framework[duckdb]>=11.0
Requires-Dist: jsonjsdb>=0.8.11
Requires-Dist: openpyxl>=3.0
Requires-Dist: polars>=1.0
Requires-Dist: pyshacl>=0.30
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rdflib>=7.0
Requires-Dist: typing-extensions>=4.0; python_version < '3.11'
Requires-Dist: xlrd>=2.0
Provides-Extra: azure
Requires-Dist: adlfs>=2024.0; (python_version >= '3.10') and extra == 'azure'
Provides-Extra: cloud
Requires-Dist: adlfs>=2024.0; (python_version >= '3.10') and extra == 'cloud'
Requires-Dist: gcsfs>=2024.0; (python_version >= '3.10') and extra == 'cloud'
Requires-Dist: s3fs>=2024.0; (python_version >= '3.10') and extra == 'cloud'
Provides-Extra: databases
Requires-Dist: ibis-framework[mssql,mysql,oracle,postgres]>=11.0; extra == 'databases'
Provides-Extra: delta
Requires-Dist: deltalake>=0.18.0; extra == 'delta'
Provides-Extra: gcs
Requires-Dist: gcsfs>=2024.0; (python_version >= '3.10') and extra == 'gcs'
Provides-Extra: iceberg
Requires-Dist: pyiceberg>=0.10.0; (python_version >= '3.10') and extra == 'iceberg'
Requires-Dist: requests>=2.33.0; (python_version >= '3.10') and extra == 'iceberg'
Provides-Extra: mssql
Requires-Dist: ibis-framework[mssql]>=11.0; extra == 'mssql'
Provides-Extra: mysql
Requires-Dist: ibis-framework[mysql]>=11.0; extra == 'mysql'
Provides-Extra: oracle
Requires-Dist: ibis-framework[oracle]>=11.0; extra == 'oracle'
Provides-Extra: postgres
Requires-Dist: ibis-framework[postgres]>=11.0; extra == 'postgres'
Provides-Extra: s3
Requires-Dist: s3fs>=2024.0; (python_version >= '3.10') and extra == 's3'
Provides-Extra: ssh
Requires-Dist: paramiko>=5.0; extra == 'ssh'
Provides-Extra: stat
Requires-Dist: pyreadstat<=1.2.8,>=1.2.0; (python_version < '3.10') and extra == 'stat'
Requires-Dist: pyreadstat>=1.2.0; (python_version >= '3.10') and extra == 'stat'
Description-Content-Type: text/markdown

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/datannur/datannur/main/package/app/assets/main-banner-dark.png">
  <img alt="datannur logo" src="https://raw.githubusercontent.com/datannur/datannur/main/package/app/assets/main-banner.png">
</picture>

[![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://img.shields.io/pypi/v/datannurpy.svg)](https://pypi.org/project/datannurpy/)
[![Python](https://img.shields.io/badge/python-≥3.9-blue.svg)](https://pypi.org/project/datannurpy/)
[![CI](https://github.com/datannur/datannurpy/actions/workflows/ci.yml/badge.svg)](https://github.com/datannur/datannurpy/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/datannur/datannurpy/branch/main/graph/badge.svg)](https://codecov.io/gh/datannur/datannurpy)

# datannurpy

datannurpy is the Python builder for [datannur](https://github.com/datannur/datannur). It scans files and databases, extracts metadata and statistics, then generates a ready-to-use catalog bundled with the datannur app.

**Key features:**

- **Broad format support** - CSV, Excel, Parquet, Delta Lake, Iceberg, SAS, SPSS, Stata
- **Database introspection** - PostgreSQL, MySQL, Oracle, SQL Server, SQLite, DuckDB
- **Remote and cloud storage** - SFTP, S3, Azure Blob, GCS via fsspec
- **Metadata extraction** - Schemas, statistics, frequencies, enumerations, auto-tagging
- **Incremental scans** - Only rescan what changed between runs
- **YAML or Python API** - Declarative configuration or programmatic control
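
Several of the formats and backends listed above depend on optional extras declared in the package metadata (for example `postgres`, `s3`, or `stat`). As a minimal sketch, the standard-library `importlib.metadata` module can list the extras an installed copy declares. The helper name `list_extras` is hypothetical and is not part of datannurpy.

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(package: str = "datannurpy") -> list[str]:
    """Return the optional extras declared by an installed package.

    Hypothetical helper for illustration; returns an empty list when the
    package is not installed in the current environment.
    """
    try:
        meta = metadata(package)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per extra in the package metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    for extra in list_extras():
        print(extra)
```

An extra is then installed with the usual pip syntax, for example `pip install "datannurpy[postgres,s3]"`.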

## Quick start

```bash
pip install datannurpy
```

```yaml
# catalog.yml
app_path: ./my-catalog
open_browser: true

add:
  - folder: ./data
    include: ["*.csv", "*.xlsx", "*.parquet"]

  - database: sqlite:///mydb.sqlite
```

```bash
python -m datannurpy catalog.yml
```

This command scans the configured sources, generates the catalog files, and opens the datannur app.
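
If no data is at hand, the sources named in `catalog.yml` can be created with a short script. The sketch below uses only the Python standard library to write a CSV file under `./data` and a SQLite database named `mydb.sqlite`. The function name `make_sample_sources`, the file `people.csv`, the table `orders`, and all values are hypothetical sample content, not something datannurpy provides or requires.

```python
import csv
import sqlite3
from pathlib import Path


def make_sample_sources(root: Path) -> None:
    """Create data/people.csv and mydb.sqlite under `root` (sample content only)."""
    data_dir = root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # A small CSV file that matches the "*.csv" include pattern.
    with (data_dir / "people.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "age"])
        writer.writerows([[1, "Ada", 36], [2, "Linus", 28]])

    # A SQLite database matching the sqlite:///mydb.sqlite entry.
    con = sqlite3.connect(root / "mydb.sqlite")
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
        )
        con.execute("DELETE FROM orders")  # keep reruns idempotent
        con.executemany(
            "INSERT INTO orders (id, amount) VALUES (?, ?)",
            [(1, 9.5), (2, 20.0)],
        )
        con.commit()
    finally:
        con.close()


if __name__ == "__main__":
    make_sample_sources(Path("."))
```

Run the script in the directory that contains `catalog.yml`, then run `python -m datannurpy catalog.yml` as shown above.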

## Documentation

📖 **Full documentation:** [docs.datannur.com/builder](https://docs.datannur.com/builder/)

🗂️ **datannur app:** [github.com/datannur/datannur](https://github.com/datannur/datannur)

🌐 **Website:** [datannur.com](https://datannur.com)

🚀 **Demo:** [dev.datannur.com](https://dev.datannur.com/)

## Contributing

For development documentation and contributing guidelines, see [`CONTRIBUTING.md`](CONTRIBUTING.md).

## License

MIT - see [LICENSE](LICENSE). All dependencies use MIT-, Apache 2.0-, or BSD-compatible licenses.
