Metadata-Version: 2.4
Name: meti_profil
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pandas ; extra == 'dev'
Requires-Dist: polars ; extra == 'dev'
Requires-Dist: pyarrow ; extra == 'dev'
Requires-Dist: maturin ; extra == 'dev'
Provides-Extra: dev
Summary: Modern, fast data profiling in Rust with Python bindings
Keywords: data-profiling,data-quality,eda,arrow,rust
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/Metimer/meti_profil
Project-URL: Issues, https://github.com/Metimer/meti_profil/issues
Project-URL: Repository, https://github.com/Metimer/meti_profil

# meti_profil

A modern, Rust-powered data profiling library with Python bindings. It reads
CSV, Parquet, and Excel files (or pandas / polars DataFrames) and generates two
reports: a Markdown report that is readable by humans and structured for
consumption by code agents, and a self-contained interactive HTML report.

## Installation

```bash
pip install meti_profil
```

## Quick start

```python
import meti_profil as mp

# From a file
report = mp.ProfileReport("data.csv", title="My dataset")

# Interactive HTML report (self-contained, works offline)
report.to_html("profile.html")

# Markdown report (great for diffs and code agents)
report.to_file("profile.md")

# From a pandas DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
report = mp.ProfileReport(df)

# Programmatic access
print(report.get_summary())          # dataset-level metrics
print(report.get_column_info("age")) # per-column schema info
markdown = report.to_markdown()
html = report.to_html()              # returns the HTML as a string
```

### In a notebook

In Jupyter or VS Code, make the report the last expression in a cell and it
renders inline as an interactive dashboard (sandboxed, no external resources):

```python
report = mp.ProfileReport(df)
report  # interactive histograms, bar charts, correlation heatmap, ...
```
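Inline rendering of this kind relies on the `_repr_html_` hook that IPython-based frontends look for on displayed objects. The sketch below illustrates that protocol only; `InlineReport` is a hypothetical stand-in, not the class shipped by `meti_profil`, and the markup it returns is invented for the example.

```python
import html


class InlineReport:
    """Hypothetical stand-in for an object that renders itself in a notebook."""

    def __init__(self, title: str, n_rows: int) -> None:
        self.title = title
        self.n_rows = n_rows

    def _repr_html_(self) -> str:
        # IPython-based frontends call this method, when it exists, to obtain
        # the rich HTML representation of the value displayed by a cell.
        safe_title = html.escape(self.title)
        return f"<section><h3>{safe_title}</h3><p>{self.n_rows} rows</p></section>"


demo = InlineReport("My <dataset>", 3)
rendered = demo._repr_html_()
print(rendered)
```

Outside a notebook the method is an ordinary function, so it can be called directly to inspect the markup.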

### `ProfileReport` parameters

| Parameter     | Type                                   | Default             | Description                                  |
|---------------|----------------------------------------|---------------------|----------------------------------------------|
| `source`      | `str`, `Path`, pandas/polars DataFrame | required            | Data source.                                 |
| `title`       | `str`                                  | `"Dataset Profile"` | Report title (written to the frontmatter).   |
| `minimal`     | `bool`                                 | `False`             | Reserved: reduce heavy analyses.             |
| `explorative` | `bool`                                 | `True`              | Reserved: enable advanced analyses.          |

## Report format

The Markdown report starts with a YAML frontmatter block (rows, columns,
missing cells, duplicates, version), followed by consistently named `## `
sections: `Overview`, `Schema`, `Numeric Columns`, `Categorical Columns`,
`Missing Values`, `Duplicate Rows`, and `Correlations`.
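
Because the layout is regular, a consumer can split a report into frontmatter and sections with a few lines of code. The following is a minimal sketch, not part of the library: `split_report` is a hypothetical helper, the sample text and its frontmatter keys are made up for illustration, and the naive `key: value` parsing stands in for a real YAML parser.

```python
# Illustrative sample only: the real frontmatter keys and section bodies
# produced by meti_profil may differ.
SAMPLE_REPORT = """\
---
title: My dataset
rows: 3
columns: 2
version: 0.1.0
---

## Overview

Three rows, two columns.

## Schema

| column | type |
|--------|------|
| age    | int  |
"""


def split_report(markdown: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (frontmatter, sections) for a frontmatter-prefixed Markdown report."""
    frontmatter: dict[str, str] = {}
    body = markdown
    if markdown.startswith("---\n"):
        # The frontmatter ends at the next line that consists of '---' only.
        header, closing, rest = markdown[4:].partition("\n---\n")
        if closing:
            body = rest
            for line in header.splitlines():
                key, sep, value = line.partition(":")
                if sep:
                    # Values are kept as strings; no YAML typing is attempted.
                    frontmatter[key.strip()] = value.strip()

    sections: dict[str, str] = {}
    current: str | None = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return frontmatter, {name: text.strip() for name, text in sections.items()}


meta, sections = split_report(SAMPLE_REPORT)
print(meta["rows"], list(sections))
```

For anything beyond flat scalar values, a proper YAML parser is the safer choice for the frontmatter.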

## Features

- Fast Rust engine backed by Apache Arrow.
- Reads CSV, Parquet (snappy/zstd/lz4/brotli/gzip), and Excel files.
- Accepts pandas and polars DataFrames.
- Schema/type detection, descriptive numeric statistics, categorical
  frequencies, missing-value and duplicate-row analysis, and Pearson
  correlations.
- **Interactive HTML report**: a single self-contained file (embedded CSS/JS,
  no CDN) with histograms, categorical bar charts, a missing-value overview, and
  a correlation heatmap, all with hover tooltips.
- **Native notebook rendering** in Jupyter / VS Code via `_repr_html_`.
- Clean Markdown reports optimized for both humans and code agents.
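
For readers unfamiliar with the statistic behind the correlation feature, here is a plain-Python sketch of the Pearson coefficient. It is not the library's Rust implementation. The `pearson` helper is hypothetical, and dropping pairs that contain a missing value is an assumption made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    # Assumption for this sketch: pairs with a missing value are skipped.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation.
        return None
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

The coefficient ranges from -1 to 1 and captures linear association only.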

## Output formats

| Method                  | Output                                                |
|-------------------------|-------------------------------------------------------|
| `to_html(path)`         | Write a self-contained interactive HTML file.         |
| `to_html()`             | Return the HTML document as a string.                 |
| `to_file(path)`         | Write the Markdown report.                            |
| `to_markdown()`         | Return the Markdown report as a string.               |
| `get_summary()`         | Dataset-level metrics as a dict.                      |
| `get_column_info(name)` | Per-column schema info as a dict.                     |
| display in a notebook   | Inline interactive dashboard (`_repr_html_`).         |

## Development

Requires a [Rust toolchain](https://rustup.rs) (1.78+) and Python 3.10+.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install maturin pytest pandas polars pyarrow

# Build the extension in-place
maturin develop

# Run the test suites
cargo test --workspace
pytest tests/python -v
```

## License

MIT

