Metadata-Version: 2.4
Name: datasentinel-saxon
Version: 0.1.0
Summary: Zero-config data quality monitoring and drift detection for pandas DataFrames, with optional Claude AI diagnosis and hosted dashboard sync.
Author: G. Preetham Saxon
License: MIT
Project-URL: Homepage, https://datasentinel-eight.vercel.app
Project-URL: Repository, https://github.com/GPREETHAMSAXON/DataSentinel
Keywords: data-quality,data-monitoring,drift-detection,anomaly-detection,data-observability
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.5.0
Requires-Dist: numpy>=1.23.0
Requires-Dist: requests>=2.28.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"

# DataSentinel

Zero-config data quality monitoring for pandas DataFrames. Catches drift, anomalies, and silent data breakage — locally, in seconds, no setup required.

```bash
pip install datasentinel-saxon
```

## Quick start (fully local — no account needed)

```python
from datasentinel import DataSentinel
import pandas as pd

df = pd.read_csv("orders.csv")

ds = DataSentinel()
report = ds.check(df)
print(report)
```

```
DataSentinel Report
  500 rows x 11 columns
  Overall: NONE

  First run — baseline established. Run check() again later to detect drift.
```

Run it again tomorrow on a new export of the same data, and DataSentinel compares it against the cached baseline automatically:

```python
df_tomorrow = pd.read_csv("orders_tomorrow.csv")
report = ds.check(df_tomorrow)
print(report)
```

```
DataSentinel Report
  512 rows x 11 columns
  Overall: HIGH

  Flagged columns:
    [HIGH] discount_pct
      - Distribution shifted (PSI=0.342)
    [MEDIUM] country
      - Distinct value count changed from 7 to 11
```
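For readers unfamiliar with PSI, the sketch below shows one way a PSI value could be mapped to the severity labels seen in the report. The cut-offs (0.1 and 0.25) are a widely used rule of thumb, not DataSentinel's documented thresholds, and `psi_severity` is a hypothetical helper, not part of the library.

```python
def psi_severity(psi: float) -> str:
    """Map a PSI value to a severity label.

    Assumption: thresholds follow the common rule of thumb
    (< 0.1 stable, 0.1 to 0.25 moderate shift, >= 0.25 large shift).
    DataSentinel's real thresholds may differ.
    """
    if psi < 0.1:
        return "NONE"
    if psi < 0.25:
        return "MEDIUM"
    return "HIGH"
```

Under these assumed thresholds, the `PSI=0.342` reported for `discount_pct` above would fall in the highest band, consistent with its `[HIGH]` flag.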

## With a hosted account (history, scheduling, Slack alerts, AI diagnosis)

```python
ds = DataSentinel(api_key="ds_...")
report = ds.check(df, pipeline_name="Orders")  # profiles locally AND syncs to your dashboard
```

When synced, `report.diagnosis` contains a plain-English root-cause explanation generated by Claude, and `report.pipeline_url` links straight to the dashboard.

Get an API key by creating a free account at [datasentinel-eight.vercel.app](https://datasentinel-eight.vercel.app).

## Connecting a live database (hosted only)

```python
ds = DataSentinel(api_key="ds_...")
pipeline = ds.connect_postgres(
    dsn="postgresql://user:pass@host:5432/db",
    table="orders",
    pipeline_name="Orders Pipeline",
)
result = ds.run_pipeline(pipeline["id"])
```

This registers a scheduled pipeline identical to one created from the dashboard — it'll run automatically on its configured interval and alert you via Slack when something breaks.

## What it checks

- **Null rate drift** — sudden spikes or drops in missing data
- **Distribution drift (PSI)** — numeric and categorical distributions shifting over time
- **Cardinality drift** — new or disappearing categories
- **Volume drift** — unexpected row count changes
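
The four checks above can be approximated in a few lines of pandas and NumPy. This is a minimal sketch of the general techniques, not DataSentinel's internal implementation: the function names, the quantile binning, and the `eps` smoothing value are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def psi(baseline: pd.Series, current: pd.Series, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two numeric samples (hypothetical helper)."""
    base = baseline.dropna().to_numpy(dtype=float)
    cur = current.dropna().to_numpy(dtype=float)
    # Bin edges come from baseline quantiles, so each baseline bin holds roughly equal mass.
    edges = np.unique(np.quantile(base, np.linspace(0.0, 1.0, bins + 1)))
    # Only the inner edges are used: values beyond the baseline range fall into the outer bins.
    inner = edges[1:-1]
    n_bins = len(inner) + 1
    base_pct = np.bincount(np.digitize(base, inner), minlength=n_bins) / len(base)
    cur_pct = np.bincount(np.digitize(cur, inner), minlength=n_bins) / len(cur)
    # Clip empty bins to a small value so the logarithm stays finite.
    base_pct = np.clip(base_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - base_pct) * np.log(cur_pct / base_pct)))


def null_rate_change(baseline: pd.Series, current: pd.Series) -> float:
    """Difference in the fraction of missing values (current minus baseline)."""
    return float(current.isna().mean() - baseline.isna().mean())


def category_changes(baseline: pd.Series, current: pd.Series) -> tuple:
    """Return (new categories, disappeared categories) as two sets."""
    base_vals = set(baseline.dropna().unique())
    cur_vals = set(current.dropna().unique())
    return cur_vals - base_vals, base_vals - cur_vals


def volume_change(baseline_rows: int, current_rows: int) -> float:
    """Relative change in row count, e.g. 0.024 for a 2.4% increase."""
    return (current_rows - baseline_rows) / baseline_rows
```

A PSI of 0 means the two binned distributions are identical; larger values indicate a bigger shift. Basing the bin edges on the baseline keeps the comparison stable from run to run, at the cost of coarser resolution where the baseline has few distinct values.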

## Why DataSentinel

Most data quality tools are either too simple (just schema checks) or too heavy (enterprise platforms requiring a deployment team). DataSentinel sits in between: zero config to start, drift checks built on standard statistics such as PSI, and, when synced, a plain-English explanation of *why* something broke instead of just a flag that it did.

## License

MIT
