Metadata-Version: 2.4
Name: py-devo
Version: 0.2.1
Summary: DEVO — CSV to iCSV enrichment and Frictionless validation
License-Expression: MIT
Project-URL: Source, https://github.com/chasenunez/devo
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: frictionless>=4.0.0
Provides-Extra: webui
Requires-Dist: flask>=2.0.0; extra == "webui"
Dynamic: license-file

# DEVO
**Data Enrichment and Validation Operator.** Takes a plain CSV, infers types and constraints, writes a self-documenting [iCSV](https://envidat.github.io/iCSV/) file plus a Frictionless schema, and validates the data against it.

Given a `.csv`, DEVO enriches it, builds a schema, and validates the result. Given an `.icsv`, it skips the enrichment step.


## Install from PyPI
```bash
pip install py-devo
```

## Install from local cloned repository
```bash
pip install -e .
```

For the Flask web demo:

```bash
pip install -e ".[webui]"
```

Requires Python 3.9+ and `frictionless` (v4 or v5).

## Try it out

A small sample dataset lives at `examples/sample.csv` — three columns (`timestamp`, `PSUM`, `TA`) representing hourly weather observations. Use it to take DEVO for a spin without needing your own data.

### CLI

```bash
# Enrich, build schema, and validate in one command
devo run examples/sample.csv

# Results land in DEVO_output/ by default:
#   sample.icsv               — annotated iCSV
#   sample_schema.json        — Frictionless Table Schema
#   sample_DEVO_report.txt    — human-readable validation report
```

Run `devo run examples/sample.csv --out my_output` to write to a different directory.

### Python

```python
from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv

icsv, schema = ICSVEnricher().make_icsv("examples/sample.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)

print(f"Valid: {valid}")
print(f"Report written to: {report_path}")
```

### Web demo

Install the optional Flask dependency first (if you haven't already):

```bash
pip install -e ".[webui]"
```

Start the local server:

```bash
flask --app devo.webui run
```

Then open `http://127.0.0.1:5000` in your browser. Click **Choose File**, select `examples/sample.csv`, and click **Upload**. The page will display the paths to the generated iCSV, schema, and report, along with the overall `Valid` result.

> The web UI is a local demo only — do not expose it to a network.

---

## CLI

```bash
devo enrich   data.csv                    # write data.icsv + data_schema.json
devo validate data.icsv                   # validate against neighbouring schema
devo run      data.csv                    # do both in one go
```

Common flags: `--out DIR` (default `DEVO_output/`), `--delimiter CHAR`, `--nodata VALUE`, `--app PROFILE`, `--schema PATH`.

Exit codes: `0` = success, `1` = validation failed, `2` = usage or runtime error.
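
When DEVO is called from a script, the exit code can be turned into a readable outcome. The following is a minimal sketch, assuming the `devo` executable is on `PATH`; `describe_exit` and `run_devo` are hypothetical helpers written for this example and are not part of the package.

```python
import subprocess

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "validation failed",
    2: "usage or runtime error",
}


def describe_exit(code: int) -> str:
    """Translate a devo exit code into a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_devo(csv_path: str) -> str:
    """Run `devo run` on csv_path and return a label for the outcome.

    Assumes the `devo` command is installed and on PATH.
    """
    result = subprocess.run(["devo", "run", csv_path])
    return describe_exit(result.returncode)
```

Calling `run_devo("examples/sample.csv")` would then return one of the three labels, which makes it straightforward to branch on a failed validation separately from a usage error.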

## What lands on disk

For input `data.csv`, after `devo run`:

| File | What |
|---|---|
| `DEVO_output/data.icsv` | iCSV with `# [METADATA]`, `# [FIELDS]`, `# [DATA]` |
| `DEVO_output/data_schema.json` | Frictionless Table Schema JSON |
| `DEVO_output/data_DEVO_report.txt` | Validation report (read this) |
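
The naming pattern in the table can be reproduced in a script that needs to locate the outputs afterwards. This is a small sketch based only on the file names listed above; `expected_outputs` is a hypothetical helper, not a DEVO function.

```python
from pathlib import Path


def expected_outputs(input_path: str, out_dir: str = "DEVO_output") -> dict:
    """Return the paths DEVO is expected to write for a given input file.

    The names follow the table above: <stem>.icsv, <stem>_schema.json
    and <stem>_DEVO_report.txt inside the output directory.
    """
    stem = Path(input_path).stem  # "data.csv" -> "data"
    out = Path(out_dir)
    return {
        "icsv": out / f"{stem}.icsv",
        "schema": out / f"{stem}_schema.json",
        "report": out / f"{stem}_DEVO_report.txt",
    }
```

For example, `expected_outputs("examples/sample.csv", "my_output")["report"]` points at `my_output/sample_DEVO_report.txt`.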

## Python API

```python
from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv

icsv, schema = ICSVEnricher().make_icsv("data.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)
```

## Files

```
devo/
├── cli.py          # argparse front-end (enrich / validate / run)
├── enrich.py       # CSV → iCSV + schema (ICSVEnricher class)
├── validate.py     # iCSV + schema → Frictionless validation + report
├── _infer.py       # pure type-inference functions (shared by enrich + validate)
├── _parser.py      # iCSV header parser (shared by enrich + validate)
├── _schema.py      # per-column statistics + Frictionless schema builder
├── _report.py      # plain-text report writer
├── exceptions.py   # DEVOError hierarchy
└── webui.py        # Flask demo (optional; requires pip install -e ".[webui]")
tests/
├── conftest.py
├── fixtures/       # sample CSV and iCSV files
└── test_*.py
```

## How it works

### Enrichment (`devo enrich`)

1. **Read** — the CSV is read in one pass. If no `--delimiter` is given, `csv.Sniffer` detects it from the first 10 lines.
2. **Delimiter mapping** — comma is remapped to pipe in the iCSV output (pipe is also the default fallback for non-spec delimiters). Column names that contain the output delimiter are rejected with a clear error.
3. **Normalisation** — every row is padded or clipped to header length and stripped of leading/trailing whitespace.
4. **Type inference** — each column is classified: `integer → number → datetime → string`. Scientific notation (`1.5e-3`, `2E10`) is recognised as `number`. Missing-value sentinels (and any custom `--nodata` value) are excluded before inference.
5. **Statistics** — per-column `min`, `max`, and `missing_count` are computed from the normalised data and written to the iCSV `# [FIELDS]` section. They do not appear in the Frictionless schema JSON.
6. **Geometry detection** — if the header contains `lat`/`latitude` + `lon`/`lng`/`longitude`, DEVO writes `geometry = column:lat,lon` and `srid = EPSG:4326` to metadata. A single column named `geometry` (WKT) gets `geometry = column:geometry` only — no `srid`, because WKT embeds its own CRS.
7. **Write** — the normalised rows are written to the iCSV `# [DATA]` section, and the Frictionless schema is written to `_schema.json`.
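
The read and normalisation steps above can be sketched with the standard library alone. This is an illustration of the idea, not the code in `enrich.py`: the function names are hypothetical, and restricting `csv.Sniffer` to a short candidate list is an assumption of this sketch.

```python
import csv
import io


def sniff_delimiter(text: str, max_lines: int = 10) -> str:
    """Guess the delimiter from the first few lines using csv.Sniffer."""
    sample = "\n".join(text.splitlines()[:max_lines])
    # Assumption: only these candidate delimiters are considered.
    return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter


def normalise_rows(text: str, delimiter: str):
    """Strip whitespace and pad or clip every row to the header length."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    width = len(header)
    rows = []
    for row in reader:
        cells = [cell.strip() for cell in row[:width]]  # clip extra cells
        cells += [""] * (width - len(cells))            # pad short rows
        rows.append(cells)
    return header, rows
```

A short row such as ` 1 ;2` under a three-column header becomes `['1', '2', '']`, and a row with a fourth cell loses the extra value.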

### Validation (`devo validate`)

1. **Parse header** — `_parser.py` reads the `# [METADATA]` and `# [FIELDS]` sections, using `field_delimiter` from metadata to split field values.
2. **Metadata check** — required keys are verified. `geometry` and `srid` are only checked when spatial column names are present; `srid` is only required for lat/lon columns (not WKT).
3. **Type cross-check** — column types are re-inferred from up to 500 data rows and compared to the declared types. The iCSV's own `nodata` sentinel is merged with the standard missing-value set before re-inference so custom sentinels are not mistaken for real data. If the inferred type is narrower than or equal to the declared type, the column is marked `[OK]`; if it is wider, it is marked `[WARN]`.
4. **Frictionless validation** — data is written to a temporary comma-delimited CSV and validated against the schema using `frictionless.Resource`. The temp file is always deleted in a `finally` block.
5. **Report** — a plain-text `.txt` report is written with three sections: `METADATA`, `TYPE CONSISTENCY`, and `DATA VALIDATION`. `Valid: YES` only when metadata has no `[FAIL]` entries and Frictionless reports no data errors. Type warnings do not affect the valid flag.
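
The header-parsing step can be pictured roughly as follows. This is a simplified sketch, not the code in `_parser.py`: it assumes header lines start with `#`, that entries have the form `key = value`, and that a comma is used when `field_delimiter` is absent. `parse_icsv_header` is a hypothetical name.

```python
def parse_icsv_header(lines):
    """Collect METADATA and FIELDS entries from the commented iCSV header."""
    metadata, fields = {}, {}
    section = None
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line: data rows have started
        body = line[1:].strip()
        if body in ("[METADATA]", "[FIELDS]", "[DATA]"):
            section = body
            if section == "[DATA]":
                break
            continue
        if section is None or "=" not in body:
            continue  # e.g. a banner line before the first section
        key, _, value = body.partition("=")
        target = metadata if section == "[METADATA]" else fields
        # Note: strip() would destroy a whitespace delimiter such as a tab;
        # a fuller parser needs to handle that case separately.
        target[key.strip()] = value.strip()
    # Assumption: fall back to a comma when no delimiter is declared.
    delimiter = metadata.get("field_delimiter", ",")
    fields = {key: value.split(delimiter) for key, value in fields.items()}
    return metadata, fields
```

The function returns the metadata as a flat dictionary and each `# [FIELDS]` entry as a list with one value per column, which is the shape the later metadata and type checks need.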

## Limitations

- Type inference is conservative: `integer → number → datetime → string`. Mixed-format columns fall back to `string`.
- Datetime detection uses `datetime.fromisoformat()` and a fixed list of common strptime formats. Unusual formats need a custom schema.
- Column descriptions are left blank in the iCSV `# [FIELDS]` section; fill them in by hand.
- The web UI (`webui.py`) is a local demo only — do not expose it to a network.
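
To make the first two limitations concrete, here is a minimal sketch of a conservative inference chain. It is not the code in `_infer.py`: the missing-value set, the list of `strptime` formats, and the function names are assumptions chosen for illustration.

```python
from datetime import datetime

# Assumed values; DEVO's real sentinel and format lists may differ.
MISSING = {"", "NA", "NaN", "null"}
DATETIME_FORMATS = ("%Y-%m-%d %H:%M", "%d.%m.%Y", "%m/%d/%Y")


def _parses(convert, value) -> bool:
    """Return True if convert(value) succeeds without a ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


def _is_datetime(value: str) -> bool:
    """Try ISO 8601 first, then each fallback strptime format."""
    if _parses(datetime.fromisoformat, value):
        return True
    return any(
        _parses(lambda v, fmt=fmt: datetime.strptime(v, fmt), value)
        for fmt in DATETIME_FORMATS
    )


def infer_type(values, nodata=None) -> str:
    """Classify a column as integer, number, datetime, or string."""
    missing = MISSING | ({nodata} if nodata is not None else set())
    present = [v for v in values if v not in missing]
    if not present:
        return "string"  # assumption: an all-missing column stays a string
    checks = (
        ("integer", lambda v: _parses(int, v)),
        ("number", lambda v: _parses(float, v)),  # accepts 1.5e-3, 2E10
        ("datetime", _is_datetime),
    )
    for name, check in checks:
        if all(check(v) for v in present):
            return name
    return "string"  # mixed or unrecognised formats
```

A single value that fails every check is enough to push the whole column to `string`, which is why mixed-format columns and unusual date formats end up there.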

## License

MIT. See `LICENSE`.
