Metadata-Version: 2.4
Name: timeseries-qc
Version: 0.4.1
Summary: Classify timeseries data as Good / Sus / Bad based on logic and business rules and render a multi-tag quality timeline chart.
Author: timeseries-qc contributors
License-Expression: MIT
Project-URL: Homepage, https://nagusubra.github.io/timeseries-qc/
Project-URL: Repository, https://github.com/nagusubra/timeseries-qc
Project-URL: Issues, https://github.com/nagusubra/timeseries-qc/issues
Keywords: timeseries,data quality,QC,SCADA,IoT,pandas
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: plotly>=5.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Dynamic: license-file

# timeseries-qc

[![PyPI](https://img.shields.io/pypi/v/timeseries-qc)](https://pypi.org/project/timeseries-qc/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

**The open-source data quality-control layer for SCADA, DCS, IoT, and historian timeseries data.**

Add `good / sus / bad` quality labels to every row of a pandas DataFrame in five lines. Then render a multi-tag horizontal status timeline, the chart that no other open-source library produces.

A timeseries data quality check that is simple to digest and understand. Catch issues in your process data before they affect your downstream analytics and business decisions. Build data quality checks from business rules and monitor them through interactive graph components.

**Sample Input - Solar farm SCADA data:**

```text
| timestamp                 | tag_name       | value   |
| :------------------------ | :------------- | :------ |
| 2026-01-01 00:00:00+00:00 | INVERTER.MW    | 42.1    |
| 2026-01-01 01:00:00+00:00 | INVERTER.MW    | NULL    |  <-- timeseries_qc will catch this (Null value)
| 2026-01-01 02:00:00+00:00 | INVERTER.MW    | 52.3    |
| 2026-01-01 00:00:00+00:00 | MET.IRRADIANCE | 600.001 |
| 2026-01-01 01:00:00+00:00 | MET.IRRADIANCE | 600.001 |  <-- timeseries_qc will catch this (Stale/Frozen value)
| 2026-01-01 02:00:00+00:00 | MET.IRRADIANCE | 810.818 |
| 2026-01-01 00:00:00+00:00 | TRACKER.ANGLE  | 30.22   |
| 2026-01-01 01:00:00+00:00 | TRACKER.ANGLE  | 45.31   |
| 2026-01-01 02:00:00+00:00 | TRACKER.ANGLE  | 60.22   |
```

**Sample Output - Solar farm SCADA data:**

![Solar farm SCADA data quality example](./docs/assets/images/solar_farm_example.png)


**Sample Input - Oil field SCADA data:**

```text
| timestamp                 | tag_name     | value  |
| :------------------------ | :----------- | :----- |
| 2026-01-01 00:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 01:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 02:00:00+00:00 | WHP.PSIG     | 0      |  <-- timeseries_qc will catch this (Flatline/Zero)
| 2026-01-01 00:00:00+00:00 | FMRATE.MSCFD | 12.1   |
| 2026-01-01 01:00:00+00:00 | FMRATE.MSCFD | 90.99  |  <-- timeseries_qc will catch this (Rate-of-change spike)
| 2026-01-01 02:00:00+00:00 | FMRATE.MSCFD | 12.3   |
| 2026-01-01 00:00:00+00:00 | OHT.TEMP_F   | 30.2   |
| 2026-01-01 01:00:00+00:00 | OHT.TEMP_F   | 45.2   |
| 2026-01-01 02:00:00+00:00 | OHT.TEMP_F   | 6000.2 |  <-- timeseries_qc will catch this (Out of bounds)
```

**Sample Output - Oil field SCADA data:**

![Oil field SCADA data quality example](./docs/assets/images/oil_field_example.png)


## Features

- **External quality column** — ingest a pre-existing historian/SCADA quality column and use it exclusively or merge it with internal rules (`exclusive`, `combined`, `none` modes)
- **Five built-in rules** cover ≥90% of real-world bad data: `NullRule`, `FlatlineRule`, `DeltaRule`, `RangeRule`, `OutlierRule`
- **Timeline chart** (`result.plot()`) — Plotly Gantt-style, one row per tag, Green/Yellow/Red, hover tooltips
- **YAML config** — non-coders set thresholds in a text file, no Python required
- **Timestamp health** (`result.check_timestamps()`) — detects gaps, duplicates, non-monotonic, freq drift, DST ambiguity
- **Self-contained HTML export** (`result.export_report("report.html")`) — offline, no CDN, includes per-issue summary table
- **Per-issue breakdown** (`result.issue_summary()`) — start/end times, row count, duration, status, and triggered rule names for each contiguous bad/sus segment
- **Pandas-native** — works with any DataFrame that has `timestamp`, `tag_name`, `value` columns

---

## Installation

```bash
pip install timeseries-qc
```

---

## Quickstart (5 lines)

```python
import tsqc
import pandas as pd

df = pd.read_csv("sensor_data.csv")          # columns: timestamp, tag_name, value
result = tsqc.check(df, assume_tz="UTC")     # assume_tz required for tz-naive CSVs
result.plot().show()                          # renders the multi-tag quality timeline
```

If your CSV already contains tz-aware timestamps (ISO 8601 with `+00:00`), omit `assume_tz`.

The chart x-axis, hover tooltips, `result.df`, `issue_summary()`, and `check_timestamps()` all display timestamps in the **original input timezone** — so local time is shown automatically, no extra configuration needed.
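
To try the quickstart without real sensor data, a small CSV in the expected long format (`timestamp`, `tag_name`, `value`) can be generated with the standard library. The sketch below reuses rows from the solar farm sample table; representing `NULL` as an empty field is an assumption (pandas reads empty numeric fields as missing values), and the variable names are illustrative.

```python
import csv
import io

# Rows copied from the solar farm sample table; an empty value stands in for NULL.
sample_rows = [
    ("2026-01-01 00:00:00+00:00", "INVERTER.MW", "42.1"),
    ("2026-01-01 01:00:00+00:00", "INVERTER.MW", ""),
    ("2026-01-01 02:00:00+00:00", "INVERTER.MW", "52.3"),
    ("2026-01-01 00:00:00+00:00", "MET.IRRADIANCE", "600.001"),
    ("2026-01-01 01:00:00+00:00", "MET.IRRADIANCE", "600.001"),
    ("2026-01-01 02:00:00+00:00", "MET.IRRADIANCE", "810.818"),
]

# Build the CSV in memory so the sketch has no side effects on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["timestamp", "tag_name", "value"])
writer.writerows(sample_rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Saving `csv_text` as `sensor_data.csv` gives a file whose timestamps already carry a `+00:00` offset, so `assume_tz` should not be needed for it.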

---

## YAML Config Example

```yaml
# tsqc_rules.yaml
default_rules:
  - check: null
    level: bad
  - check: flatline
    window: 1h
    min_delta: 0.001
    level: sus
  - check: delta
    max_delta: 50.0
    level: sus

tag_rules:
  FOREBAY.LEVEL:
    - check: range
      min: 900
      max: 1100
      level: bad
  "GENERATOR.*":
    - check: range
      min: 0
      max: 200
      level: bad
    - check: flatline
      window: 30min
      min_delta: 0.5   # 0 MW for <30min is valid; longer flatline at non-zero is suspect
      level: sus
```

```python
result = tsqc.check(df, rules="tsqc_rules.yaml")
result.summary()           # DataFrame: pct_good/sus/bad per tag
result.issue_summary()     # DataFrame: per-issue runs (start, end, rows, duration, reasons)
result.check_timestamps()  # DataFrame: gap/duplicate/non_monotonic issues
result.export_report("report.html")  # Full HTML with chart + all tables
```
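
For intuition about what a gap, duplicate, or non-monotonic timestamp means, the following standard-library sketch labels each consecutive pair of timestamps against an expected sampling step. It is not how `check_timestamps()` is implemented; the `timestamp_issues` helper, its `expected_step` parameter, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def timestamp_issues(timestamps, expected_step):
    """Label each consecutive pair of timestamps that looks unhealthy."""
    issues = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        step = curr - prev
        if step == timedelta(0):
            issues.append(("duplicate", curr))
        elif step < timedelta(0):
            issues.append(("non_monotonic", curr))
        elif step > expected_step:
            issues.append(("gap", curr))
    return issues


def utc(hour):
    """Shorthand for an hour on 2026-01-01 in UTC."""
    return datetime(2026, 1, 1, hour, tzinfo=timezone.utc)


# Hypothetical hourly series with one duplicate, one gap, and one backwards step.
stamps = [utc(0), utc(1), utc(1), utc(4), utc(3)]
issues = timestamp_issues(stamps, expected_step=timedelta(hours=1))
print(issues)
```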

---

## External Quality Column (Historian Status)

If your data already has a quality/status column from a SCADA historian (e.g. OSIsoft PI's `IsGood` or custom status codes), you can use it directly:

```python
# Exclusive mode — use only the external quality column, skip internal rules
result = tsqc.check(
    df,
    external_quality_col="status",       # column with 0,1,2,3,4 values
    quality_mode="exclusive",            # or "combined" to merge with internal
    quality_map={0: "good", 1: "sus", 2: "bad", 3: "bad", 4: "bad"},
    assume_tz="UTC",
)
```

| Mode | Behavior |
|------|----------|
| `exclusive` | External quality **only**; no internal rules run |
| `combined` | External + internal merged (worst-wins: bad > sus > good) |
| `none` | Internal only; ignores external column (escape hatch) |

- Unmapped quality values become `bad` with reason `external_quality_value: <raw_value>`
- If an input column name collides with an output column name, the output columns are auto-renamed to `qc_quality` / `qc_quality_reasons` and the input column is preserved
- `quality_map` in YAML takes precedence over the `quality_map=` parameter
- `quality_mode="none"` does **not** require a `quality_map`
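
The mapping and worst-wins behaviour described above can be pictured with a small pure-Python sketch. It only illustrates the stated rules (bad > sus > good, unmapped values become `bad` with an `external_quality_value` reason); the helper names are hypothetical, and returning an empty reason for mapped values is an assumption rather than documented behaviour.

```python
# Severity order taken from the mode table: bad > sus > good.
SEVERITY = {"good": 0, "sus": 1, "bad": 2}


def map_external(raw_value, quality_map):
    """Map a raw historian code to (quality, reason); unmapped codes become bad."""
    if raw_value in quality_map:
        return quality_map[raw_value], ""  # assumption: no reason for mapped codes
    return "bad", f"external_quality_value: {raw_value}"


def combine(external, internal):
    """Worst-wins merge of two quality labels."""
    return max(external, internal, key=SEVERITY.__getitem__)


quality_map = {0: "good", 1: "sus", 2: "bad", 3: "bad", 4: "bad"}

mapped, reason = map_external(1, quality_map)
unmapped, unmapped_reason = map_external(9, quality_map)
merged = combine(mapped, "good")
print(mapped, merged, unmapped, unmapped_reason)
```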

---

## Output Schema

`result.df` adds two columns to your DataFrame:

| Column | Values | Notes |
|--------|--------|-------|
| `quality` | `"good"`, `"sus"`, `"bad"` | Worst-level rule wins |
| `quality_reasons` | e.g. `"flatline\|range"` | Pipe-delimited triggered rule names |
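
Because `quality_reasons` is pipe-delimited, it is straightforward to split downstream. The sketch below counts how often each rule name appears, using plain dictionaries shaped like rows of `result.df`; the sample rows are hypothetical, and it assumes good rows carry an empty reason string.

```python
from collections import Counter

# Hypothetical rows shaped like result.df; good rows are assumed to have an empty reason.
rows = [
    {"tag_name": "WHP.PSIG", "quality": "sus", "quality_reasons": "flatline"},
    {"tag_name": "OHT.TEMP_F", "quality": "bad", "quality_reasons": "delta|range"},
    {"tag_name": "OHT.TEMP_F", "quality": "good", "quality_reasons": ""},
    {"tag_name": "FMRATE.MSCFD", "quality": "sus", "quality_reasons": "delta"},
]

# Split each reason string on "|" and skip the empty entries from good rows.
rule_counts = Counter(
    rule
    for row in rows
    for rule in row["quality_reasons"].split("|")
    if rule
)
print(rule_counts.most_common())
```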

---

## Comparison with Alternatives

**Pecos** (Sandia Labs) offers binary pass/fail and has been in maintenance mode since 2021 — no timeline chart and no YAML config. **SaQC** (Helmholtz UFZ) is a rich flagging engine for environmental science but has an environmental-domain API, no timeline visualization, and an LGPL license. **Great Expectations** is not timeseries-native and produces no visualization. `timeseries-qc` is the only library that combines (1) Good/Sus/Bad classification, (2) the multi-tag horizontal status timeline, and (3) YAML-driven configuration in a single `pip install`.

---

## Examples

- [examples/solar_farm.ipynb](examples/solar_farm.ipynb) — solar farm SCADA data with anomaly injection
- [examples/oilfield.ipynb](examples/oilfield.ipynb) — oil well pad SCADA data with anomaly injection

---

## Known Limitations (v0.4.1)

1. **Pandas only.** PySpark and Polars support are deferred.
2. **No YAML override of default rules.** Tag-specific rules are added to the default rules; they do not replace them.
3. **Visualization requires Plotly ≥ 5.0.** Matplotlib output not supported.
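
The second limitation can be pictured with a short sketch that resolves the rules applying to one tag from a config shaped like a subset of the YAML example, already parsed into Python objects. This is an illustration and not the library's resolution logic; the `effective_rules` helper is hypothetical, and glob-style matching of patterns such as `GENERATOR.*` via `fnmatch` is an assumption.

```python
from fnmatch import fnmatchcase

# Config mirroring a subset of the YAML example, already parsed into Python objects.
config = {
    "default_rules": [
        {"check": "flatline", "window": "1h", "min_delta": 0.001, "level": "sus"},
        {"check": "delta", "max_delta": 50.0, "level": "sus"},
    ],
    "tag_rules": {
        "FOREBAY.LEVEL": [{"check": "range", "min": 900, "max": 1100, "level": "bad"}],
        "GENERATOR.*": [
            {"check": "range", "min": 0, "max": 200, "level": "bad"},
            {"check": "flatline", "window": "30min", "min_delta": 0.5, "level": "sus"},
        ],
    },
}


def effective_rules(tag_name, config):
    """Default rules plus every tag-specific rule whose pattern matches the tag."""
    rules = list(config["default_rules"])
    for pattern, tag_specific in config["tag_rules"].items():
        if fnmatchcase(tag_name, pattern):  # assumption: glob-style matching
            rules.extend(tag_specific)
    return rules


generator_rules = effective_rules("GENERATOR.MW", config)
print([rule["check"] for rule in generator_rules])
```

Under these assumptions a `GENERATOR.*` tag ends up with both the default one-hour flatline rule and its own 30-minute flatline rule, since the tag-specific entry is appended and does not replace the default.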

---

## License

MIT © timeseries-qc contributors
