Metadata-Version: 2.4
Name: chrono-correlator
Version: 0.7.0
Summary: Statistical correlation between time-series and discrete events with optional LLM narration
Author-email: Raúl Gallardo <g3ov3r@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Raulcadiz/chrono-correlator
Project-URL: Repository, https://github.com/Raulcadiz/chrono-correlator
Project-URL: Bug Tracker, https://github.com/Raulcadiz/chrono-correlator/issues
Keywords: statistics,time-series,correlation,mann-whitney,llm,health,monitoring
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scipy>=1.11
Requires-Dist: numpy>=1.24
Provides-Extra: groq
Requires-Dist: groq>=0.4.0; extra == "groq"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: llm
Requires-Dist: groq>=0.4.0; extra == "llm"
Requires-Dist: anthropic>=0.20.0; extra == "llm"
Requires-Dist: ollama>=0.1.0; extra == "llm"
Provides-Extra: pandas
Requires-Dist: pandas>=1.5; extra == "pandas"
Provides-Extra: all
Requires-Dist: groq>=0.4.0; extra == "all"
Requires-Dist: anthropic>=0.20.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Requires-Dist: pandas>=1.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pandas>=1.5; extra == "dev"
Dynamic: license-file

# chrono-correlator

A generic statistical engine that correlates time-series data with discrete events using the Mann-Whitney U test, and narrates results with an LLM only when a statistically significant signal is found.

## Install

```bash
# Core (statistics only — no LLM required)
pip install chrono-correlator

# With a specific LLM provider (quotes keep zsh from expanding the brackets)
pip install "chrono-correlator[groq]"
pip install "chrono-correlator[anthropic]"
pip install "chrono-correlator[ollama]"    # local, no API key

# Everything
pip install "chrono-correlator[all]"
```

## Quick start

```python
from datetime import datetime, timedelta
from chrono_correlator import Event, Metric, evaluate, narrate

base = datetime(2024, 1, 1)

events = [
    Event(timestamp=base + timedelta(days=d), label="migraine")
    for d in [10, 20, 30]
]

# 800 hourly samples (~33 days) of a flat metric at 55.0
timestamps = [base + timedelta(hours=h) for h in range(800)]
values = [55.0] * 800

# Drop the value to 28.0 during the 48 h preceding each event
for day in [10, 20, 30]:
    for h in range(48):
        idx = day * 24 - 48 + h
        if 0 <= idx < 800:
            values[idx] = 28.0

hrv = Metric(name="hrv", timestamps=timestamps, values=values)

# FDR correction enabled by default — reduces false positives with multiple metrics
report = evaluate(events, [hrv])
print(f"Level: {report.level} — {report.active_signals}/{report.total_signals} signals")

if report.level != "green":
    report = narrate(report, provider="groq")
    print(report.narrative)
```

## From a pandas DataFrame

```python
import pandas as pd
from chrono_correlator import Metric

df = pd.read_csv("hrv_data.csv")   # columns: timestamp, value
hrv = Metric.from_dataframe(df, name="hrv", timestamp_col="timestamp", value_col="value")
```

## Lag analysis

```python
# Test if the signal appears 24h before the event instead of immediately before
report = evaluate(events, metrics, lag_hours=24)
```

## Continuous monitoring (no events needed)

```python
from chrono_correlator import monitor, loop

# Single evaluation at now()
report = monitor(metrics, narrate=False)

# Infinite loop — calls on_alert when level is yellow or red
def alert_handler(report):
    print(f"ALERT {report.level.upper()}: {report.narrative}")

loop(metrics_fn=lambda: metrics, interval_seconds=3600, on_alert=alert_handler)
```

## Custom LLM provider

```python
from chrono_correlator import BaseNarrator

class OllamaNarrator(BaseNarrator):
    def generate(self, prompt: str) -> str:
        # call your local model
        ...

report = OllamaNarrator().narrate(report)
```

## Key finding: p-value alone is not enough

Statistical significance (p < 0.05) can appear in large samples even with no real pattern.
**Effect size + consistency** is what separates real signals from statistical noise.

| Dataset | p-value | Effect | Consistency | Causality score | Signal |
|---|---|---|---|---|---|
| Real pattern | < 0.001 | 0.289 | 0.86 | 0.64 | strong |
| Flat metrics | 0.09* | -0.005 | ~0.4 | ~0.2 | none |
| Shuffled | 0.55 | 0.000 | ~0.5 | 0.25 | none |

\* p < 0.05 in some metrics due to large sample size — effect size and consistency correctly identify these as noise.

`CorrelationResult` now includes:
- `consistency` — fraction of events individually showing the pattern (0–1)
- `signal_strength` — `"strong"` / `"moderate"` / `"weak"` / `"none"`
- `causality_score` — composite score: `0.5 × |effect| + 0.5 × consistency` (0–1)

`significant = True` only when `p < 0.05 AND signal_strength in ("strong", "moderate")`.
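As a rough illustration of how these fields combine, here is a minimal sketch of the scoring logic described above. The `signal_strength` thresholds are assumptions chosen to match the table, not the library's exact cut-offs:

```python
def causality_score(effect: float, consistency: float) -> float:
    """Composite score from the formula above: 0.5 * |effect| + 0.5 * consistency."""
    return 0.5 * abs(effect) + 0.5 * consistency

def signal_strength(score: float) -> str:
    # Illustrative thresholds (assumed, not the package's actual values)
    if score >= 0.6:
        return "strong"
    if score >= 0.45:
        return "moderate"
    if score >= 0.3:
        return "weak"
    return "none"

def significant(p: float, strength: str) -> bool:
    # p < 0.05 alone is not enough — strength must back it up
    return p < 0.05 and strength in ("strong", "moderate")
```

Under these assumed thresholds the shuffled dataset scores `0.5 * 0.0 + 0.5 * 0.5 = 0.25` → `"none"`, so its p-value never reaches the significance check.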

## How it works

- **Statistical core:** For each metric, values in the pre-event window (default: 48 h before, configurable lag) are compared against a 28-day baseline using Mann-Whitney U. Effect size is computed as rank-biserial correlation.
- **Multiple comparison correction:** When analysing several metrics simultaneously, FDR (Benjamini-Hochberg) correction is applied by default to control false positives. Bonferroni is also available.
- **Alert level:** Corrected active signals (p < 0.05) are counted: 0–2 → green, 3–4 → yellow, 5 or more → red.
- **LLM narration:** Only triggered on yellow or red. The model receives pre-calculated statistics and is constrained to one factual sentence per signal — no diagnosis, no causal inference.

## Use cases

- **Health monitoring** — correlate HRV, deep sleep, or skin temperature drops with migraine or crisis events.
- **Infrastructure** — detect latency or error-rate anomalies preceding service outages.
- **IPTV / streaming** — link buffering load spikes to subscriber disconnection events.
- **Energy consumption** — associate power demand patterns with grid stress or equipment failures.

## License

MIT — Raúl Gallardo (g3ov3r)
