Metadata-Version: 2.4
Name: chrono-correlator
Version: 1.2.0
Summary: Statistical correlation between time-series and discrete events with optional LLM narration
Author-email: Raúl Gallardo <g3ov3r@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Raulcadiz/chrono-correlator
Project-URL: Repository, https://github.com/Raulcadiz/chrono-correlator
Project-URL: Bug Tracker, https://github.com/Raulcadiz/chrono-correlator/issues
Keywords: statistics,time-series,correlation,mann-whitney,llm,health,monitoring
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scipy>=1.11
Requires-Dist: numpy>=1.24
Provides-Extra: groq
Requires-Dist: groq>=0.4.0; extra == "groq"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == "anthropic"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: llm
Requires-Dist: groq>=0.4.0; extra == "llm"
Requires-Dist: anthropic>=0.20.0; extra == "llm"
Requires-Dist: ollama>=0.1.0; extra == "llm"
Provides-Extra: pandas
Requires-Dist: pandas>=1.5; extra == "pandas"
Provides-Extra: all
Requires-Dist: groq>=0.4.0; extra == "all"
Requires-Dist: anthropic>=0.20.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Requires-Dist: pandas>=1.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pandas>=1.5; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.25; extra == "docs"
Dynamic: license-file

# chrono-correlator

A generic statistical engine that correlates time-series data with discrete events using the Mann-Whitney U test, and narrates results with an LLM only when p < 0.05.

---
[![Sponsor](https://img.shields.io/badge/Sponsor-g3v3r-ea4aaa?logo=github-sponsors)](https://github.com/sponsors/g3v3r)
[![PyPI](https://img.shields.io/pypi/v/chrono-correlator)](https://pypi.org/project/chrono-correlator/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)
---

## Install

```bash
# Core (statistics only — no LLM required)
pip install chrono-correlator

# With specific LLM provider
pip install chrono-correlator[groq]
pip install chrono-correlator[anthropic]
pip install chrono-correlator[ollama]      # local, no API key

# Everything
pip install chrono-correlator[all]
```

## Quick start

```python
from datetime import datetime, timedelta
from chrono_correlator import Event, Metric, evaluate, narrate

base = datetime(2024, 1, 1)

events = [
    Event(timestamp=base + timedelta(days=d), label="migraine")
    for d in [10, 20, 30]
]

timestamps = [base + timedelta(hours=h) for h in range(800)]
values = [55.0] * 800
for day in [10, 20, 30]:
    for h in range(48):
        idx = day * 24 - 48 + h
        if 0 <= idx < 800:
            values[idx] = 28.0

hrv = Metric(name="hrv", timestamps=timestamps, values=values)

report = evaluate(events, [hrv])
print(f"Level: {report.level} — {report.active_signals}/{report.total_signals} signals")

if report.level != "green":
    report = narrate(report, provider="groq")
    print(report.narrative)
```

## From a pandas DataFrame

```python
import pandas as pd
from chrono_correlator import Metric

df = pd.read_csv("hrv_data.csv")   # columns: timestamp, value
hrv = Metric.from_dataframe(df, name="hrv", timestamp_col="timestamp", value_col="value")
```

## Lag sweep — find the best anticipatory window automatically

```python
from chrono_correlator import find_best_lag

results = find_best_lag(events, hrv, lag_range=range(0, 72, 6))

best = max(results, key=lambda k: results[k].association_strength)
print(f"Strongest signal at lag={best}h — association_strength={results[best].association_strength:.2f}")
```

## Bootstrap confidence interval for effect size

```python
report = evaluate(events, [hrv], bootstrap_ci=True)   # ~1s per metric
r = report.results[0]
print(f"Effect: {r.effect_size:.3f}  95% CI: [{r.effect_ci[0]:.3f}, {r.effect_ci[1]:.3f}]")
```

If the CI excludes 0, the effect is unlikely to be sampling noise.
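
As an illustration of the percentile-bootstrap idea behind that interval, here is a minimal sketch (not the library's exact implementation; `bootstrap_effect_ci` and the median-difference effect are stand-ins for illustration):

```python
import numpy as np

def bootstrap_effect_ci(pre, baseline, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for a simple median-difference effect."""
    rng = np.random.default_rng(seed)
    pre, baseline = np.asarray(pre), np.asarray(baseline)
    effects = [
        np.median(rng.choice(pre, size=pre.size, replace=True))
        - np.median(rng.choice(baseline, size=baseline.size, replace=True))
        for _ in range(n_boot)
    ]
    return float(np.percentile(effects, 2.5)), float(np.percentile(effects, 97.5))

pre_window = [26.0, 27.0, 28.0, 29.0, 30.0] * 5    # depressed pre-event values
baseline = [50.0 + (i % 10) for i in range(120)]   # normal-range baseline
low, high = bootstrap_effect_ci(pre_window, baseline)
```

Because the pre-event values sit entirely below the baseline range here, the whole interval lands below zero.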

## Seasonal baseline correction

```python
# Compare pre-event window only against same day of the week in the baseline
# Eliminates false positives caused by weekly patterns (e.g. traffic every Friday)
report = evaluate(events, metrics, baseline_strategy="same_weekday")

# Compare against same hour of the day — for circadian metrics (HRV, temperature)
report = evaluate(events, metrics, baseline_strategy="same_hour")
```
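
The same-weekday selection amounts to filtering the baseline window by day of week before pooling. A hypothetical helper sketching the idea (`same_weekday_baseline` is not part of the library's API):

```python
from datetime import datetime, timedelta

def same_weekday_baseline(event_ts, timestamps, values, baseline_days=28):
    """Keep only baseline samples that share the event's day of the week."""
    start = event_ts - timedelta(days=baseline_days)
    return [
        (t, v) for t, v in zip(timestamps, values)
        if start <= t < event_ts and t.weekday() == event_ts.weekday()
    ]

event = datetime(2024, 2, 2)                      # a Friday
ts = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(24 * 40)]
vals = [float(h % 24) for h in range(24 * 40)]
fridays = same_weekday_baseline(event, ts, vals)  # hourly samples from 4 Fridays
```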

## Directional analysis

```python
# Only flag metrics that DROP before events (e.g. HRV decrease before migraine)
report = evaluate(events, metrics, direction="decrease")

# Only flag metrics that RISE before events (e.g. heart rate spike before incident)
report = evaluate(events, metrics, direction="increase")
```

## Custom significance thresholds

```python
from chrono_correlator import SignificanceConfig

cfg = SignificanceConfig(alpha=0.01, strong_effect=0.35, strong_consistency=0.75)
report = evaluate(events, metrics, config=cfg)
```

## Overlapping event windows

When two events are closer together than `lookback_hours`, `evaluate()` emits a `UserWarning` automatically:

```
UserWarning: Events 'migraine' (2024-01-10) and 'migraine' (2024-01-11) are 24h apart —
pre-event windows overlap (lookback=48h). Pooled results may be inflated.
```
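
The check itself is just a gap comparison between consecutive sorted events. A standalone sketch of the logic (a plain function for illustration, not the library's internal code; the message text is approximate):

```python
import warnings
from datetime import datetime

def warn_on_overlap(event_times, lookback_hours=48):
    """Emit a UserWarning for consecutive events closer than the lookback window."""
    ordered = sorted(event_times)
    for a, b in zip(ordered, ordered[1:]):
        gap_h = (b - a).total_seconds() / 3600
        if gap_h < lookback_hours:
            warnings.warn(
                f"Events {gap_h:.0f}h apart: pre-event windows overlap "
                f"(lookback={lookback_hours}h). Pooled results may be inflated.",
                UserWarning,
            )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_overlap([datetime(2024, 1, 10), datetime(2024, 1, 11)])
```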

## Persistence — save and reload reports

```python
from chrono_correlator import save_report, load_reports
from datetime import datetime, timedelta

# Save to SQLite (stdlib, no extra dependencies)
row_id = save_report(report, db_path="chrono.db")

# Load all reports
history = load_reports("chrono.db")

# Filter by level or time window
alerts = load_reports("chrono.db", level="red")
recent  = load_reports("chrono.db", since=datetime.now() - timedelta(days=7))
```

## Export to HTML and Markdown

```python
from chrono_correlator import export_html, export_markdown

export_html(report, "report.html")         # self-contained HTML with table + narratives
export_markdown(report, "report.md")       # GitHub-ready Markdown — paste into issues/PRs
```

## LLM narration with audit trail

```python
# Every LLM call is logged to a JSONL file: stats + prompt + response
# Required for audits in regulated environments (health, industry)
report = narrate(report, provider="groq", audit_log="audit.jsonl")
```

Each audit entry:
```json
{
  "ts": "2024-06-01T14:23:11",
  "metric": "hrv",
  "stats": {"p_value": 0.003, "effect_size": -0.41, "association_strength": 0.68, ...},
  "prompt": "Datos estadísticos CALCULADOS...",
  "response": "Patrón detectado en HRV antes del evento."
}
```
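
JSONL means one JSON object per line, so entries can be appended safely and re-parsed line by line with the stdlib alone. A sketch of the round trip (the field set mirrors the entry above; this is not the library's logging code):

```python
import json

entry = {
    "ts": "2024-06-01T14:23:11",
    "metric": "hrv",
    "stats": {"p_value": 0.003, "effect_size": -0.41},
    "response": "Pattern detected in HRV before the event.",
}

# Append one entry per line
with open("audit.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Re-read the full audit trail
with open("audit.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f]
```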

## Multi-domain narration presets

Four built-in language/domain presets — pass the key to `narrate()`, `BaseNarrator`, or the CLI:

| Key | Language | Designed for |
|---|---|---|
| `default` | Spanish | Health / wearables |
| `finance` | English | Trading signals, financial time-series |
| `it` | English | Infrastructure anomalies, incident management |
| `science` | English | Research data, academic reporting |

```python
# Finance — one English sentence, forbidden: buy / sell / predicts
report = narrate(report, provider="groq", prompt_template="finance")

# IT ops — flags anomaly / pre-incident signal language
report = narrate(report, provider="anthropic", prompt_template="it")

# Science — association observed / temporal correlation language
report = narrate(report, provider="groq", prompt_template="science")

# Custom domain — raw format string with any of the available placeholders:
# {metric_name} {baseline_median} {pre_event_median} {p_value}
# {effect_size} {consistency} {association_strength} {signal_strength}
report = narrate(
    report, provider="groq",
    prompt_template="Metric {metric_name}: p={p_value:.4f}, effect={effect_size:.3f}. Write one factual sentence.",
)
```

Set a default per narrator instance and override per call:

```python
narrator = GroqNarrator(prompt_template="it")   # instance default
narrator.narrate(report)                         # uses "it"
narrator.narrate(report, prompt_template="science")  # overrides to "science"
```

CLI:

```bash
chrono analyze metrics.csv events.csv --narrate --provider groq --prompt-template finance
chrono analyze metrics.csv events.csv --narrate --provider anthropic --prompt-template it
```

`PROMPT_TEMPLATES` is also exported from the package for direct access or extension:

```python
from chrono_correlator import PROMPT_TEMPLATES

# Inspect or extend
print(PROMPT_TEMPLATES["finance"])
PROMPT_TEMPLATES["my_domain"] = "Métrica {metric_name}: p={p_value:.4f}. Una frase."
```

## Continuous monitoring (no events needed)

> **Statistical note:** `monitor()` uses a rolling self-comparison: the current window is
> compared against the preceding `baseline_days` period with no discrete event anchor.
> Statistical assumptions differ from `evaluate()` — results reflect distributional drift,
> not pre-event patterns. Calibrate with real events first using `evaluate()` before
> relying on `monitor()` alerts.

```python
from datetime import datetime
from chrono_correlator import monitor, loop, save_report, export_html

# Single evaluation at now()
report = monitor(metrics, narrate=False)

# Infinite loop — calls on_alert when level is yellow or red
def alert_handler(report):
    save_report(report)
    export_html(report, f"alert_{datetime.now():%Y%m%d_%H%M}.html")

loop(metrics_fn=lambda: metrics, interval_seconds=3600, on_alert=alert_handler)
```

## CLI

```bash
chrono analyze metrics.csv events.csv --name hrv --correction fdr
chrono analyze metrics.csv events.csv --json
chrono analyze metrics.csv events.csv --direction decrease --baseline-strategy same_weekday
chrono analyze metrics.csv events.csv --narrate --provider anthropic
```

## Custom LLM provider

```python
from chrono_correlator import BaseNarrator

class MyNarrator(BaseNarrator):
    def generate(self, prompt: str) -> str:
        # call any local or remote model
        ...

report = MyNarrator().narrate(report)
```

## Adapter recipes — connect live sources without built-in connectors

### Prometheus

```python
import requests
from datetime import datetime, timedelta
from chrono_correlator import Metric

def prometheus_metric(query: str, url: str = "http://localhost:9090") -> Metric:
    end = datetime.now()
    start = end - timedelta(days=35)
    r = requests.get(f"{url}/api/v1/query_range", params={
        "query": query, "start": start.timestamp(),
        "end": end.timestamp(), "step": "1h",
    }, timeout=30)
    r.raise_for_status()
    data = r.json()["data"]["result"][0]["values"]
    return Metric(
        name=query,
        timestamps=[datetime.fromtimestamp(float(t)) for t, _ in data],
        values=[float(v) for _, v in data],
    )

cpu = prometheus_metric("rate(node_cpu_seconds_total[5m])")
report = evaluate(events, [cpu])
```

### InfluxDB

```python
from influxdb_client import InfluxDBClient
from chrono_correlator import Metric

def influx_metric(bucket: str, measurement: str, field: str, url: str, token: str) -> Metric:
    client = InfluxDBClient(url=url, token=token, org="my-org")
    query = f'from(bucket:"{bucket}") |> range(start:-35d) |> filter(fn:(r) => r._measurement == "{measurement}" and r._field == "{field}")'
    tables = client.query_api().query(query)
    rows = [(r.get_time(), r.get_value()) for table in tables for r in table.records]
    return Metric(name=field, timestamps=[t for t, _ in rows], values=[v for _, v in rows])
```

### Watching a live CSV file

```python
import os
import pandas as pd
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from chrono_correlator import Metric

class CsvWatcher(FileSystemEventHandler):
    def __init__(self, path: str, name: str, on_update):
        self.path = os.path.abspath(path)
        self.name, self.on_update = name, on_update

    def on_modified(self, event):
        # watchdog reports absolute paths; compare in absolute form
        if os.path.abspath(event.src_path) == self.path:
            df = pd.read_csv(self.path)
            metric = Metric.from_dataframe(df, name=self.name)
            self.on_update(metric)

# Watch the directory that contains the file
observer = Observer()
observer.schedule(CsvWatcher("live.csv", "hrv", on_update=print), path=".")
observer.start()
```

### Generic REST API

```python
import requests
from datetime import datetime
from chrono_correlator import Metric

def api_metric(url: str, name: str, ts_field="timestamp", val_field="value") -> Metric:
    data = requests.get(url).json()
    return Metric(
        name=name,
        timestamps=[datetime.fromisoformat(row[ts_field]) for row in data],
        values=[float(row[val_field]) for row in data],
    )
```

## Interactive notebook

[`examples/dashboard.ipynb`](examples/dashboard.ipynb) — full pipeline with matplotlib visualizations, lag sweep chart, and bootstrap CI plot. No UI server required.

## Key finding: p-value alone is not enough

Statistical significance (p < 0.05) can appear in large samples even with no real pattern.
**Effect size + consistency** is what separates real signals from statistical noise.

| Dataset | p-value | Effect | Consistency | Association strength | Signal |
|---|---|---|---|---|---|
| Real pattern | < 0.001 | 0.289 | 0.86 | 0.64 | strong |
| Flat metrics | 0.09* | -0.005 | ~0.4 | ~0.2 | none |
| Shuffled | 0.55 | 0.000 | ~0.5 | 0.25 | none |

\* p < 0.05 in some metrics due to large sample size — effect size and consistency correctly identify these as noise.

`CorrelationResult` includes:
- `consistency` — fraction of events individually showing the pattern (0–1)
- `signal_strength` — `"strong"` / `"moderate"` / `"weak"` / `"none"`
- `association_strength` — composite score: `0.5 × |effect| + 0.5 × consistency` (0–1)
- `effect_ci` — 95% bootstrap confidence interval `(low, high)` when `bootstrap_ci=True`

`significant = True` only when `p < alpha AND signal_strength in ("strong", "moderate")`.
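
The composite score is simple enough to verify by hand. A plain-Python restatement of the formula above (not the library's internal code), using the effect size and consistency from the audit example:

```python
def association_strength(effect_size, consistency):
    """Composite score per the formula above: 0.5*|effect| + 0.5*consistency."""
    return 0.5 * abs(effect_size) + 0.5 * consistency

score = association_strength(-0.41, 0.86)  # 0.5*0.41 + 0.5*0.86 = 0.635
```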

## How it works

- **Statistical core:** For each metric, values in the pre-event window (default: 48 h before, configurable lag) are compared against a 28-day baseline using Mann-Whitney U. Effect size is computed as rank-biserial correlation.
- **Multiple comparison correction:** When analysing several metrics simultaneously, FDR (Benjamini-Hochberg) correction is applied by default to control false positives. Bonferroni is also available.
- **Alert level:** Signals that remain active after correction are counted: 1–2 → green, 3–4 → yellow, 5–7 → red.
- **LLM narration:** Only triggered on yellow or red. The model receives pre-calculated statistics and is constrained to one factual sentence per signal — no diagnosis, no causal inference.
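
The core comparison can be reproduced with `scipy` directly. This sketch uses synthetic data and one common rank-biserial convention, r = 2U/(n1*n2) - 1, so it illustrates the method rather than the package's internal code:

```python
import numpy as np
from scipy.stats import mannwhitneyu

baseline = 55.0 + (np.arange(240) % 7).astype(float)   # varied baseline values
pre_event = 28.0 + (np.arange(48) % 5).astype(float)   # depressed 48 h pre-event window

u, p = mannwhitneyu(pre_event, baseline, alternative="two-sided")
# Rank-biserial correlation in [-1, 1]; -1 means every pre-event value
# ranks below every baseline value
r = 2 * u / (len(pre_event) * len(baseline)) - 1
```

With complete separation like this, U is 0, r is exactly -1, and p is vanishingly small.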

## Use cases

- **Health monitoring** — correlate HRV, deep sleep, or skin temperature drops with migraine or crisis events.
- **Infrastructure** — detect latency or error-rate anomalies preceding service outages.
- **IPTV / streaming** — link buffering load spikes to subscriber disconnection events.
- **Energy consumption** — associate power demand patterns with grid stress or equipment failures.
- **Finance** — find pre-event signals in volume, volatility, or spread data before earnings or market events.

## License

Apache 2.0 — © 2026 Raúl Gallardo (g3v3r)

Free to use in personal and commercial projects.
Attribution required: keep the copyright notice.
See [LICENSE](LICENSE) for full terms.
