Metadata-Version: 2.4
Name: sediment
Version: 0.1.1
Summary: Mine behavioral invariants from production logs and auto-generate tests
Project-URL: Homepage, https://github.com/sediment-py/sediment
Project-URL: Repository, https://github.com/sediment-py/sediment
Project-URL: Bug Tracker, https://github.com/sediment-py/sediment/issues
Author: Sediment Contributors
License: MIT
License-File: LICENSE
Keywords: invariants,llm,logs,mlops,observability,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: System :: Logging
Requires-Python: >=3.9
Provides-Extra: avro
Requires-Dist: fastavro>=1.8; extra == 'avro'
Provides-Extra: cloud
Requires-Dist: azure-storage-blob>=12.0; extra == 'cloud'
Requires-Dist: boto3>=1.26; extra == 'cloud'
Requires-Dist: google-cloud-storage>=2.0; extra == 'cloud'
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == 'config'
Provides-Extra: dev
Requires-Dist: pyarrow>=14.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Provides-Extra: full
Requires-Dist: boto3>=1.26; extra == 'full'
Requires-Dist: fastavro>=1.8; extra == 'full'
Requires-Dist: openai>=1.0; extra == 'full'
Requires-Dist: pandas>=2.0; extra == 'full'
Requires-Dist: pyarrow>=14.0; extra == 'full'
Requires-Dist: pyyaml>=6.0; extra == 'full'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Provides-Extra: sentence-transformers
Requires-Dist: sentence-transformers>=2.0; extra == 'sentence-transformers'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/sediment-py/sediment/main/docs/logo.svg" width="72" alt="Sediment logo" />
</p>

<h1 align="center">Sediment</h1>

<p align="center">
  <strong>Mine behavioral invariants from LLM production logs. Auto-generate tests.</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/sediment"><img src="https://img.shields.io/pypi/v/sediment?color=3182ce&label=PyPI" alt="PyPI version" /></a>
  <a href="https://pypi.org/project/sediment"><img src="https://img.shields.io/pypi/pyversions/sediment?color=3182ce" alt="Python versions" /></a>
  <img src="https://img.shields.io/badge/tests-210%20passing-38a169" alt="210 tests passing" />
  <img src="https://img.shields.io/badge/dependencies-zero-ed8936" alt="zero required dependencies" />
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-718096" alt="MIT license" /></a>
</p>

<p align="center">
  <a href="#quickstart">Quickstart</a> ·
  <a href="#cli">CLI</a> ·
  <a href="#invariant-types">Invariant types</a> ·
  <a href="#staleness-tracking--ci">CI integration</a> ·
  <a href="#supported-formats">Formats</a> ·
  <a href="#api-reference">API</a>
</p>

---

Sediment reads your production logs, discovers what your LLM system *actually does* (not what you think it does), and turns those discoveries into runnable pytest tests and CI checks.

```bash
pip install sediment
sediment discover logs/prod.jsonl
```

```
Discovered 14 invariants from logs/prod.jsonl

[structural]  output_never_empty          confidence=100%  support=2841
[structural]  output_always_json          confidence=98%   support=2784
[pattern]     no_email_in_output          confidence=100%  support=2841  ← PII guard
[pattern]     no_credit_card_in_output    confidence=100%  support=2841  ← PII guard
[statistical] latency_p95_threshold       confidence=94%   support=2672  p95=1240ms
[temporal]    output_length_drift         confidence=91%   support=2841
[semantic]    semantic_consistency        confidence=87%   support=2841
...
```

---

## What it does

| Step | Description |
|------|-------------|
| **Ingest** | Reads logs in any format — JSONL, CSV, Parquet, gzip, OpenAI, LangSmith, OTel, and more |
| **Infer** | Auto-detects format and field schema (input, output, latency, model, session, …) |
| **Discover** | Mines behavioral invariants across 7 miner types |
| **Generate** | Writes a pytest test file you can drop straight into CI |
| **Track** | Saves a baseline and alerts when production behavior drifts |

---

## Install

```bash
pip install sediment                          # core — zero required dependencies
pip install "sediment[parquet]"               # + Parquet / Arrow support
pip install "sediment[avro]"                  # + Avro support
pip install "sediment[cloud]"                 # + S3 / GCS / Azure Blob sources
pip install "sediment[openai]"                # + OpenAI embedding backend
pip install "sediment[sentence-transformers]" # + sentence-transformers backend
pip install "sediment[config]"                # + .sediment.yml config file support
pip install "sediment[full]"                  # everything
```

---

## Quickstart

### Python API

```python
from sediment import LogAnalyzer

a = LogAnalyzer("logs/prod.jsonl")

# Inspect what was detected
print(a.summary())

# Discover invariants
invariants = a.discover(min_confidence=0.8)
for inv in invariants:
    print(inv)

# Generate a pytest test file
a.emit_tests("test_invariants.py", function_hint="call_llm")
# → Run with: pytest test_invariants.py -v

# Generate an interactive HTML report
a.report("report.html")
```

### CLI

```bash
# Explore what's in your logs
sediment summary logs/prod.jsonl

# Discover and print invariants
sediment discover logs/prod.jsonl --min-confidence 0.8

# Save a baseline for future staleness checks
sediment save logs/prod.jsonl baseline.json

# Check for drift against a new batch of logs
sediment check-staleness logs/today.jsonl baseline.json

# Compare invariants between two log snapshots
sediment compare logs/v1.jsonl logs/v2.jsonl

# Generate an HTML report
sediment report logs/prod.jsonl -o report.html

# Scaffold a .sediment.yml config and first baseline
sediment init logs/prod.jsonl
```

---

## Invariant types

Sediment runs 7 miner types in parallel. Each produces typed, confidence-annotated invariants.

### Structural
What your outputs always look like:
- `output_never_empty` — output is always non-null, non-empty
- `output_length_range` — character length stays within observed bounds
- `output_always_json` — every output is valid JSON
- `output_json_keys_consistent` — JSON outputs always contain the same keys
- `output_type_consistent` — output type (str / list / dict) is stable

### Statistical
Distributional properties of your system:
- `latency_p95_threshold` — p95 latency stays under threshold
- `cost_p95_threshold` — p95 cost per request stays under threshold
- `error_rate` — error rate at or below observed baseline
- `model_consistency` — a single model is used throughout
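
Thresholds such as `latency_p95_threshold` are derived from observed percentiles. As an illustration of the idea (a nearest-rank sketch, not Sediment's internal code):

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

latencies_ms = [120, 340, 95, 1240, 480, 210, 660, 150, 905, 380]
p95 = percentile(latencies_ms, 95)  # → 1240; an invariant asserts future p95 stays at or below this
```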

### Pattern — PII & safety
What must never appear in outputs:
- `no_email_in_output` — no email addresses leaked &nbsp;🔴 **critical**
- `no_phone_us_in_output` — no US phone numbers leaked &nbsp;🔴 **critical**
- `no_ssn_in_output` — no Social Security numbers leaked &nbsp;🔴 **critical**
- `no_credit_card_in_output` — no credit card numbers leaked (Luhn-validated) &nbsp;🔴 **critical**
- `no_ipv4_in_output` — no IP addresses leaked &nbsp;🔴 **critical**

> PII detection uses validated regex — SSNs checked against SSA rules, credit cards validated with the Luhn algorithm, phone numbers validated against NANP rules.
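
For reference, a credit-card candidate passes the Luhn checksum when doubling every second digit from the right and summing digit-wise yields a multiple of 10. A generic sketch of the algorithm (not Sediment's exact implementation):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:      # strings shorter than any card number can't match
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9        # equivalent to summing the two resulting digits
        total += d
    return total % 10 == 0

luhn_valid("4242 4242 4242 4242")  # well-known test card number → True
luhn_valid("1234 5678 9012 3456")  # fails the checksum → False
```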

### Relational
Input → output relationships:
- `output_minimum_length` — outputs stay above a safe minimum length relative to input
- `refusal_rate` — model refuses or apologizes within observed bounds
- `input_output_length_correlation` — longer inputs produce longer outputs (when expected)

### Semantic
Meaning-level consistency:
- `semantic_consistency` — outputs remain semantically similar to baseline
- `semantic_outliers` — no outputs diverge more than 2σ from the centroid
- `near_duplicate_outputs` — outputs are not near-identical (flags stuck / looping models)
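
Conceptually, `semantic_outliers` embeds each output and flags entries whose distance from the embedding centroid exceeds the 2σ threshold. A pure-Python sketch of that idea (illustrative, not the library's actual code):

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_outliers(embeddings, sigma=2.0):
    """Indices of embeddings more than `sigma` std-devs from the mean centroid distance."""
    c = centroid(embeddings)
    dists = [1.0 - cosine(v, c) for v in embeddings]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mean + sigma * std]

outliers = semantic_outliers([[1.0, 0.0]] * 9 + [[0.0, 1.0]])  # flags index 9
```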

### Temporal
Drift over time:
- `output_length_drift` — output length distribution hasn't shifted
- `latency_drift` — latency distribution hasn't shifted
- `model_drift` — model hasn't silently changed
- `error_rate_drift` — error rate hasn't crept up

### Session
Multi-turn conversation patterns:
- `session_turn_count_range` — turns per session stays within expected range
- `session_avg_turns` — average session length is stable
- `session_user_return_rate` — returning user rate is stable

---

## Staleness tracking & CI

### Save a baseline once

```bash
sediment save logs/prod.jsonl .sediment-baseline.json
```

### Check daily in CI

```bash
sediment check-staleness logs/today.jsonl .sediment-baseline.json
# exits with status 1 if any invariant is violated
```

```
Staleness Report — checked 2024-03-15 09:00 UTC
Original discovery: 2024-03-01  source: logs/prod.jsonl

  ✓ Holds:    11/14
  ↓ Degraded:  2/14   (confidence dropped > 10pp)
  ✗ Violated:  1/14   (confidence dropped > 30pp)  ← CI fails here
  ? Missing:   0/14
```
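
The thresholds in the report above (degraded past 10 percentage points, violated past 30) amount to a simple classification rule. This helper is hypothetical, shown only to make the rule concrete:

```python
def classify(baseline_confidence: float, current_confidence: float) -> str:
    """Classify an invariant by how far its confidence dropped vs. the baseline."""
    drop_pp = (baseline_confidence - current_confidence) * 100
    if drop_pp > 30:
        return "violated"   # CI fails here
    if drop_pp > 10:
        return "degraded"
    return "holds"

classify(0.98, 0.95)  # "holds"
classify(0.98, 0.80)  # "degraded"
classify(0.98, 0.60)  # "violated"
```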

### GitHub Actions

```yaml
# .github/workflows/sediment.yml
- name: Check invariant staleness
  run: sediment check-staleness ${{ env.LOG_SOURCE }} .sediment-baseline.json
```

### pytest plugin

Collect `*.sediment.json` baselines as native pytest test items:

```bash
pytest --sediment-source=logs/today.jsonl
```

Each invariant becomes a separate test. Violated invariants fail; degraded ones warn.

### Compare two releases

```bash
sediment compare logs/v1.jsonl logs/v2.jsonl
```

```
Sediment Compare: logs/v1.jsonl  →  logs/v2.jsonl
────────────────────────────────────────────────────────────
  New:        2   invariants appeared
  Removed:    0   invariants disappeared
  Improved:   3   confidence increased ≥5%
  Degraded:   1   confidence decreased ≥5%
  Stable:    10   no meaningful change

⚠️  DEGRADED  latency_p95_threshold  87% (-8%)
✅ No regressions detected.
```

---

## Supported formats

| Format | Auto-detected | Notes |
|--------|:------------:|-------|
| JSONL / NDJSON | ✅ | Streaming, nested field paths |
| JSON array | ✅ | `[{…}, {…}]` |
| CSV / TSV | ✅ | Any delimiter, quoted fields |
| logfmt | ✅ | `key=value key="quoted value"` |
| Apache / nginx | ✅ | Combined log format |
| Parquet | ✅ | Requires `pyarrow` |
| Avro | ✅ | Requires `fastavro` |
| gzip | ✅ | `.jsonl.gz`, `.csv.gz`, etc. |
| **OpenAI** API logs | ✅ | Auto-detected |
| **LangSmith** traces | ✅ | Auto-detected |
| **LangFuse** generations | ✅ | Auto-detected |
| **OpenTelemetry** GenAI | ✅ | Auto-detected |
| **Helicone** | ✅ | Auto-detected |
| **W&B Weave** | ✅ | Auto-detected |
| **MLflow** traces | ✅ | Auto-detected |
| **Datadog** LLM Obs | ✅ | Auto-detected |
| S3 / GCS / Azure Blob | ✅ | Requires `sediment[cloud]` |
| stdin | ✅ | `sediment discover -` |

Glob patterns, directories, and cloud URIs all work:

```python
LogAnalyzer("logs/*.jsonl.gz")
LogAnalyzer("logs/")
LogAnalyzer("s3://my-bucket/logs/*.jsonl")
LogAnalyzer("-")   # stdin
```

---

## Sampling

For large log files:

```python
LogAnalyzer("huge.jsonl", sample=10_000, sampling_strategy="importance")
```

| Strategy | Description |
|----------|-------------|
| `random` | Uniform random sample (default) |
| `stratified` | Preserves output-length distribution |
| `importance` | Oversamples rare / anomalous entries |
| `time_windowed` | Weights recent entries higher |
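
As a rough illustration of the stratified strategy, entries can be bucketed by output length and each bucket sampled proportionally. A conceptual sketch with a hypothetical helper, not Sediment's sampler:

```python
import random
from collections import defaultdict

def stratified_sample(entries, k, key=len, bins=4):
    """Sample k entries while roughly preserving the distribution of `key`."""
    ordered = sorted(entries, key=key)
    buckets = defaultdict(list)
    for i, e in enumerate(ordered):
        buckets[min(i * bins // len(ordered), bins - 1)].append(e)
    picked = []
    for bucket in buckets.values():
        quota = max(1, round(k * len(bucket) / len(entries)))
        picked.extend(random.sample(bucket, min(quota, len(bucket))))
    return picked[:k]

outputs = ["x" * n for n in range(100)]
sample = stratified_sample(outputs, k=8)  # draws from both short and long outputs
```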

---

## Configuration

Create `.sediment.yml` in your project root (or run `sediment init logs/prod.jsonl`):

```yaml
# .sediment.yml
min_confidence: 0.8
min_support: 2
baseline: .sediment-baseline.json

types:
  - structural
  - statistical
  - pattern
  - relational
  - semantic
  - temporal
  - session

# sample: 10000
# sampling_strategy: random

report:
  format: html
  output: sediment_report.html
```

All CLI commands pick this up automatically.

---

## Custom miners

Register your own miner function to discover domain-specific invariants:

```python
from sediment import LogAnalyzer
from sediment.discovery.base import InvariantResult

def apology_rate_miner(entries):
    if not entries:
        return []
    count = sum(1 for e in entries if "sorry" in str(e.output).lower())
    rate = count / len(entries)
    return [InvariantResult(
        id="apology_rate",
        type="custom",
        description=f"Model apologizes in {rate:.0%} of responses",
        confidence=1.0 - rate,
        support=count,
        total=len(entries),
        severity="warning" if rate > 0.1 else "info",
    )]

results = LogAnalyzer("logs.jsonl").register_miner(apology_rate_miner).discover()
```

---

## Embedding backends

Used by the semantic miner. The default TF-IDF backend is built in and dependency-free; swap in a stronger backend for better accuracy:

```python
from sediment import LogAnalyzer
from sediment.embeddings.openai_emb import OpenAIEmbedder

a = LogAnalyzer("logs.jsonl")
results = a.discover(embedder=OpenAIEmbedder(api_key="sk-..."))
```

| Backend | Class | Quality | Install |
|---------|-------|---------|---------|
| TF-IDF | `TfidfEmbedder` | Basic | built-in |
| OpenAI `text-embedding-3-small` | `OpenAIEmbedder` | High | `sediment[openai]` |
| `all-MiniLM-L6-v2` | `SentenceTransformerEmbedder` | High | `sediment[sentence-transformers]` |

---

## Schema evolution detection

Detects when field names change mid-stream — e.g. a deploy that renamed `prompt` → `input`:

```python
drifts = LogAnalyzer("logs.jsonl").check_schema_evolution()
for d in drifts:
    print(d)
# [SCHEMA DRIFT] input: 'prompt' → 'input'  (around entry 5000, early=94% late=97%)
```
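
The underlying idea is simple: compare which fields dominate the early portion of the stream with which dominate the late portion. A toy version over plain dicts (illustrative, not the library's detector):

```python
def field_drift(records, split=0.5):
    """Return (fields that disappeared, fields that appeared) across the split point."""
    cut = int(len(records) * split)
    early = set().union(*(r.keys() for r in records[:cut]))
    late = set().union(*(r.keys() for r in records[cut:]))
    return sorted(early - late), sorted(late - early)

old = [{"prompt": "hi", "response": "hello"}] * 50
new = [{"input": "hi", "response": "hello"}] * 50
field_drift(old + new)  # → (['prompt'], ['input'])
```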

---

## Jupyter

```python
a = LogAnalyzer("logs.jsonl")
a.show()   # renders interactive HTML report inline
```

---

## API reference

```python
LogAnalyzer(
    source,                      # file, glob, directory, s3://, gs://, az://, or "-"
    schema=None,                 # override inferred schema
    sample=None,                 # max entries to load
    sampling_strategy="random",  # random | stratified | importance | time_windowed
    format_hint=None,            # skip auto-detection
)

# Exploration
.summary()                        → Summary
.infer()                          → SchemaMap
.entries()                        → Iterator[LogEntry]
async .async_entries()            → AsyncIterator[LogEntry]

# Discovery
.discover(
    min_confidence=0.8,
    min_support=2,
    types=None,                  # list of miner type strings, or None for all
    dedup=True,
    embedder=None,
)                                 → list[InvariantResult]

# Output
.emit_tests(output_path, min_confidence=0.8, function_hint="my_function")
.report(output_path, fmt="html", min_confidence=0.5)
.show(min_confidence=0.5)        # Jupyter inline display

# Staleness
.save_invariants(path, min_confidence=0.8)
.check_staleness(invariants_path) → StalenessReport
.check_schema_evolution()         → list[SchemaDrift]

# Extension
.register_miner(fn)               → LogAnalyzer  (chainable)
```

---

## Development

```bash
git clone https://github.com/sediment-py/sediment
cd sediment
pip install -e ".[dev]"
pytest tests/ -v
```

**210 tests · zero required dependencies · Python 3.9+**

---

## License

MIT © Sediment Contributors
