Metadata-Version: 2.4
Name: failure-forensics
Version: 0.1.1
Summary: Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard
License: MIT License
Project-URL: Homepage, https://github.com/jasstt/failure-forensics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: python-dotenv
Requires-Dist: google-generativeai
Dynamic: license-file

# Failure Forensics

Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard, and Gemini-powered recommendations.

## Installation

```bash
pip install failure-forensics
```

## Quick Start

```python
from failure_forensics import trace

@trace(step="retrieval", version="v1")
def my_retrieval_function(query):
    # your code here
    pass
```

## Features 🔬

![Python](https://img.shields.io/badge/Python-3.9+-blue)
![Tests](https://img.shields.io/badge/Tests-8%2F8_PASS-brightgreen)
![License](https://img.shields.io/badge/License-MIT-lightgrey)
![Alerts](https://img.shields.io/badge/Slack-Alerts_Ready-4A154B)

A **self-hosted, zero-cost** LLM pipeline observability tool that gives you root cause detection, anomaly alerts, A/B reporting, and a live terminal dashboard — without sending your data to any third-party service.

---

## 🆚 Why Not LangSmith or Braintrust?

| | **Failure Forensics** | LangSmith | Braintrust |
|---|---|---|---|
| Cost | **Free** | Paid tiers | Paid tiers |
| Data privacy | **Stays on your machine** | Sent to cloud | Sent to cloud |
| Customization | **Full control** | Limited | Limited |
| Slack alerts | **Built-in** | Premium only | Premium only |
| A/B reporting | **Built-in** | Basic | Basic |
| Circuit breaker / trend | **Built-in** | ❌ | ❌ |

**Failure Forensics is designed for teams who need production-grade observability without vendor lock-in.**

---

## ✨ What It Does

Every pipeline run passes through a structured logging and analysis layer:

```
Pipeline Step  →  logger.py  →  requests.jsonl
                                     ↓
                    ┌────────────────┴────────────────┐
                    │                                 │
              forensics.py                       pattern.py
          (root cause detection)          (time series + anomaly)
                    │                                 │
              versioning.py                      baseline.py
           (v1 vs v2 comparison)            (7-day moving average)
                    │                                 │
               ab_report.py                      alerts.py
            (A/B comparison table)          (Slack / console alert)
                    └────────────────┬────────────────┘
                                     ↓
                              dashboard.py
                         (ASCII terminal dashboard)
```
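
As a rough illustration of the first hop in the diagram (pipeline step to `requests.jsonl`), here is a minimal sketch of appending one record per step to a JSONL file and reading the records back. The function names and record fields are assumptions made for this example, not the actual `logger.py` API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_step(log_path, run_id, step, version, success, error=None):
    """Append one pipeline-step record as a single JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "step": step,
        "version": version,
        "success": success,
        "error": error,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/logs/ if missing
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_records(log_path):
    """Load every non-empty line back into a list of dicts for analysis."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only JSONL file keeps each write independent, so a crashed run leaves earlier records readable.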

---

## 📁 Project Structure

```
failure-forensics/
├── src/
│   ├── logger.py          # Logs every pipeline step to JSONL
│   ├── forensics.py       # Root cause detection (5 categories)
│   ├── pattern.py         # Time-series failure rate + anomaly detection
│   ├── baseline.py        # 7-day moving average + trend (IMPROVING/STABLE/DEGRADING)
│   ├── alerts.py          # Slack webhook + console alerts
│   ├── versioning.py      # Per-version failure rate stats
│   ├── ab_report.py       # A/B comparison report (table + JSON)
│   └── dashboard.py       # ASCII bar chart terminal dashboard
├── data/
│   └── logs/
│       └── requests.jsonl # All pipeline logs (gitignored)
├── tests/
│   └── test_forensics.py  # 8 unit tests
├── config.py              # Thresholds, Slack URL, step limits
├── main.py                # 5-scenario demo runner
├── simulate.py            # Realistic test data generator (100 runs, anomaly day)
└── requirements.txt
```

---

## 🚀 Getting Started

### 1. Clone & Install

```bash
git clone https://github.com/jasstt/failure-forensics.git
cd failure-forensics
pip install -r requirements.txt
```

### 2. (Optional) Configure Slack Alerts

Edit `config.py`:

```python
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
```

If left empty, all alerts print to the console.
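
The fallback described above could look roughly like the sketch below. `send_alert` is a hypothetical helper, not the real `alerts.py` interface, and it uses the standard library's `urllib` to stay self-contained even though the project itself lists `requests` for webhook calls. Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = ""  # same name as in config.py; empty means console output


def send_alert(message, webhook_url=SLACK_WEBHOOK_URL):
    """Post the message to Slack, or print it when no webhook is configured."""
    if not webhook_url:
        print(f"[ALERT] {message}")
        return "console"
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()
    return "slack"
```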

### 3. Run the Full Demo

```bash
python main.py
```

This runs 5 scenarios:
1. **Simulation** — generates 100 realistic pipeline runs (2 prompt versions, anomaly day)
2. **Root cause analysis** — detects the failing step and assigns a category
3. **7-day pattern report** — failure rate per day + step breakdown + anomaly check
4. **A/B report** — `prompt_v1` vs `prompt_v2` with per-step improvement table
5. **Terminal dashboard** — live ASCII bar charts, trend, top 5 failed runs
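
To make the A/B step concrete, the sketch below computes a failure rate per prompt version and the percentage-point gap between two versions. The record shape (`version` and `success` keys) and both function names are assumptions for illustration; the synthetic counts are chosen only to reproduce the 22.5% and 11.0% rates quoted in this README.

```python
from collections import defaultdict


def failure_rate_by_version(records):
    """Return {version: failure rate in percent} from run-level records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        totals[rec["version"]] += 1
        if not rec["success"]:
            failures[rec["version"]] += 1
    return {version: 100.0 * failures[version] / totals[version] for version in totals}


def ab_delta_pp(rates, baseline, candidate):
    """Percentage-point improvement of candidate over baseline (positive = fewer failures)."""
    return rates[baseline] - rates[candidate]


# Synthetic data: 200 runs per version, 45 and 22 failures respectively.
records = (
    [{"version": "prompt_v1", "success": i >= 45} for i in range(200)]
    + [{"version": "prompt_v2", "success": i >= 22} for i in range(200)]
)
rates = failure_rate_by_version(records)
print(rates["prompt_v1"], rates["prompt_v2"], ab_delta_pp(rates, "prompt_v1", "prompt_v2"))
# prints: 22.5 11.0 11.5
```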

### 4. Run Unit Tests

```bash
python tests/test_forensics.py
python tests/test_advanced.py
```

---

## 🚀 Advanced Features (New in v2)

| Layer | Feature | Technology |
|-------|---------|------------|
| 1 | Automatic recommendation engine | Rule-based |
| 2 | AI-assisted failure analysis | Gemini 2.5 Pro |
| 3 | Automatic eval set expansion | Frequency analysis |
| 4 | Prompt optimization explanation | Gemini 2.5 Pro |
| 5 | Regression guard | Baseline comparison |

**Scenario 6: Regression Guard**
Runs an automatic regression check before a new prompt (`v3`) is deployed:
```
REGRESSION CHECK — v3
Baseline (v2): 11.0% failure rate
New (v3):      24.5% failure rate
Delta: +13.5pp → REGRESSION_DETECTED ❌
```
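
A minimal sketch of the comparison behind this check follows. The README does not state the guard's actual thresholds, so the 5pp warning and 10pp failure cutoffs are assumptions, picked only so that a +13.5pp delta maps to `REGRESSION_DETECTED` and a +6pp delta maps to `WARNING`; the function name is hypothetical as well.

```python
def regression_check(baseline_rate, candidate_rate, warn_pp=5.0, fail_pp=10.0):
    """Compare failure rates (in percent) and return (delta in pp, status).

    warn_pp and fail_pp are illustrative thresholds, not the project's defaults.
    """
    delta = candidate_rate - baseline_rate
    if delta >= fail_pp:
        status = "REGRESSION_DETECTED"
    elif delta >= warn_pp:
        status = "WARNING"
    else:
        status = "PASS"
    return delta, status


delta, status = regression_check(baseline_rate=11.0, candidate_rate=24.5)
print(f"Delta: {delta:+.1f}pp -> {status}")
# prints: Delta: +13.5pp -> REGRESSION_DETECTED
```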

## Test Results

| Layer | Test | Result |
|-------|------|--------|
| 1: Recommender | Category → recommendation mapping | ✅ PASS |
| 2: LLM Analyzer | Gemini fallback | ✅ PASS |
| 3: Eval Collector | Duplicate prevention | ✅ PASS |
| 4: Prompt Optimizer | A/B explanation (v2: +10pp) | ✅ PASS |
| 5: Regression Guard | DETECTED + PASS scenarios | ✅ PASS |

### Key Results
- A/B: `prompt_v2` improves on `v1` by 10pp
- Regression Guard: blocked the `v3` deploy as a WARNING with a +6pp delta
- Eval Collector: 5 new eval candidates collected automatically
- LLM Analyzer: falls back cleanly to the rule-based analyzer when Gemini is disabled

---

## 📊 Results

| Feature | Result |
|---------|--------|
| Unit Tests | **8/8 PASS** ✅ |
| Root cause categories | **5 types** (RETRIEVAL_QUALITY, RERANKER_FAILURE, LLM_HALLUCINATION, CITATION_MISS, API_ERROR) |
| Anomaly detection | **20pp delta threshold**: flags when today's failure rate exceeds the 7-day average by more than 20pp |
| A/B comparison | **v2: 11.5pp improvement** over v1 (22.5% → 11.0% failure rate) |
| Trend analysis | **IMPROVING / STABLE / DEGRADING** based on 7-day moving average |
| Slack integration | **Webhook ready** — fires on rate threshold, anomaly, or 3 consecutive failures |
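
The trend labels in the table can be illustrated with a small moving-average sketch. How `baseline.py` actually decides between the three labels is not documented here, so comparing the latest 7-day average against the previous 7-day average, and the 2pp tolerance band, are both assumptions.

```python
def moving_average(daily_rates, window=7):
    """Average of the last `window` daily failure rates (in percent)."""
    recent = daily_rates[-window:]
    if not recent:
        raise ValueError("daily_rates must not be empty")
    return sum(recent) / len(recent)


def classify_trend(daily_rates, window=7, tolerance_pp=2.0):
    """Label the trend by comparing the latest window with the one before it."""
    if len(daily_rates) < 2 * window:
        return "STABLE"  # not enough history to compare two full windows
    current = moving_average(daily_rates, window)
    previous = moving_average(daily_rates[:-window], window)
    if current < previous - tolerance_pp:
        return "IMPROVING"
    if current > previous + tolerance_pp:
        return "DEGRADING"
    return "STABLE"
```

A tolerance band keeps small day-to-day noise from flipping the label between runs.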

---

## ⚙️ Configuration (`config.py`)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `FAILURE_RATE_THRESHOLD` | `0.25` | Alert fires above this failure rate |
| `ANOMALY_THRESHOLD` | `0.20` | Flag if today exceeds 7-day avg by this delta |
| `SLACK_WEBHOOK_URL` | `""` | Empty = console output |
| `CONSECUTIVE_FAILURE_THRESHOLD` | `3` | Alert after N consecutive step failures |
| `STEP_THRESHOLDS` | see config | Per-step max acceptable failure rate |
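
The sketch below shows one way the three scalar thresholds above might combine into alert decisions. The constant names and default values come from the table; the function name, its inputs, and the reason labels are hypothetical and may differ from what `alerts.py` does.

```python
FAILURE_RATE_THRESHOLD = 0.25
ANOMALY_THRESHOLD = 0.20
CONSECUTIVE_FAILURE_THRESHOLD = 3


def alert_reasons(today_rate, seven_day_avg, recent_outcomes):
    """Return the list of alert conditions that fire.

    Rates are fractions (0.25 == 25%); recent_outcomes is an ordered list of
    booleans for one step, oldest first, where True means success.
    """
    reasons = []
    if today_rate > FAILURE_RATE_THRESHOLD:
        reasons.append("RATE_THRESHOLD")
    if today_rate - seven_day_avg > ANOMALY_THRESHOLD:
        reasons.append("ANOMALY")
    # Count failures at the end of the history, stopping at the first success.
    streak = 0
    for succeeded in reversed(recent_outcomes):
        if succeeded:
            break
        streak += 1
    if streak >= CONSECUTIVE_FAILURE_THRESHOLD:
        reasons.append("CONSECUTIVE_FAILURES")
    return reasons
```

With these defaults, a day at 27.3% against a 16.2% average trips only the rate threshold, since the 11.1pp gap stays below the 20pp anomaly delta.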

---

## 🧪 Root Cause Categories

| Category | Trigger |
|----------|---------|
| `RETRIEVAL_QUALITY` | Retrieval step fails — no results, low score |
| `RERANKER_FAILURE` | Reranker can't parse LLM response or times out |
| `LLM_HALLUCINATION` | Generation returns empty or uncited response |
| `CITATION_MISS` | Answer produced but no source citations found |
| `API_ERROR` | Timeout, 429 rate limit, 503 service unavailable |
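
As a hedged illustration of the table, the sketch below picks the first failing step of a run and maps it to a category. The step names match those shown in the dashboard, but the record shape, the error-string matching, and the `UNCLASSIFIED` placeholder are assumptions; the real rules in `forensics.py` may well be richer.

```python
STEP_TO_CATEGORY = {
    "retrieval": "RETRIEVAL_QUALITY",
    "reranking": "RERANKER_FAILURE",
    "generation": "LLM_HALLUCINATION",
    "citation": "CITATION_MISS",
}
API_ERROR_MARKERS = ("timeout", "429", "503")


def classify_root_cause(steps):
    """Return (failed_step, category) for the first failing step, or None.

    steps: ordered list of dicts with "step", "success", and optional "error".
    """
    for rec in steps:
        if rec["success"]:
            continue
        step = rec["step"]
        error = (rec.get("error") or "").lower()
        # Per the table, a reranker timeout counts as RERANKER_FAILURE.
        reranker_timeout = step == "reranking" and "timeout" in error
        if not reranker_timeout and any(marker in error for marker in API_ERROR_MARKERS):
            return step, "API_ERROR"
        # "UNCLASSIFIED" is a placeholder for steps outside the table.
        return step, STEP_TO_CATEGORY.get(step, "UNCLASSIFIED")
    return None
```

Stopping at the first failure treats later failures in the same run as downstream effects instead of separate root causes.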

---

## 📈 Terminal Dashboard (Sample Output)

```
═════════════════════════════════════════════════════════════
  🔬  FAILURE FORENSICS — Terminal Dashboard
═════════════════════════════════════════════════════════════

  📅 FAILURE RATE CHART, LAST 7 DAYS
  2026-06-03  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13.0%
  2026-06-07  [████████░░░░░░░░░░░░░░░░░░░░░░] 27.3% ⚠️
  2026-06-10  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12.0%

  🔍 FAILURE BREAKDOWN BY STEP
  retrieval     [███████░░░░░░░░░░░░░] 38.0%  (38/100 failed)
  reranking     [██░░░░░░░░░░░░░░░░░░] 13.0%  (13/100 failed)
  generation    [██░░░░░░░░░░░░░░░░░░] 10.0%  (10/100 failed)
  citation      [█░░░░░░░░░░░░░░░░░░░]  6.0%  (6/100 failed)

  ⚡ ANOMALY: ✅ Normal: Today (12.0%) ≈ 7d avg (16.2%)
  📊 TREND: ➡️  STABLE — Moving Avg: 16.0%
```
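
The bars in the sample are consistent with filling `percent × width / 100` cells, rounded down, with widths of 30 and 20 characters. The sketch below reproduces that rendering. `render_bar` and `render_row` are hypothetical names, and `dashboard.py` may compute its bars differently.

```python
def render_bar(percent, width=30, fill="█", empty="░"):
    """Render a fixed-width bar; the filled cell count is truncated, not rounded."""
    filled = int(percent * width / 100)
    filled = max(0, min(width, filled))  # clamp out-of-range percentages
    return "[" + fill * filled + empty * (width - filled) + "]"


def render_row(label, percent, width=20):
    """Format one dashboard row: padded label, bar, then the percentage."""
    return f"  {label:<14}{render_bar(percent, width)} {percent:4.1f}%"


print(render_row("retrieval", 38.0))
print(render_row("citation", 6.0))
```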

---

## 🛠 Technologies Used

* **Python standard library** — `json`, `collections`, `datetime`, `threading`
* **[requests](https://pypi.org/project/requests/)** — Slack webhook HTTP calls
* **[python-dotenv](https://pypi.org/project/python-dotenv/)** — Environment variable management

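No heavy dependencies and no cloud service for the core features. The Gemini-powered analysis additionally uses `google-generativeai` and a Gemini API key; when Gemini is unavailable, the analyzer falls back to rule-based logic.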

---

## 🔭 Roadmap

- [ ] FastAPI REST endpoint for remote log ingestion
- [ ] HTML report export
- [ ] PostgreSQL backend for large-scale log storage
- [ ] Multi-pipeline support (compare RAG vs fine-tuned model)
- [ ] Email alerts as alternative to Slack
