Metadata-Version: 2.4
Name: sre-beacon
Version: 1.0.0
Summary: SRE Metrics Calculator & Reliability Analyzer — SLO compliance, error budgets, incident analysis, alert auditing, toil estimation, and maturity scoring.
Author: Sanjay Sundar Murthy
License: MIT
Keywords: sre,reliability,slo,devops,observability,incident-management
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0
Dynamic: license-file

# 📡 sre-beacon

**SRE Metrics Calculator & Reliability Analyzer**

A Python CLI for Site Reliability Engineering. It calculates SLO compliance, error budgets, and availability nines; analyzes incident metrics (MTTR, MTTA, MTBF); estimates toil; audits Prometheus alert rules; and assesses SRE maturity.

## Features

| Command | What It Does |
|---------|-------------|
| `sre-beacon slo` | Calculate SLO compliance and error budgets |
| `sre-beacon availability` | Calculate availability percentages and nines |
| `sre-beacon incidents` | Analyze incidents — MTTR, MTTA, MTBF |
| `sre-beacon toil` | Estimate toil hours and automation potential |
| `sre-beacon alerts` | Audit Prometheus/Alertmanager alert rules (15 rules) |
| `sre-beacon maturity` | Assess SRE maturity across 8 dimensions |
| `sre-beacon demo` | Run a full demo with sample data |
| `sre-beacon rules` | List all 15 alert audit rules |

## Installation

```bash
git clone https://github.com/SanjaySundarMurthy/sre-beacon.git
cd sre-beacon
pip install -e .
```

## Quick Start

```bash
# Run the full demo (creates sample data and runs all 6 analyses)
sre-beacon demo

# Run with verbose output
sre-beacon demo -v
```

## Commands

### SLO Compliance

```bash
sre-beacon slo slos.yaml actuals.yaml
sre-beacon slo slos.yaml actuals.yaml -v          # detailed error budgets
sre-beacon slo slos.yaml actuals.yaml -f json -o report.json
```

**slos.yaml:**
```yaml
slos:
  - name: API Availability
    service: api-gateway
    sli_type: availability
    target: 99.9
    window_days: 30
```

**actuals.yaml:**
```yaml
actuals:
  - name: API Availability
    value: 99.85
    burn_rate: 1.5
```
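
The arithmetic behind an availability-style error budget can be sketched as follows, using the values from the two files above. This is a minimal illustration, not the tool's implementation: the function name `error_budget` and the keys of the returned dict are assumptions made for this example.

```python
def error_budget(target_pct: float, actual_pct: float, window_days: int) -> dict:
    """Compare an observed availability against an SLO target.

    Hypothetical helper for illustration; sre-beacon's internals may differ.
    """
    window_minutes = window_days * 24 * 60
    # The budget is the share of the window the SLO allows to be "bad".
    budget_minutes = window_minutes * (100.0 - target_pct) / 100.0
    consumed_minutes = window_minutes * (100.0 - actual_pct) / 100.0
    if budget_minutes > 0:
        consumed_pct = 100.0 * consumed_minutes / budget_minutes
    else:
        consumed_pct = float("inf")  # a 100% target leaves no budget at all
    return {
        "compliant": actual_pct >= target_pct,
        "budget_minutes": budget_minutes,
        "consumed_minutes": consumed_minutes,
        "budget_consumed_pct": consumed_pct,
    }


# Values taken from slos.yaml and actuals.yaml above.
result = error_budget(target_pct=99.9, actual_pct=99.85, window_days=30)
print(result)
```

With a 99.9% target over 30 days the budget is about 43.2 minutes; an observed 99.85% corresponds to about 64.8 minutes, so the budget is overspent and the SLO is not met.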

### Availability Calculator

```bash
sre-beacon availability services.yaml
sre-beacon availability services.yaml -v    # nines comparison table
```

**services.yaml:**
```yaml
services:
  - name: api-gateway
    total_minutes: 43200
    downtime_minutes: 45
    period: 30 days
```
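
For reference, availability and "nines" can be derived from the two minute counts as sketched below. The function name `availability` is hypothetical, and the fractional-nines formula (`-log10` of the downtime fraction) is one common convention, not necessarily the one the tool uses.

```python
import math


def availability(total_minutes: float, downtime_minutes: float) -> tuple:
    """Return (availability percentage, number of nines).

    Illustrative sketch only; names and rounding are assumptions.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    downtime_fraction = downtime_minutes / total_minutes
    pct = 100.0 * (1.0 - downtime_fraction)
    # 99.9% -> 3 nines, 99.99% -> 4 nines; zero downtime has no finite value.
    nines = math.inf if downtime_fraction == 0 else -math.log10(downtime_fraction)
    return pct, nines


# Values taken from services.yaml above.
pct, nines = availability(total_minutes=43200, downtime_minutes=45)
print(f"{pct:.4f}% availability, {nines:.2f} nines")
```

For the sample service, 45 minutes of downtime in 43,200 minutes gives roughly 99.896% availability, just under three nines.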

### Incident Analysis

```bash
sre-beacon incidents incidents.yaml
sre-beacon incidents incidents.yaml -v       # per-service breakdown
sre-beacon incidents incidents.yaml -f json -o report.json
```
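
The three incident metrics can be computed from timestamps roughly as shown below. The schema of `incidents.yaml` is not documented here, so the record fields (`started`, `acknowledged`, `resolved`) are assumptions for illustration. MTBF is approximated as the mean gap between consecutive incident start times, which is one simplification among several.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"started": datetime(2024, 1, 1, 0, 0),
     "acknowledged": datetime(2024, 1, 1, 0, 5),
     "resolved": datetime(2024, 1, 1, 1, 0)},
    {"started": datetime(2024, 1, 11, 0, 0),
     "acknowledged": datetime(2024, 1, 11, 0, 15),
     "resolved": datetime(2024, 1, 11, 0, 30)},
]


def _minutes(delta) -> float:
    return delta.total_seconds() / 60


def incident_metrics(records: list) -> dict:
    """Compute MTTA and MTTR in minutes and MTBF in hours (needs >= 1 record)."""
    ordered = sorted(records, key=lambda r: r["started"])
    mtta = mean(_minutes(r["acknowledged"] - r["started"]) for r in ordered)
    mttr = mean(_minutes(r["resolved"] - r["started"]) for r in ordered)
    # Gaps between consecutive incident starts; undefined for a single incident.
    gaps = [_minutes(b["started"] - a["started"]) for a, b in zip(ordered, ordered[1:])]
    mtbf_hours = mean(gaps) / 60 if gaps else None
    return {"mtta_minutes": mtta, "mttr_minutes": mttr, "mtbf_hours": mtbf_hours}


metrics = incident_metrics(incidents)
print(metrics)
```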

### Toil Estimation

```bash
sre-beacon toil tasks.yaml
sre-beacon toil tasks.yaml -v                # list all tasks
sre-beacon toil tasks.yaml --hours 200       # custom engineering hours
```
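
A toil estimate boils down to summing recurring manual work and comparing it with the available engineering hours. The sketch below assumes a hypothetical task shape (`minutes_per_run`, `runs_per_month`, `automatable`); the actual `tasks.yaml` schema may differ. The 200-hour figure mirrors the `--hours 200` example above.

```python
# Hypothetical task records; field names are illustrative only.
tasks = [
    {"name": "cert rotation", "minutes_per_run": 30, "runs_per_month": 4, "automatable": True},
    {"name": "manual deploy", "minutes_per_run": 60, "runs_per_month": 20, "automatable": True},
    {"name": "ticket triage", "minutes_per_run": 15, "runs_per_month": 40, "automatable": False},
]


def toil_summary(task_list: list, engineering_hours: float) -> dict:
    """Summarize monthly toil hours and the share that could be automated."""
    def hours(task: dict) -> float:
        return task["minutes_per_run"] * task["runs_per_month"] / 60

    total = sum(hours(t) for t in task_list)
    automatable = sum(hours(t) for t in task_list if t["automatable"])
    return {
        "toil_hours": total,
        "automatable_hours": automatable,
        "toil_pct": 100 * total / engineering_hours,
    }


summary = toil_summary(tasks, engineering_hours=200)
print(summary)
```

Here the three tasks add up to 32 hours per month, or 16% of a 200-hour budget, of which 22 hours are candidates for automation.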

### Alert Rule Audit

```bash
sre-beacon alerts alert-rules.yaml
sre-beacon alerts alert-rules.yaml -v              # show all issues
sre-beacon alerts alert-rules.yaml --fail-on critical   # CI/CD gate
sre-beacon alerts alert-rules.yaml -f json -o audit.json
```

### SRE Maturity Assessment

```bash
sre-beacon maturity assessment.yaml
sre-beacon maturity assessment.yaml -v      # findings & recommendations
```

## Alert Audit Rules (15)

| Rule | Severity | Description |
|------|----------|-------------|
| SRE-ALT-001 | HIGH | Alert without severity label |
| SRE-ALT-002 | HIGH | Alert without runbook_url annotation |
| SRE-ALT-003 | MEDIUM | Alert without description annotation |
| SRE-ALT-004 | HIGH | Alert without 'for' duration |
| SRE-ALT-005 | MEDIUM | Critical alert with excessive 'for' (>5m) |
| SRE-ALT-006 | LOW | Alert name doesn't follow CamelCase |
| SRE-ALT-007 | INFO | Complex expression (nested functions) |
| SRE-ALT-008 | MEDIUM | Alert without summary annotation |
| SRE-ALT-009 | MEDIUM | Duplicate alert name |
| SRE-ALT-010 | LOW | Alert without team/owner label |
| SRE-ALT-011 | HIGH | Warning alert configured to page |
| SRE-ALT-012 | CRITICAL | Alert expression is empty |
| SRE-ALT-013 | MEDIUM | Very short 'for' duration (<1m) |
| SRE-ALT-014 | LOW | Missing dashboard_url annotation |
| SRE-ALT-015 | INFO | Large alert group (>15 rules) |
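
To make the table concrete, here is a minimal sketch of how four of these checks could be expressed against a single Prometheus alerting rule loaded as a dict. The rule IDs and severities come from the table above; the function `audit_rule` and its return shape are assumptions for this example, not the code in `alert_analyzer.py`.

```python
def audit_rule(rule: dict) -> list:
    """Return (rule_id, severity) pairs for a subset of the audit checks.

    Illustrative only: covers SRE-ALT-001, -002, -004 and -012.
    """
    findings = []
    labels = rule.get("labels") or {}
    annotations = rule.get("annotations") or {}

    if "severity" not in labels:
        findings.append(("SRE-ALT-001", "HIGH"))      # no severity label
    if "runbook_url" not in annotations:
        findings.append(("SRE-ALT-002", "HIGH"))      # no runbook_url annotation
    if not rule.get("for"):
        findings.append(("SRE-ALT-004", "HIGH"))      # no 'for' duration
    if not str(rule.get("expr") or "").strip():
        findings.append(("SRE-ALT-012", "CRITICAL"))  # empty expression
    return findings


# A sample rule using the standard Prometheus fields (alert, expr, for, labels, annotations).
sample_rule = {
    "alert": "HighErrorRate",
    "expr": 'rate(http_requests_total{status=~"5.."}[5m]) > 0.05',
    "labels": {"severity": "critical"},
    "annotations": {"summary": "Error rate above 5%"},
}

findings = audit_rule(sample_rule)
print(findings)
```

The sample rule has a severity label and an expression, so only the missing `runbook_url` annotation and the missing `for` duration are reported.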

## Maturity Dimensions (8)

1. **SLO Adoption** — SLO coverage, error budget tracking, review cadence
2. **Incident Management** — Process, post-mortems, action items, detection
3. **Monitoring & Observability** — Metrics, logging, tracing, golden signals
4. **Toil Reduction** — Tracking, automation projects, self-service tools
5. **Blameless Culture** — Post-mortems, learning reviews, psychological safety
6. **On-Call Practices** — Rotation, compensation, runbooks, handoff
7. **Capacity Planning** — Monitoring, projections, auto-scaling, load testing
8. **Change Management** — CI/CD, canary deployments, rollback, feature flags

## Export Formats

- **Terminal** — Rich formatted output (default)
- **JSON** — Machine-readable export
- **HTML** — Styled dark-theme report

## CI/CD Integration

```yaml
# GitHub Actions — fail on critical alert issues
- name: Audit Alert Rules
  run: |
    pip install sre-beacon
    sre-beacon alerts ./monitoring/alert-rules.yaml --fail-on critical
```
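
A severity gate such as `--fail-on` is typically a threshold comparison over the findings. The sketch below assumes that the five severities in the rule table are ordered INFO < LOW < MEDIUM < HIGH < CRITICAL and that the gate trips on any finding at or above the threshold; both are assumptions about the tool's behavior, and `should_fail` is a hypothetical helper.

```python
# Assumed ordering, lowest to highest, based on the severities in the rule table.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def should_fail(finding_severities: list, fail_on: str) -> bool:
    """Return True if any finding is at or above the fail_on threshold."""
    threshold = SEVERITY_ORDER.index(fail_on.upper())
    return any(SEVERITY_ORDER.index(s.upper()) >= threshold for s in finding_severities)


# With only HIGH and LOW findings, a "critical" gate passes and a "high" gate fails.
print(should_fail(["HIGH", "LOW"], "critical"))
print(should_fail(["HIGH", "LOW"], "high"))
```

A CI wrapper would translate a `True` result into a non-zero exit code so that the pipeline step fails.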

## Testing

```bash
python -m pytest tests/ -v
# 101 tests, all passing
```

## Architecture

```
sre_beacon/
├── models.py                    # Core data models
├── calculators/
│   ├── slo_calculator.py        # SLO compliance & error budgets
│   ├── availability_calculator.py  # Availability & nines
│   ├── incident_calculator.py   # MTTR, MTTA, MTBF
│   └── toil_calculator.py       # Toil estimation
├── analyzers/
│   ├── alert_analyzer.py        # 15 alert audit rules
│   └── maturity_analyzer.py     # 8-dimension maturity scoring
├── reporters/
│   ├── terminal_reporter.py     # Rich terminal output
│   └── export_reporter.py       # JSON & HTML export
├── cli.py                       # Click CLI
└── demo.py                      # Demo data generator
```

## License

MIT
