Metadata-Version: 2.4
Name: autosre-ai
Version: 0.2.2
Summary: Open-source AI SRE agent - foundation-first incident investigation, root cause analysis, and auto-remediation
Project-URL: Homepage, https://github.com/opensre/autosre
Project-URL: Documentation, https://github.com/opensre/autosre#readme
Project-URL: Repository, https://github.com/opensre/autosre
Project-URL: Issues, https://github.com/opensre/autosre/issues
Author-email: AutoSRE AI <hello@autosre.ai>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai,automation,devops,incident-response,kubernetes,llm,observability,on-call,prometheus,runbooks,sre
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.0.0
Requires-Dist: alembic>=1.14.0
Requires-Dist: anthropic>=0.18.0
Requires-Dist: asyncpg>=0.30.0
Requires-Dist: atlassian-python-api>=3.41.0
Requires-Dist: boto3>=1.34.0
Requires-Dist: click>=8.1.0
Requires-Dist: datadog-api-client>=2.20.0
Requires-Dist: fastapi>=0.110.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: kubernetes>=28.0.0
Requires-Dist: langchain-community>=0.3.0
Requires-Dist: langchain-core>=0.3.0
Requires-Dist: langchain-neo4j>=0.1.0
Requires-Dist: langchain-openai>=0.2.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langgraph>=1.0.0
Requires-Dist: litellm>=1.50.0
Requires-Dist: neo4j>=5.14.0
Requires-Dist: openai>=1.10.0
Requires-Dist: prometheus-client>=0.25.0
Requires-Dist: pydantic-settings>=2.14.0
Requires-Dist: pydantic>=2.13.3
Requires-Dist: pygithub>=2.1.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: python-multipart>=0.0.6
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: slack-sdk>=3.27.0
Requires-Dist: sqlalchemy[asyncio]>=2.0.0
Requires-Dist: sse-starlette>=2.0.0
Requires-Dist: typer>=0.12.0
Requires-Dist: uvicorn[standard]>=0.27.0
Provides-Extra: dev
Requires-Dist: mkdocs-material>=9.0.0; extra == 'dev'
Requires-Dist: mkdocs>=1.5.0; extra == 'dev'
Requires-Dist: mypy>=1.20.2; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.0; extra == 'dev'
Provides-Extra: minimal
Requires-Dist: httpx>=0.25.0; extra == 'minimal'
Requires-Dist: pydantic>=2.0.0; extra == 'minimal'
Requires-Dist: rich>=13.0.0; extra == 'minimal'
Requires-Dist: typer>=0.12.0; extra == 'minimal'
Provides-Extra: sandbox
Requires-Dist: docker>=7.0.0; extra == 'sandbox'
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/assets/logo.png" alt="AutoSRE Logo" width="200"/>
</p>

<h1 align="center">AutoSRE</h1>

<p align="center">
  <strong>🤖 The AI SRE that investigates incidents like your best on-call engineer — but faster.</strong>
</p>

<p align="center">
  <a href="https://github.com/autosre-ai/autosre/actions"><img src="https://img.shields.io/github/actions/workflow/status/autosre-ai/autosre/ci.yml?style=flat-square&logo=github" alt="CI Status"></a>
  <a href="https://pypi.org/project/autosre-ai"><img src="https://img.shields.io/pypi/v/autosre-ai?style=flat-square&logo=pypi&logoColor=white" alt="PyPI Version"></a>
  <a href="https://pypi.org/project/autosre-ai"><img src="https://img.shields.io/pypi/pyversions/autosre-ai?style=flat-square&logo=python&logoColor=white" alt="Python Versions"></a>
  <a href="https://github.com/autosre-ai/autosre/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square" alt="License"></a>
  <a href="https://github.com/autosre-ai/autosre"><img src="https://img.shields.io/github/stars/autosre-ai/autosre?style=flat-square&logo=github" alt="Stars"></a>
</p>

<p align="center">
  <a href="#-quick-start">Quick Start</a> •
  <a href="#-features">Features</a> •
  <a href="#-how-it-works">How It Works</a> •
  <a href="#-integrations">Integrations</a> •
  <a href="docs/">Docs</a>
</p>

---

<p align="center">
  <em>45-minute investigations → 5 minutes. Autonomous triage. Evidence-based RCA. Human-in-the-loop for safety.</em>
</p>

<p align="center">
  <img src="docs/assets/demo.gif" alt="AutoSRE Demo" width="700"/>
</p>

---

## ⚡ Quick Start

```bash
# Install
pip install autosre-ai

# Configure (interactive setup)
autosre config init

# Investigate your first incident
autosre investigate "checkout service 500 errors" --service checkout-service
```

Or with Docker:
```bash
docker run -it --rm -v ~/.autosre:/root/.autosre ghcr.io/autosre-ai/autosre investigate "high latency on api-gateway"
```

**That's it.** No Neo4j. No Postgres. No infrastructure. Just `pip install` and go.

---

## ✨ Features

### 🔍 **Autonomous Investigation**
Multi-agent investigation that works like your best SRE: triage → contain → investigate → resolve → learn.

```bash
$ autosre investigate "payment failures spiking"

[Triage] Confirmed: payment-service 5xx rate at 12% (normally <0.1%)
[Scope] Affected: checkout-service, order-service (downstream)
[Hypothesis] Testing: Recent deployment of payment-service v2.3.1
[Evidence] Deployment at 14:02, errors started 14:05 ✓
[Root Cause] payment-service v2.3.1 introduced null pointer in retry logic
[Recommendation] Rollback to v2.3.0 (requires approval)
```
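
The `[Evidence]` line above pairs a deployment time with the time errors began. A minimal sketch of that kind of timing check, assuming a deployment hypothesis is supported only when errors start shortly after the deploy. The function name and the 15-minute window are illustrative assumptions, not AutoSRE's actual API.

```python
from datetime import datetime, timedelta


def deploy_precedes_errors(
    deployed_at: datetime,
    errors_started_at: datetime,
    max_lag: timedelta = timedelta(minutes=15),  # assumed window, tune per service
) -> bool:
    """Return True when errors began after the deploy and within max_lag of it."""
    lag = errors_started_at - deployed_at
    return timedelta(0) <= lag <= max_lag


# Times taken from the sample output above (the date is arbitrary).
deploy = datetime(2025, 1, 1, 14, 2)
errors = datetime(2025, 1, 1, 14, 5)
supported = deploy_precedes_errors(deploy, errors)
```

A check like this can only falsify or support the hypothesis. It does not prove causation, which is why the sample output still routes the rollback through approval.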

### 🧠 **Episodic Memory**
Learns from every investigation. Recalls similar incidents. Gets smarter over time.

```bash
$ autosre memory search "database timeout"

Found 3 similar incidents:
├── inv_abc123: PostgreSQL connection pool exhaustion (resolved in 8m)
├── inv_def456: Slow query blocking connections (resolved in 12m)
└── inv_ghi789: Network partition to RDS (resolved in 23m)
```
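
The Key Concepts table describes episodic memory as SQLite-based with FTS5 search. A minimal sketch of how such a lookup could work using Python's standard `sqlite3` module, assuming the local SQLite build includes FTS5. The table name, columns, and sample rows are hypothetical and do not reflect AutoSRE's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the ID is stored but not indexed, the summary is searchable.
conn.execute("CREATE VIRTUAL TABLE incidents USING fts5(incident_id UNINDEXED, summary)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [
        ("inv_abc123", "PostgreSQL connection pool exhaustion caused database timeout"),
        ("inv_def456", "Slow query blocking connections, database timeout on checkout"),
        ("inv_xyz000", "TLS certificate expired on api-gateway"),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (incident_id, summary) rows matching every term, best match first."""
    terms = query.split()
    if not terms:
        return []
    # Quote each term so characters such as '-' are not parsed as FTS5 operators.
    match_expr = " ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT incident_id, summary FROM incidents "
        "WHERE incidents MATCH ? ORDER BY rank LIMIT ?",
        (match_expr, limit),
    )
    return rows.fetchall()


matches = search(conn, "database timeout")
```

Quoting each term trades FTS5's query operators for safety with free-form input such as service names.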

### 📊 **SLO-Driven Operations**
Error budgets, multi-window burn rates, and deployment gating are all built in.

```bash
$ autosre slo status --service checkout-service

checkout-service SLO Status
├── Availability: 99.92% (target: 99.9%) ✓
├── Latency p99: 245ms (target: 300ms) ✓
├── Error Budget: 72% remaining
│   ├── 1h burn rate: 0.8x
│   ├── 6h burn rate: 1.2x
│   └── 24h burn rate: 0.9x
└── Deploys: ALLOWED
```
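
Burn rate is conventionally the observed error ratio divided by the error ratio the SLO allows, so 1.0x means the budget is being spent exactly on schedule. A minimal sketch of the multi-window calculation, with request counts chosen to reproduce the sample numbers above. The class, function names, and the gating policy (block deploys only when every window exceeds a threshold) are assumptions for illustration, not AutoSRE's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed in one lookback window (hypothetical shape)."""
    label: str
    total: int
    errors: int


def burn_rate(stats: WindowStats, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    allowed = 1.0 - slo_target
    if stats.total == 0 or allowed <= 0:
        return 0.0
    return (stats.errors / stats.total) / allowed


def deploys_allowed(windows: list[WindowStats], slo_target: float, threshold: float = 2.0) -> bool:
    """Assumed policy: block deploys only when every window burns above the threshold."""
    return not all(burn_rate(w, slo_target) > threshold for w in windows)


# Illustrative counts for a 99.9% availability target.
windows = [
    WindowStats("1h", total=100_000, errors=80),
    WindowStats("6h", total=600_000, errors=720),
    WindowStats("24h", total=2_400_000, errors=2_160),
]
rates = {w.label: round(burn_rate(w, slo_target=0.999), 2) for w in windows}
gate_open = deploys_allowed(windows, slo_target=0.999)
```

Requiring all windows to agree filters out short spikes, at the cost of reacting more slowly to a sudden, severe burn.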

### 🛡️ **AI Safety Built-In**
Every decision carries a confidence score. Critical actions require human approval. Full audit trails.

- **Hypothesis-driven reasoning** with falsifiable criteria
- **Confidence scoring** (0.0-1.0) on every decision
- **Human-in-the-loop** for remediation actions
- **AI error budgets** tracking accuracy over time
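
The list above can be sketched as a small approval gate. This is a hedged illustration, not AutoSRE's code: `ProposedAction`, `execute`, the outcome strings, and the 0.7 confidence floor are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    description: str
    confidence: float     # 0.0-1.0, as in the list above
    is_remediation: bool  # True when the action changes production state


def execute(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],  # stands in for a human prompt
    audit_log: list[str],
    min_confidence: float = 0.7,  # assumed floor
) -> str:
    """Decide an action's outcome and record it in the audit log."""
    if not 0.0 <= action.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if action.confidence < min_confidence:
        outcome = "rejected_low_confidence"
    elif action.is_remediation and not approve(action):
        outcome = "denied_by_human"
    else:
        outcome = "executed"
    audit_log.append(f"{outcome}: {action.description} (confidence={action.confidence:.2f})")
    return outcome


log: list[str] = []
rollback = ProposedAction("Rollback payment-service to v2.3.0", 0.91, is_remediation=True)
denied = execute(rollback, approve=lambda a: False, audit_log=log)
approved = execute(rollback, approve=lambda a: True, audit_log=log)
```

Every path through `execute` appends to the log, so rejected and denied actions leave the same trail as executed ones.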

### 🔧 **Extensible Skills System**
Modular investigation skills: Kubernetes, metrics, logs, traces, infrastructure.

```
skills/
├── kubernetes/         # Pod states, deployments, events
├── metrics-analysis/   # Prometheus, Datadog, Grafana
├── log-analysis/       # Pattern matching, anomaly detection
├── traces/             # Distributed tracing analysis
├── infrastructure/     # AWS, GCP resource checks
└── investigation/      # Methodology and hypothesis testing
```

### 📝 **Automated Postmortems**
Blameless postmortems with auto-generated timelines, metrics snapshots, and action items.

---

## 🎯 How It Works

```
┌────────────────────────────────────────────────────────────────┐
│                     autosre investigate                         │
└────────────────────────────┬───────────────────────────────────┘
                             │
                    ┌────────▼────────┐
                    │   Orchestrator   │
                    │   (LangGraph)    │
                    └────────┬────────┘
                             │
         ┌───────────┬───────┴───────┬───────────┐
         │           │               │           │
    ┌────▼────┐ ┌────▼────┐   ┌─────▼────┐ ┌────▼────┐
    │ Memory  │ │Topology │   │ Planner  │ │  LLM    │
    │(SQLite) │ │ (YAML)  │   │  Agent   │ │ Router  │
    └─────────┘ └─────────┘   └────┬─────┘ └─────────┘
                                   │
               ┌─────────┬─────────┼─────────┬─────────┐
               │         │         │         │         │
          ┌────▼───┐┌────▼───┐┌────▼───┐┌────▼───┐┌────▼───┐
          │  K8s   ││Metrics ││  Logs  ││ Traces ││ Infra  │
          │Subagent││Subagent││Subagent││Subagent││Subagent│
          └────┬───┘└────┬───┘└────┬───┘└────┬───┘└────┬───┘
               └─────────┴─────────┴─────────┴─────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    │         Synthesizer         │
                    │   (Evidence → Root Cause)   │
                    └──────────────┬──────────────┘
                                   │
                    ┌──────────────▼──────────────┐
                    │     Writeup & Actions       │
                    │  (Postmortem, Remediation)  │
                    └─────────────────────────────┘
```
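
The fan-out and fan-in shape in the diagram can be sketched with plain `asyncio`. The diagram names LangGraph as the orchestrator, so this sketch deliberately swaps in `asyncio.gather` to show only the concurrency pattern, not the LangGraph API. The `Evidence` type and `run_subagent` body are hypothetical placeholders.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str
    finding: str


async def run_subagent(name: str, incident: str) -> Evidence:
    """Placeholder for a real subagent that would query Kubernetes, metrics, and so on."""
    await asyncio.sleep(0)  # yield control, standing in for network I/O
    return Evidence(source=name, finding=f"{name} checked: {incident}")


async def investigate(
    incident: str,
    subagents: tuple[str, ...] = ("k8s", "metrics", "logs", "traces", "infra"),
) -> list[Evidence]:
    """Run all subagents concurrently and keep the results that succeeded."""
    results = await asyncio.gather(
        *(run_subagent(name, incident) for name in subagents),
        return_exceptions=True,  # one failing subagent should not abort the rest
    )
    return [r for r in results if isinstance(r, Evidence)]


evidence = asyncio.run(investigate("payment failures spiking"))
```

The resulting list is what a synthesizer step would consume. `asyncio.gather` returns results in the order the subagents were passed in, regardless of which finishes first.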

**Key Concepts:**

| Component | What It Does |
|-----------|--------------|
| **Orchestrator** | Coordinates investigation phases (Triage → Contain → Investigate → Resolve → Learn) |
| **Episodic Memory** | SQLite-based learning from past investigations with FTS5 search |
| **Service Topology** | YAML-defined service dependencies for blast radius analysis |
| **Subagents** | Parallel specialists (Kubernetes, metrics, logs, traces, infrastructure) |
| **Synthesizer** | Merges evidence, tests hypotheses, identifies root cause |
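
Blast radius analysis over a dependency map amounts to a graph traversal. A minimal sketch, assuming the topology maps each service to the services it depends on. A plain dict stands in for the parsed YAML here, and the service graph and function name are hypothetical.

```python
from collections import deque

# Hypothetical topology: service -> services it depends on.
TOPOLOGY = {
    "api-gateway": ["checkout-service"],
    "checkout-service": ["payment-service"],
    "order-service": ["payment-service"],
    "payment-service": ["postgres"],
}


def blast_radius(topology: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the edges: dependency -> services that call it.
    dependents: dict[str, set[str]] = {}
    for service, deps in topology.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk over the inverted graph
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected and service != failed:
                affected.add(service)
                queue.append(service)
    return affected


impacted = blast_radius(TOPOLOGY, "payment-service")
```

Tracking visited services in `affected` keeps the walk finite even if the topology contains a dependency cycle.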

---

## 🔌 Integrations

| Category | Supported |
|----------|-----------|
| **Observability** | Prometheus, Grafana, Datadog |
| **Incident Management** | PagerDuty, Slack, OpsGenie |
| **Infrastructure** | Kubernetes, AWS, GCP |
| **Source Control** | GitHub, GitLab |
| **Issue Tracking** | Jira, Linear |

---

## 📈 Why AutoSRE?

| Before AutoSRE | After AutoSRE |
|----------------|---------------|
| 45+ min incident investigations | 5 min AI-assisted triage |
| Lost context between incidents | Episodic memory recalls similar issues |
| Tribal knowledge in runbooks | AI executes and learns from runbooks |
| Manual toil tracking | Auto-classified, automation suggested |
| Blame-filled postmortems | Auto-generated blameless documentation |

**Test Results:** 1,053 tests passing | 25+ investigation scenarios validated

---

## 🏗️ Production Deployment

For production deployments with persistent storage and multiple services, see the [Docker Deployment Guide](docs/deployment/docker.md).

<details>
<summary><strong>Quick Docker Compose Setup</strong></summary>

```bash
# Clone and setup
git clone https://github.com/autosre-ai/autosre.git
cd autosre
make setup

# Configure secrets
cp .env.example .env
vim .env  # Add your API keys

# Start all services
make dev

# Verify health
make health
```

**Services:**

| Service | Port | Description |
|---------|------|-------------|
| web-ui | 3000 | Next.js web interface |
| api-gateway | 8000 | FastAPI REST API |
| sre-agent | 8080 | AI agent service |
| postgres | 5432 | PostgreSQL database |
| neo4j | 7474 | Graph database (optional) |
| redis | 6379 | Cache & pub/sub |

</details>

---

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

```bash
# Development setup
git clone https://github.com/autosre-ai/autosre.git
cd autosre
pip install -e ".[dev]"
pytest  # Run the test suite
```

**Areas where we need help:**
- 🔌 New integrations (Elastic, Splunk, New Relic)
- 📊 Investigation scenarios for evaluation
- 📚 Documentation and examples
- 🐛 Bug reports and fixes

---

## 📄 License

Apache 2.0 — See [LICENSE](LICENSE) for details.

---

<p align="center">
  <strong>Built by SREs, for SREs.</strong><br>
  <sub>Tired of 3am pages? Let AutoSRE handle the first 5 minutes.</sub>
</p>

<p align="center">
  <a href="https://github.com/autosre-ai/autosre">⭐ Star us on GitHub</a> •
  <a href="https://discord.gg/autosre">💬 Join Discord</a> •
  <a href="https://twitter.com/autosre_ai">🐦 Follow on Twitter</a>
</p>
