Metadata-Version: 2.4
Name: async-runtime-auditor
Version: 0.1.1
Summary: Runtime audit CLI for detecting asyncio degradation and threadpool saturation.
Author: Priyanshu
License: MIT
Project-URL: Homepage, https://github.com/priyanshuphenomenal007/async-runtime-auditor
Project-URL: Repository, https://github.com/priyanshuphenomenal007/async-runtime-auditor
Project-URL: Issues, https://github.com/priyanshuphenomenal007/async-runtime-auditor/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: System :: Monitoring
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: PyYAML>=6.0.1
Dynamic: license-file

# Async Runtime Auditor

[![PyPI version](https://badge.fury.io/py/async-runtime-auditor.svg)](https://badge.fury.io/py/async-runtime-auditor)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Runtime degradation detection and CI/CD gating for Python `asyncio` services.

Async systems can enter severe internal degradation states long before traditional application telemetry reflects failure. Event-loop starvation, blocking execution, and executor saturation frequently remain invisible behind healthy HTTP latency metrics.

**Async Runtime Auditor** is a lightweight operational CLI that evaluates async runtime health directly from telemetry signals and enforces deployment gating before degraded runtime behavior reaches production.

The system is designed for:

- staging validation  
- runtime regression detection  
- deployment pipeline enforcement  
- operational diagnostics for async infrastructure  

---

# Core Idea

Most observability systems focus primarily on external request behavior:

- latency  
- throughput  
- error rate  

But async runtimes tend to degrade internally first.

To detect degradation before instability becomes externally visible, this project focuses on runtime-native signals such as:

- event-loop lag  
- executor queue amplification  
- blocking duration  
- concurrent pressure  
- threadpool backlog growth  

---

# Runtime Failure Pattern

In many production systems:

```text
healthy HTTP latency != healthy runtime state
```

A runtime may still appear externally healthy while internally experiencing:

- scheduler starvation  
- blocking execution  
- executor worker exhaustion  
- queue amplification  
- degraded task scheduling fairness  

The auditor is designed specifically to detect these hidden operational states.
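
As an illustration of this failure pattern, the following minimal sketch shows how a single synchronous call delays every other task on the loop. The function names and the 0.3 s blocking call are hypothetical and chosen for the demonstration; this is not code from the auditor itself.

```python
import asyncio
import time


async def measure_lag(interval: float, samples: int) -> float:
    """Return the worst scheduling delay observed over `samples` sleeps."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Time beyond `interval` is how long the loop could not resume this task.
        worst = max(worst, time.perf_counter() - start - interval)
    return worst


async def blocking_handler() -> None:
    """Hypothetical handler that blocks the event loop."""
    await asyncio.sleep(0.05)  # let the monitor begin its first sleep
    time.sleep(0.3)            # synchronous call: no other task can run meanwhile


async def main() -> float:
    monitor = asyncio.create_task(measure_lag(interval=0.05, samples=5))
    await blocking_handler()
    return await monitor


if __name__ == "__main__":
    print(f"worst loop lag: {asyncio.run(main()):.3f}s")
```

The handler itself returns normally, so request-level metrics for it would look unremarkable, while the monitor task records a lag several times larger than its sampling interval.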

---

# Installation

Requires Python 3.9+ and a Prometheus-compatible telemetry backend.

## PyPI Installation

```bash
pip install async-runtime-auditor
```

---

## Local Development

From the repository root:

```bash
pip install -e .
```

---

# Runtime Metric Requirements

Your async application must expose runtime telemetry before the auditor can evaluate health conditions.

Install Prometheus client support:

```bash
pip install prometheus-client
```

Example FastAPI instrumentation:

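```python
import asyncio
import time

from fastapi import FastAPI, Request
from prometheus_client import Gauge, make_asgi_app

app = FastAPI()

EVENT_LOOP_LAG = Gauge(
    "asyncio_event_loop_lag_seconds",
    "Measured asyncio event loop lag"
)

ACTIVE_REQUESTS = Gauge(
    "active_requests",
    "Current active requests"
)

# Expose the default Prometheus registry at /metrics.
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

SAMPLE_INTERVAL_S = 0.1


@app.middleware("http")
async def track_active_requests(request: Request, call_next):
    # Keep the gauge in step with in-flight requests, even if a handler raises.
    with ACTIVE_REQUESTS.track_inprogress():
        return await call_next(request)


async def monitor_loop_lag():
    while True:
        start = time.perf_counter()

        await asyncio.sleep(SAMPLE_INTERVAL_S)

        # Time beyond the requested sleep is how long the loop was too busy
        # to resume this task.
        lag = time.perf_counter() - start - SAMPLE_INTERVAL_S

        EVENT_LOOP_LAG.set(max(0, lag))


@app.on_event("startup")
async def startup_event():
    # Hold a reference so the monitor task is not garbage-collected mid-run.
    app.state.loop_lag_task = asyncio.create_task(monitor_loop_lag())
```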

Once Prometheus scrapes these metrics, the auditor can evaluate runtime degradation thresholds.
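
Threadpool signals need their own instrumentation as well. The sketch below shows one way to measure executor queue wait and task counts around `run_in_executor`. It uses plain Python counters so that it runs without extra dependencies; in a real service these would presumably be `prometheus_client` counters and histograms. `ThreadpoolStats`, `run_instrumented`, and `demo` are hypothetical names, and the definition of queue wait used here (time from submission until a worker picks the job up) is an assumption rather than the auditor's documented definition.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ThreadpoolStats:
    """Plain counters standing in for Prometheus Counter/Histogram objects."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.started = 0           # a started-tasks counter would mirror this
        self.completed = 0         # a completed-tasks counter would mirror this
        self.queue_wait_sum = 0.0  # a queue-wait histogram sum would mirror this


async def run_instrumented(executor, stats, fn, *args):
    """Run fn(*args) on the executor, recording queue wait and task counts."""
    submitted_at = time.perf_counter()

    def wrapper():
        # Executes on a worker thread; the gap since submission is queue wait.
        wait = time.perf_counter() - submitted_at
        with stats.lock:
            stats.started += 1
            stats.queue_wait_sum += wait
        try:
            return fn(*args)
        finally:
            with stats.lock:
                stats.completed += 1

    return await asyncio.get_running_loop().run_in_executor(executor, wrapper)


async def demo() -> ThreadpoolStats:
    stats = ThreadpoolStats()
    # One worker and three 50 ms jobs: the second and third jobs must queue.
    with ThreadPoolExecutor(max_workers=1) as executor:
        await asyncio.gather(
            *(run_instrumented(executor, stats, time.sleep, 0.05) for _ in range(3))
        )
    return stats


if __name__ == "__main__":
    result = asyncio.run(demo())
    print(result.started, result.completed, round(result.queue_wait_sum, 3))
```

With a single worker, the accumulated queue wait grows with every job that has to wait behind another one, which is the kind of amplification a threshold on queue wait is meant to catch.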

---

# Quick Start

## 1. Initialize Local Configuration

Generate a default `metrics.yaml` configuration file:

```bash
async-auditor --init
```

---

## 2. Run a Runtime Audit

Execute against a local or staging Prometheus target:

```bash
async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090
```

---

## 3. Enable CI/CD Deployment Gating

Fail deployment pipelines automatically when critical runtime degradation is detected:

```bash
async-auditor \
  --config metrics.yaml \
  --target http://localhost:9090 \
  --fail-on-critical
```

Exit code behavior with `--fail-on-critical`:

| Runtime Status | Exit Code |
|---|---|
| `HEALTHY` | `0` |
| `DEGRADED` | `0` |
| `CRITICAL` | `1` |
| `COLLAPSE_RISK` | `1` |
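
The mapping in the table can be sketched as a small function. `exit_code_for` is a hypothetical name, and the behavior without `--fail-on-critical` (always exiting `0`) is an assumption that the table does not state.

```python
# Statuses that fail the pipeline, taken from the table above.
FAILING_STATUSES = frozenset({"CRITICAL", "COLLAPSE_RISK"})


def exit_code_for(status: str, fail_on_critical: bool) -> int:
    """Translate a runtime status into a process exit code."""
    if fail_on_critical and status in FAILING_STATUSES:
        return 1
    # Assumption: without the flag, the audit only reports and never fails.
    return 0


if __name__ == "__main__":
    for status in ("HEALTHY", "DEGRADED", "CRITICAL", "COLLAPSE_RISK"):
        print(status, exit_code_for(status, fail_on_critical=True))
```

Note that `DEGRADED` does not fail the pipeline, so a CI job that should also stop on degraded runs would need an additional check on the reported status.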

---

# Configuration (`metrics.yaml`)

The auditor is intentionally decoupled from metric naming conventions.

Prometheus queries are mapped into the runtime scoring engine through `metrics.yaml`.

```yaml
target: "http://localhost:9090"

queries:
  p99_latency: "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))"

  event_loop_lag: "max_over_time(asyncio_event_loop_lag_seconds[15m])"

  blocking_events_total: "blocking_events_total"

  blocking_duration_avg: "rate(blocking_duration_seconds_sum[15m]) / rate(blocking_duration_seconds_count[15m])"

  peak_active_requests: "max_over_time(active_requests[15m])"

  threadpool_queue_wait: "rate(threadpool_queue_wait_seconds_sum[15m]) / rate(threadpool_queue_wait_seconds_count[15m])"

  threadpool_tasks_started: "threadpool_tasks_started_total"

  threadpool_tasks_completed: "threadpool_tasks_completed_total"

thresholds:
  health_score_collapse: 2500
  health_score_critical: 1200
  health_score_degraded: 400

  max_event_loop_lag_s: 0.5
  max_blocking_duration_s: 1.0

  max_queue_wait_s: 1.0
  max_threadpool_backlog: 3

  max_active_requests: 150
```
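
Each entry under `queries` is a PromQL expression that a Prometheus-compatible backend can evaluate through its instant-query endpoint (`/api/v1/query`). The sketch below shows how such a query URL might be built and how a single value could be read from the JSON response. `build_query_url` and `first_sample_value` are hypothetical helpers, not the auditor's API; the HTTP call itself is omitted, and the sample payload is hand-written for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query_url(target: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible backend."""
    return f"{target.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(payload: dict) -> Optional[float]:
    """Return the first sample of an instant-vector response, or None."""
    if payload.get("status") != "success":
        return None
    data = payload.get("data", {})
    result = data.get("result", [])
    if data.get("resultType") != "vector" or not result:
        return None  # no series matched, or an unexpected result type
    _timestamp, raw_value = result[0]["value"]
    return float(raw_value)  # Prometheus encodes sample values as strings


if __name__ == "__main__":
    print(build_query_url("http://localhost:9090",
                          "max_over_time(active_requests[15m])"))
    sample = {  # hand-written example payload
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {}, "value": [1700000000.0, "0.42"]}],
        },
    }
    print(first_sample_value(sample))
```

An empty result is returned as `None` rather than `0.0`, so that a missing metric can be told apart from a healthy one.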

---

# Runtime Health Model

The auditor computes a deterministic runtime health score using:

- event-loop lag  
- blocking duration  
- executor queue wait  
- concurrent request pressure  
- threadpool backlog behavior  

The scoring model is:

- deterministic  
- threshold-driven  
- operationally explainable  

This project intentionally avoids:

- anomaly detection systems  
- machine learning classifiers  
- probabilistic runtime forecasting  
- opaque scoring heuristics  
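
A threshold-driven classification of this kind can be sketched as follows. This is an illustration only: the README does not document how the score is computed, so the sketch starts from a given score. The `>=` comparisons, the pairing of query names with threshold names in `LIMIT_FOR_METRIC`, and the function names are all assumptions.

```python
from typing import List, Mapping

# Threshold values copied from the example metrics.yaml.
THRESHOLDS = {
    "health_score_collapse": 2500,
    "health_score_critical": 1200,
    "health_score_degraded": 400,
    "max_event_loop_lag_s": 0.5,
    "max_blocking_duration_s": 1.0,
    "max_queue_wait_s": 1.0,
    "max_active_requests": 150,
}

# Assumed pairing of query names with the thresholds that bound them.
LIMIT_FOR_METRIC = {
    "event_loop_lag": "max_event_loop_lag_s",
    "blocking_duration_avg": "max_blocking_duration_s",
    "threadpool_queue_wait": "max_queue_wait_s",
    "peak_active_requests": "max_active_requests",
}


def classify(score: float, thresholds: Mapping[str, float] = THRESHOLDS) -> str:
    """Map a health score onto a status, checking the worst band first."""
    if score >= thresholds["health_score_collapse"]:
        return "COLLAPSE_RISK"
    if score >= thresholds["health_score_critical"]:
        return "CRITICAL"
    if score >= thresholds["health_score_degraded"]:
        return "DEGRADED"
    return "HEALTHY"


def breaches(metrics: Mapping[str, float],
             thresholds: Mapping[str, float] = THRESHOLDS) -> List[str]:
    """List the metrics that exceed their limit; missing metrics are skipped."""
    return [
        name
        for name, limit_key in LIMIT_FOR_METRIC.items()
        if name in metrics and metrics[name] > thresholds[limit_key]
    ]


if __name__ == "__main__":
    print(classify(842))
    print(breaches({"event_loop_lag": 0.8, "peak_active_requests": 120}))
```

Because every decision is a plain comparison against a configured number, each reported status can be traced back to the specific thresholds that were crossed.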

---

# Output Formats

The CLI supports both:

- terminal-readable operational output  
- structured JSON output for CI systems  

---

## Standard Text Output

```bash
async-auditor --output-format text
```

---

## JSON Output

```bash
async-auditor --output-format json
```

A structured report is automatically written to:

```text
audit_results.json
```

inside the current working directory.

---

# Example Runtime Audit Output

```text
============================================================
ASYNC RUNTIME AUDITOR
============================================================

Target: http://localhost:9090

Runtime Status: DEGRADED
Runtime Health Score: 842

Findings:
  - Event-loop starvation detected.
  - Executor queue amplification detected.
  - Concurrent load saturation detected.

JSON audit report written to:
audit_results.json
```

---

# CI/CD Integration

A reference GitHub Actions workflow is included in:

```text
examples/github-actions-gating.yml
```

Example pipeline step:

```yaml
- name: Runtime Audit
  run: |
    async-auditor \
      --config metrics.yaml \
      --target http://localhost:9090 \
      --fail-on-critical
```

This allows runtime degradation checks to execute automatically during pull request validation and deployment workflows.

---

# Intended Usage

This tool is designed primarily for:

- deployment pipeline gating  
- async runtime regression testing  
- staging environment diagnostics  
- operational async runtime analysis  

It is not intended to replace:

- full observability platforms  
- distributed tracing systems  
- telemetry storage infrastructure  

---

# Scope

This project is intentionally narrow in scope.

## Included

- threshold-based runtime degradation detection  
- CI/CD exit-code semantics  
- configurable operational thresholds  
- Prometheus-compatible telemetry querying  
- async runtime diagnostics  

---

## Not Included

- distributed tracing infrastructure  
- Kubernetes orchestration  
- OpenTelemetry collectors  
- AI-driven remediation systems  
- autonomous operational control loops  
- observability storage backends  

---

# Design Principles

The project prioritizes:

- deterministic behavior  
- explainable operational output  
- low dependency overhead  
- operational clarity  
- lightweight deployment integration  

over:

- infrastructure breadth  
- platform extensibility  
- autonomous remediation  
- generalized orchestration  

---

# Project Status

Current version:

```text
v0.1.1
```

This project is currently in early-stage OSS development and intended for:

- engineering validation  
- staging infrastructure testing  
- deployment pipeline experimentation  

Operational thresholds may require tuning depending on workload characteristics and runtime architecture.

---

# License

MIT License
