Metadata-Version: 2.4
Name: opsfabric-discovery
Version: 0.3.1
Summary: Read-only AWS reliability audit. Alarm coverage assessment for ECS, Lambda, RDS, Aurora, and SQS.
Project-URL: Homepage, https://opsfabric.ai
Project-URL: Repository, https://github.com/OpsFabric/opsfabric-discovery-pkg
Project-URL: Issues, https://github.com/OpsFabric/opsfabric-discovery-pkg/issues
Project-URL: Changelog, https://github.com/OpsFabric/opsfabric-discovery-pkg/blob/main/CHANGELOG.md
Author: Vaishal Shah
License: MIT
License-File: LICENSE
Keywords: alarms,audit,aws,cloudwatch,observability,ops,reliability,sre
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.11
Requires-Dist: boto3>=1.34.0
Requires-Dist: jinja2>=3.1.4
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: weasyprint>=62
Description-Content-Type: text/markdown

# opsfabric-discovery

[![PyPI](https://img.shields.io/pypi/v/opsfabric-discovery)](https://pypi.org/project/opsfabric-discovery/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/opsfabric-discovery)](https://pypi.org/project/opsfabric-discovery/)
[![Downloads](https://static.pepy.tech/badge/opsfabric-discovery/month)](https://pepy.tech/project/opsfabric-discovery)

**Open-source AWS reliability audit.** Produces a 3-page executive PDF assessing CloudWatch alarm coverage across ECS, Lambda, RDS, Aurora, and SQS workloads. Runs locally — your data never leaves your laptop.

> DiscoveryFabric (this package, `opsfabric-discovery`) is the audit layer of the [OpsFabric](https://opsfabric.ai) reliability platform. The closed-source companion products, **AlarmFabric** (alarm remediation) and **OpsFabric** (incident response orchestration), turn the audit findings into automated fixes in production. See [`docs/comparison.md`](docs/comparison.md) for the full feature matrix.

## See what your audit would look like — no AWS needed

[**Download a sample report (PDF, ~68 KB)**](docs/sample-audit.pdf)

Or run it yourself in 30 seconds:

```bash
pip install opsfabric-discovery
opsfabric-discovery audit --demo
# → out/audit-demo.pdf
```

`--demo` runs against a baked-in synthetic dataset that exercises every feature of the audit (DEGRADED alarm detection, ALB→ECS bridge, critical-gap cards, coverage breakdown). No AWS calls, no credentials needed. Same matching engine, same PDF — only the input is fake.

## What it does

- Discovers AWS resources via Resource Explorer 2 across one or all enabled regions.
- Maps CloudWatch alarms to those resources using a five-strategy matcher (exact dimensions, ALB target-group bridge for ECS, namespace + partial dimensions, log-group → metric-filter linkage, naming heuristic).
- Detects alarms that exist but can't be relied on to notify (actions disabled, no SNS target, or in the `INSUFFICIENT_DATA` state) and reports them as DEGRADED. DEGRADED alarms don't count toward coverage.
- Scores required-check coverage against the OpsFabric reliability baseline.
- Renders an executive PDF (3 pages) plus JSON appendices.
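As a rough illustration of the DEGRADED rule and the coverage score described above, here is a minimal Python sketch. It assumes alarms are plain dicts shaped like the `MetricAlarms` entries returned by CloudWatch `DescribeAlarms` (`ActionsEnabled`, `AlarmActions`, `StateValue`) and models each required check as a hypothetical `(resource, check)` pair. The function names and the ratio formula are illustrative assumptions, not the package's actual implementation.

```python
# Hypothetical sketch: DEGRADED detection and coverage scoring.
# Alarm dicts mirror CloudWatch DescribeAlarms "MetricAlarms" entries;
# the (resource, check) keys and the ratio formula are illustrative assumptions.


def degraded_reasons(alarm: dict) -> list[str]:
    """Return the reasons an alarm can't be relied on to notify (empty = healthy)."""
    reasons = []
    if not alarm.get("ActionsEnabled", False):
        reasons.append("actions disabled")
    # Only SNS topics count as a notification target in this sketch.
    if not any(":sns:" in arn for arn in alarm.get("AlarmActions", [])):
        reasons.append("no SNS target")
    if alarm.get("StateValue") == "INSUFFICIENT_DATA":
        reasons.append("INSUFFICIENT_DATA")
    return reasons


def coverage_score(
    required: set[tuple[str, str]],
    alarms_by_check: dict[tuple[str, str], list[dict]],
) -> float:
    """Share of required (resource, check) pairs backed by at least one non-DEGRADED alarm."""
    if not required:
        return 1.0  # nothing required, so nothing is missing
    covered = sum(
        1
        for key in required
        if any(not degraded_reasons(a) for a in alarms_by_check.get(key, []))
    )
    return covered / len(required)


healthy = {
    "ActionsEnabled": True,
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:oncall"],
    "StateValue": "OK",
}
muted = {**healthy, "ActionsEnabled": False}  # exists, but would be DEGRADED
required = {("sqs/orders", "queue-age"), ("lambda/checkout", "errors")}
alarms = {
    ("sqs/orders", "queue-age"): [muted],
    ("lambda/checkout", "errors"): [healthy],
}
print(degraded_reasons(muted))           # ['actions disabled']
print(coverage_score(required, alarms))  # 0.5
```

A plain ratio is the simplest scoring consistent with "DEGRADED alarms don't count"; the real baseline may weight checks differently.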

## Trust statement

- **Read-only.** Calls only read-only AWS APIs (`Describe*`, `List*`, `Get*`, and Resource Explorer `Search`). Never creates, modifies, or deletes any resource.
- **Runs on your laptop.** No telemetry, no phone-home. Your data never leaves your machine.
- **Source is auditable.** Open the installed Python files — every AWS call is visible in `discovery_fabric/aws/`.
- **Minimum IAM permissions** (read-only across the board): `sts:GetCallerIdentity`, `ec2:DescribeRegions`, `resource-explorer-2:ListViews` / `GetView` / `Search`, `tag:GetResources`, `cloudwatch:DescribeAlarms`, `logs:DescribeLogGroups` / `DescribeMetricFilters`, `ecs:ListClusters` / `ListServices` / `DescribeServices` / `DescribeTaskDefinition`, `lambda:ListFunctions` / `GetFunction`, `rds:DescribeDBInstances` / `DescribeDBClusters`, `sqs:ListQueues` / `GetQueueAttributes`.
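The permission list above can be turned into an identity-based IAM policy. Here is a minimal sketch that builds one; the `Sid` and the use of `"Resource": "*"` are assumptions (EC2 `Describe*` calls such as `ec2:DescribeRegions` can't be scoped to resource ARNs), and this is not a policy file shipped with the package.

```python
import json

# Read-only actions from the trust statement, grouped by IAM service prefix.
READ_ONLY_ACTIONS = {
    "sts": ["GetCallerIdentity"],
    "ec2": ["DescribeRegions"],
    "resource-explorer-2": ["ListViews", "GetView", "Search"],
    "tag": ["GetResources"],
    "cloudwatch": ["DescribeAlarms"],
    "logs": ["DescribeLogGroups", "DescribeMetricFilters"],
    "ecs": ["ListClusters", "ListServices", "DescribeServices", "DescribeTaskDefinition"],
    "lambda": ["ListFunctions", "GetFunction"],
    "rds": ["DescribeDBInstances", "DescribeDBClusters"],
    "sqs": ["ListQueues", "GetQueueAttributes"],
}


def audit_policy_document() -> dict:
    """Build an IAM policy document that allows exactly the listed actions."""
    actions = sorted(
        f"{service}:{op}" for service, ops in READ_ONLY_ACTIONS.items() for op in ops
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpsFabricDiscoveryReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": actions,
                # "*" because list/describe calls like ec2:DescribeRegions
                # cannot be restricted to specific resource ARNs.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(audit_policy_document(), indent=2))
```

Save the output as `policy.json` and attach it to the auditor role, for example with `aws iam put-role-policy --role-name OpsFabricAuditor --policy-name opsfabric-discovery-readonly --policy-document file://policy.json` (the policy name is hypothetical).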

## Install

```bash
pip install opsfabric-discovery
opsfabric-discovery --help
```

## Quickstart

Once installed, from any directory:

```bash
# Audit a profile from ~/.aws/credentials
opsfabric-discovery audit --profile prod --regions all --account-alias acme-prod

# Or via STS assume-role (cross-account)
opsfabric-discovery audit \
  --assume-role-arn arn:aws:iam::CUSTOMER_ACCOUNT:role/OpsFabricAuditor \
  --external-id agreed-secret \
  --regions all \
  --account-alias acme-prod

# Show OpsFabric product context (DiscoveryFabric / AlarmFabric / OpsFabric)
opsfabric-discovery --about

# Outputs land in ./out/ by default; override with --output-dir
ls out/
# audit-<account-id>-<YYYYMMDD>.pdf
# alarm-coverage-score.json
# alarm-coverage-missing.json
# resource-mapping.json
# all-resources.json
# audit-meta.json
```
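To see what the `--assume-role-arn`, `--external-id`, and `--regions all` flags involve at the AWS API level, here is a sketch of the boto3 calls they plausibly map to: STS `AssumeRole` with an `ExternalId`, then EC2 `DescribeRegions`, which by default returns only the regions enabled for the account. The helper names and the session name are hypothetical, and the clients are passed in so the logic runs without AWS credentials. This is not the package's actual code.

```python
# Hypothetical sketch of the cross-account and region-selection steps.
# Clients are injected (real boto3 clients or test doubles).


def assumed_role_session_kwargs(
    sts_client,
    role_arn: str,
    external_id: str,
    session_name: str = "opsfabric-discovery-audit",
) -> dict:
    """Call STS AssumeRole and return kwargs suitable for boto3.Session(**kwargs)."""
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        # Must match the sts:ExternalId condition in the role's trust policy.
        ExternalId=external_id,
    )
    creds = resp["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def enabled_regions(ec2_client) -> list[str]:
    """Return the regions enabled for the account, sorted by name."""
    # Without AllRegions=True, DescribeRegions omits regions the account hasn't enabled.
    resp = ec2_client.describe_regions()
    return sorted(region["RegionName"] for region in resp["Regions"])
```

With boto3 and real credentials, `boto3.Session(**assumed_role_session_kwargs(boto3.client("sts"), role_arn, "agreed-secret"))` yields a session in the audited account, and its `client("ec2", region_name="us-east-1")` can be passed to `enabled_regions`. The role's trust policy must allow your principal and require the same `sts:ExternalId`.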

## Closing the gaps

DiscoveryFabric tells you *what's missing*. To **close** the gaps, you have two paths:

1. **DIY** — open the audit PDF, click through to the AWS Console, author each missing CloudWatch alarm by hand. Free, but tedious; a mid-market fleet usually has 30–200 missing alarms.
2. **AlarmFabric**: the OpsFabric remediation product. It reads this audit's JSON output, creates the missing alarms in your account using the audit role extended with `cloudwatch:PutMetricAlarm`, and tags each alarm with its provenance for easy rollback. Typical turnaround: under one engineering day for a fleet of any size. Closed-source SaaS; see [opsfabric.ai](https://opsfabric.ai) or email <vaishal2611@gmail.com>.
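On the DIY path, a script is less tedious than clicking through the Console. The sketch below builds keyword arguments for boto3's `cloudwatch.put_metric_alarm` for one common gap: an SQS queue whose oldest message is getting too old. `AWS/SQS`, `ApproximateAgeOfOldestMessage`, and the `QueueName` dimension are standard CloudWatch names. The helper name, alarm naming scheme, threshold, evaluation settings, and tag are assumptions to adapt, and none of this is AlarmFabric's implementation. Running it requires `cloudwatch:PutMetricAlarm`, which the read-only audit role does not have.

```python
# Hypothetical DIY helper: kwargs for CloudWatch PutMetricAlarm on one SQS queue.
# Threshold, periods, naming scheme, and tags are illustrative assumptions.


def sqs_age_alarm_params(
    queue_name: str, sns_topic_arn: str, threshold_seconds: int = 300
) -> dict:
    """Build put_metric_alarm kwargs that fire when messages sit too long in a queue."""
    return {
        "AlarmName": f"sqs-{queue_name}-age-of-oldest-message",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,  # seconds per evaluation datapoint
        "EvaluationPeriods": 3,  # three breaching datapoints in a row before ALARM
        "Threshold": float(threshold_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # an idle queue shouldn't page anyone
        "ActionsEnabled": True,  # disabled actions would be reported as DEGRADED
        "AlarmActions": [sns_topic_arn],  # so would a missing SNS target
        # Hypothetical provenance tag so hand-made alarms are easy to find later.
        "Tags": [{"Key": "created-by", "Value": "manual-remediation"}],
    }
```

Apply it with `boto3.client("cloudwatch").put_metric_alarm(**sqs_age_alarm_params("orders", topic_arn))`. Enabling actions and setting an SNS target avoids two of the three DEGRADED conditions; a brand-new alarm may still sit in `INSUFFICIENT_DATA` until its first evaluation.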

Once the alarms start firing, [**OpsFabric**](https://opsfabric.ai) handles the incident lifecycle: triage from Slack/CloudWatch/Jira, automated RCA, remediation suggestions, Confluence post-mortems, ticket close-out. Also closed-source SaaS.

The OSS audit is genuinely useful on its own. The paid products are a different layer — not a crippled version of the audit, just a different part of the reliability loop.

## Open source vs commercial — feature matrix

| Capability | DiscoveryFabric (OSS) | AlarmFabric (paid) | OpsFabric (paid) |
|---|:---:|:---:|:---:|
| Read-only audit | ✅ | ✅ | ✅ |
| Resource discovery (ECS / Lambda / RDS / Aurora / SQS) | ✅ | ✅ | ✅ |
| Five-strategy alarm matching | ✅ | ✅ | ✅ |
| DEGRADED alarm detection | ✅ | ✅ | ✅ |
| Executive PDF + JSON output | ✅ | ✅ | ✅ |
| `--demo` synthetic walkthrough | ✅ | ✅ | ✅ |
| **Create missing alarms in your account** | ❌ | ✅ | ✅ |
| Tagged + reversible alarm provenance | ❌ | ✅ | ✅ |
| SNS / PagerDuty / Opsgenie wiring | ❌ | ✅ | ✅ |
| Scheduled / continuous audits | ❌ | ✅ | ✅ |
| Slack-based incident triage | ❌ | ❌ | ✅ |
| Automated RCA + remediation suggestions | ❌ | ❌ | ✅ |
| Jira / Confluence incident lifecycle | ❌ | ❌ | ✅ |
| Multi-tenant managed SaaS | ❌ | ✅ | ✅ |
| Pricing | Free (MIT) | [opsfabric.ai](https://opsfabric.ai) | [opsfabric.ai](https://opsfabric.ai) |

See [`docs/comparison.md`](docs/comparison.md) for the full version with one-paragraph explanations of each row.

## About OpsFabric

We build reliability automation for AWS-heavy mid-market teams. DiscoveryFabric is open-source under the MIT license because the audit should be free — we make money on the remediation and incident-response automation. The OSS funnel and the SaaS funnel feed each other: a team runs the audit, sees their coverage is below baseline, and decides whether to fix the gaps themselves or have us do it.

- Website: [opsfabric.ai](https://opsfabric.ai)
- Commercial questions: <vaishal2611@gmail.com>
- AlarmFabric product: [opsfabric.ai](https://opsfabric.ai)
- OpsFabric incident response: [opsfabric.ai](https://opsfabric.ai)

## Contributing

PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, test conventions, and what's in scope (resource types, matching strategies, output polish) vs out of scope (live alarm creation, runtime incident handling — those belong to the commercial products).

By participating, you agree to our [Code of Conduct](CODE_OF_CONDUCT.md).

## Support

| Question type | Where to go |
|---|---|
| Bug in the OSS audit tool | [GitHub issues](https://github.com/OpsFabric/opsfabric-discovery-pkg/issues) — use the Bug Report template |
| Feature idea for the OSS audit | [GitHub issues](https://github.com/OpsFabric/opsfabric-discovery-pkg/issues) — use the Feature Request template |
| Security issue | Email <vaishal2611@gmail.com> ([SECURITY.md](SECURITY.md)) |
| Commercial AlarmFabric / OpsFabric questions | Email <vaishal2611@gmail.com> |

## License

[MIT](LICENSE) — © 2026 Vaishal Shah / OpsFabric.
