{{ risk.title }}
{{ risk.consequence }}
Including
{% for ex in risk.examples %}
- {{ ex.name }} {{ ex.kind }} {% endfor %} {% if risk.examples_more > 0 %}
- + {{ risk.examples_more }} more {% endif %}
Reliability audit
How to read this report
This report covers every service this AWS account is running (applications, background jobs, databases, queues) and the alarms configured to detect their failures.
It shows where alarms are missing or broken. A missing alarm means a failure will go undetected until customers report it. A broken alarm means you've already paid for monitoring that isn't actually working.
See "Where the business is exposed" for the categories of damage, then "Recommended next steps" for the actions to take this week.
The risk
{{ threat_lead|safe }}
{{ threat_sub|safe }}
{% if remediation_callout %}{{ remediation_callout|safe }}
{% endif %}{{ first_audit_context|safe }}
{% endif %}{% if total_gaps_count > 0 %} {{ total_gaps_count }} technical {% if total_gaps_count == 1 %}gap{% else %}gaps{% endif %}, grouped into the categories of damage they create. Switch tabs to drill into the per-check engineering detail. {% else %} No required-check gaps detected. Coverage is strong. {% endif %}
{{ risk.consequence }}
Including
Every gap, grouped first by category of damage, then by the service it
affects, then by the specific check. Use this view to assign work to
engineers; the same data is available in alarm-coverage-missing.json.
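For engineers who prefer to work from alarm-coverage-missing.json directly, the same three-level grouping can be sketched in Python. This is a minimal illustration only: the field names `category`, `service`, and `check`, and the sample records, are assumptions made for the example and may not match the file's actual schema.

```python
from collections import defaultdict


def group_gaps(gaps):
    """Nest gap records as category -> service -> list of checks.

    Assumption: each record is a dict with hypothetical keys
    "category", "service", and "check".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for gap in gaps:
        grouped[gap["category"]][gap["service"]].append(gap["check"])
    # Convert the nested defaultdicts to plain dicts for printing or serializing.
    return {cat: dict(services) for cat, services in grouped.items()}


# Illustrative records only; real data would come from json.load() on the file.
sample = [
    {"category": "outage", "service": "orders-api", "check": "5xx error rate"},
    {"category": "outage", "service": "orders-api", "check": "service down"},
    {"category": "data loss", "service": "orders-db", "check": "free storage"},
]

grouped = group_gaps(sample)
print(grouped["outage"]["orders-api"])
```

Grouping by category first keeps the output aligned with the order used in this view, so each service's list of checks can be handed to its owner as one unit of work.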
{{ cat.subtitle }}
{{ resource.urgency_reason }}
{% elif resource.usage_summary %}{{ resource.usage_summary }}
{% elif not resource.data_available and resource.no_data_note %}{{ resource.no_data_note }}
{% endif %}{{ check.business_impact }}
{% endif %}Each row is one resource type in your infrastructure. Coverage is the percentage of standard monitoring checks (service up/down, error rates, capacity) that have working alarms today. Gap is the percentage that do not. Any resource type under 80% coverage is a known incident class.
| Resource type | Resources | Required | Met | Coverage | Gap |
|---|---|---|---|---|---|
| {{ row.label }} | {{ row.resource_count }} | {{ row.required_total }} | {{ row.required_met }} | {{ row.coverage_pct }}% | {{ row.gap_pct }}% |
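As a worked example of the Coverage and Gap columns, here is a minimal Python sketch. It assumes coverage is the number of checks met divided by the number required, and that gap is its complement (100 minus coverage). The function name, the rounding, and the handling of a resource type with no required checks are illustrative assumptions, not the audit's actual implementation.

```python
def coverage_row(required_total: int, required_met: int) -> dict:
    """Compute coverage and gap percentages for one resource type.

    Assumption: gap is the complement of coverage; the real report
    may compute or round these values differently.
    """
    if required_total <= 0:
        # Assumption: with no required checks, treat the row as fully covered.
        coverage = 100.0
    else:
        coverage = 100.0 * required_met / required_total
    coverage_pct = round(coverage, 1)
    return {
        "coverage_pct": coverage_pct,
        "gap_pct": round(100.0 - coverage_pct, 1),
        # The 80% threshold is the one stated in the text above the table.
        "below_threshold": coverage_pct < 80.0,
    }


row = coverage_row(required_total=12, required_met=9)
print(row)  # coverage 75.0, gap 25.0, below the 80% threshold
```

With 9 of 12 required checks met, the row shows 75% coverage and a 25% gap, which falls under the 80% line.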
Industry baseline = the typical coverage we see across mid-market cloud engineering teams. Your account = the percentage of standard checks that have working alarms today. A 90+ score means most failures will be detected automatically; below 60 means most failures will be reported by customers first.
Actions are listed in priority order. The first is the highest-leverage thing to do this week; the last is the engagement that closes the loop.
What happens next
DiscoveryFabric is the free audit you just read. AlarmFabric and OpsFabric are the paid multi-agent fabrics that act on what it found.
AlarmFabric · paid
OpsFabric · paid
Pilot pricing during the first customer cohort. Contact: vaishal2611@gmail.com