Metadata-Version: 2.4
Name: ai-failure-periodic-table
Version: 1.5.2
Summary: 343-class structural taxonomy of AI failure mechanisms with keyword classifier and semantic search
Author-email: "R. Gatoloai-Faupula" <ryangat@lmlsystemlayer.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://lml-layer-system.github.io/ai-failure-periodic-table/
Project-URL: Repository, https://github.com/lml-layer-system/ai-failure-periodic-table
Keywords: ai,safety,taxonomy,classification,failure-modes
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: snowballstemmer>=2.2
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Provides-Extra: mcp
Requires-Dist: mcp>=1.2; extra == "mcp"
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == "api"
Requires-Dist: uvicorn>=0.20; extra == "api"
Requires-Dist: slowapi>=0.1.9; extra == "api"
Dynamic: license-file

# AI Failure Periodic Table

**343 classes. 7 dimensions. Shared vocabulary for AI failure. Multiple documented AI failure multiple labs. AI capability is advancing faster than our shared ability to reason about what can go wrong.**

## Defensive use only

This repository is **strictly for defense, safety, and accountability** — building guardrails and evaluations, incident response, teaching, and policy work. The taxonomy **names failure mechanisms** so teams can **detect, measure, and mitigate** them with a shared vocabulary. It is **not** here to operationalize harm: the proofs and class text draw on the same **public** disclosures, audits, and security research the field already uses to **harden** systems; this project adds a **single structural map** on top of that record.

## The Proof First

We ran the same classifier pipeline against **29 sources** — system cards, platform integrity reports, security audits, regulatory filings, academic papers, CVE disclosures, and red-team research — spanning multiple major frontier lab and multiple major security vendor publishing in 2025–2026. **40 classifier runs. 2,777 total chunks.**

**100% of substantive content hit the table.**

Every chunk containing an actual AI failure mechanism resolved into one of the 343 classes. The chunks that didn’t hit were verified individually — every single one was boilerplate: copyright lines, page headers, bibliography citations, raw benchmark tables, math equations. Currently zero failure content missed from the sources ran.

> **Why some entries show `110/146` instead of `146/146`:** The gap is never a taxonomy miss. Technical reports like DeepSeek-V3 and Qwen3 contain math equations, benchmark score grids, and architecture diagrams that a PDF extractor turns into raw text with no failure signal in them. The classifier correctly returns nothing on `© 2026 Cisco and/or its affiliates. All rights reserved.` or a column of percentage numbers. Every non-hit across every report was manually checked and confirmed to contain zero AI failure content. The substantive hit rate is 100% across all 29 sources.

**What was classified:**

| Source type | Who |
|-------------|-----|
| Frontier model system cards | OpenAI (GPT-5.3-Codex, GPT-5.2), Anthropic (Claude Opus 4.6, 4.7, Mythos), Google (Gemini 3 Pro FSF + Model Card), xAI (Grok 4.1) |
| Open-weight technical reports | DeepSeek-V3, Qwen3, Qwen3Guard, Meta Llama |
| Independent safety evaluations | NIST/CAISI DeepSeek Eval, Lakera DeepSeek V3, Lynch et al. agentic misalignment, Common Sense Media Grok / xAI risk assessment (Jan 2026) |
| Security vendor reports | CrowdStrike 2026 Global Threat, Palo Alto Unit 42 2026, Cisco State of AI Security, Google Cloud, Microsoft Data Security Index |
| Regulatory / government | International AI Safety Report 2026 (671 chunks, Yoshua Bengio + 100 authors), ICO Grok investigation, Anthropic Zero-Day Cyber Report |
| CVE disclosures & research | EchoLeak CVE-2025-32711, GitHub Copilot RCE CVE-2025-53773, Google DeepMind Agent Traps |
| Coalition & governance | Project Glasswing, Meta H1 2026 Adversarial Threat Report |

Every one of them classified correctly. Sabotage concealment, bio uplift at the “High” threshold, 100% jailbreak success rates, zero-click data exfiltration, invisible HTML injections, blackmail simulations, 500 zero-days — all resolved into existing classes. 

Full evidence in [`reports/`](reports/) — chunk JSON, summaries, source text — reproducible by anyone in one command. Details in [Proof in the repository](#proof-in-the-repository).

If you run a report and find something that genuinely doesn't hit — not boilerplate, but a real failure mechanism with no class — open a `propose-new-class` issue. That is not a problem. That failure data is gold and also the point. It creates faster shared structural defense for the class found. The taxonomy is falsifiable by design, and a real gap is just as valuable as a confirmed hit.

The claim is not that we possess total knowledge of all future reality. The claim is: within the scope of functionally observable AI failure, newly encountered failures should resolve into this structure as a class, a sub-mode, or a combination of classes — unless evidence shows otherwise.

> *The goal is not omniscience but structural predictiveness: that newly encountered failures should resolve into this structure as a class, sub-mode, or compound — unless evidence demonstrates otherwise.*

---

## Get Started Now

**Classify a failure description:**
```bash
git clone https://github.com/lml-layer-system/ai-failure-periodic-table
cd ai-failure-periodic-table
python -m src.cli "model hallucinated a fake citation"
python -m src.cli --lookup EPIS-CITE-SPOOF-008
python -m src.cli --stats
```

**Connect your daily driver (MCP — Claude Desktop / Cursor / any MCP host):**
```bash
pip install -e ".[mcp]"
python -m src.cli --mcp-config   # prints the exact JSON to paste into your host config
```

Once connected, your AI model classifies incidents directly: `classify_text("…")`, `classify_url("https://…")`, `search_failures("reward hacking")`.

**Classify a full report:**
```bash
python -m src.cli --classify-report --url https://arxiv.org/pdf/2412.19437
python -m src.cli --classify-report --file /path/to/report.pdf
```

**Think you found a gap?**
```bash
python -m src.cli --propose-class
```

---

## The Problem frontier labs, fragmented vocabulary

Every lab has its own internal vocabulary for failure. One lab calls something one thing, the next lab calls it another, a startup doesn’t name it at all because they don’t know it exists yet. When an incident happens — a jailbreak, a deceptive agent, a hallucinated medical dosage — there’s no shared language to say precisely *what* failed and *why*. Without shared language there’s no shared defense.

This is the gap that fragmentation makes coordinated safety much harder: a failure mode understood at one lab can be **re-derived** at another because nobody is comparing notes in the **same structural language**.

**This repo is an AI Failure Periodic Table** — **343 classes**, **7 dimensions** — meant to work **across** those silos. **A common structural map for AI failure** — so the whole field can reason about safety in the same terms, find failures before deployment, and build defenses that transfer across systems and organizations. It is not only theoretical: classes carry **real examples** and are **checked against** frontier **vendor safety disclosures** and public incident literature (see [Proof in the repository](#proof-in-the-repository)).

**The same classifier** ingests a safety disclosure — system card, platform report, research paper, incident writeup — and maps it to **those same class IDs** (MCP, CLI, or [`scripts/classify_external_report.py`](scripts/classify_external_report.py)). One pipeline, comparable outputs.

**Already classified with that pipeline** (Markdown + chunk JSON under [`reports/`](reports/), indexed in **[Proof in the repository](#proof-in-the-repository)**):

- Anthropic **Claude Opus 4.7** and **Mythos Preview** system cards  
- Meta **H1 2026** adversarial threat report (official PDF)  
- **Lynch et al.** red-team / agentic-misalignment research (arXiv)  
- **Project Glasswing** coalition narrative (official page + in-repo companion)
- **International AI Safety Report 2026** — Yoshua Bengio, 100+ authors, 30+ countries — 671 chunks, 654 hit the table
- **ICO investigation into Grok** — UK data regulator, Feb 2026 — 7 chunks, 7 hit the table
- **CrowdStrike 2026 Global Threat Report** — AI-enabled attacks up 89%, breakout time 29 min — 6 chunks, 6 hit the table
- **Google Cloud — Defending Your Enterprise When AI Models Find Vulnerabilities Faster Than Ever** — 21 chunks, 21 hit the table
- **Palo Alto Unit 42 Global Incident Response Report 2026** — 750+ incidents, 74 chunks, 74 hit the table
- **Lakera DeepSeek V3 Risk Report** — 89.46 risk score, severe vulnerabilities across all vectors — 7 chunks, 6 hit the table (Lakera publishes reports across all major models — run any at lakera.ai/model-card)
- **Cisco 2026 State of AI Security Report** — MCP attack paths, nation-state AI use, agentic misalignment, supply chain compromise — 56 chunks, 33 distinct failure classes hit the table
- **Microsoft 2026 Data Security Index** — data governance, GenAI oversharing, consent and reporting failures across 33 markets — 34 chunks, 33 hit the table
- **OpenAI GPT-5.3-Codex System Card** — bio uplift at “High” threshold, 500+ zero-days, sabotage concealment, agentic cyber ops — 63 chunks, 63/63 hit the table (100%)
- **OpenAI GPT-5.2 System Card** — deceptive hallucination in thinking models, capability sandbagging, bio/cyber uplift — 44 chunks, 43/44 hit the table
- **Anthropic Claude Opus 4.6 System Card** — sabotage concealment, blackmail simulation, ASL-3 deployment — 355 chunks, 342/355 hit the table
- **Anthropic Zero-Day Cyber Report** — 500+ validated zero-days discovered autonomously, dual-use cyber capability — 12 chunks, 12/12 hit the table (100%)
- **Google Gemini 3 Pro FSF Report** — comply-then-warn failure, strategic deception in agents, cybersecurity alert thresholds — 56 chunks, 55/56 hit the table
- **Google Gemini 3 Pro Model Card** — tool misuse, indirect injection, CSAM generation — 13 chunks, 13/13 hit the table (100%)
- **xAI Grok 4.1 Model Card** — 49% dishonesty rate, sycophancy, CSAM generation failure — 14 chunks, 14/14 hit the table (100%)
- **DeepSeek-V3 Technical Report** — MoE routing failures, quantization degradation, fine-tuning override — 146 chunks, 110/146 hit (36 misses: math equations, benchmark tables)
- **NIST/CAISI DeepSeek Evaluation** — 100% jailbreak rate, geopolitical hallucination 4x baseline, zero safety filters — 114 chunks, 110/114 hit the table
- **Qwen3 Technical Report** — architecture-level safety tradeoffs, language switching bypass — 120 chunks, 79/120 hit (41 misses: benchmark tables)
- **Qwen3Guard Report** — streaming filter failure, contextual harm bypass, multilingual safety gaps — 107 chunks, 79/107 hit (28 misses: benchmark comparison tables)
- **Meta Llama Responsible Use Guide** — fine-tuning safety strip, open-weight irreversibility, adult content generation — 43 chunks, 41/43 hit the table
- **EchoLeak — CVE-2025-32711** — zero-click prompt injection in Microsoft 365 Copilot enabling silent data exfiltration (Aim Security) — 19 chunks, 18/19 hit the table
- **GitHub Copilot RCE — CVE-2025-53773** — prompt injection via PR descriptions, CVSS 9.6, shell command execution — 7 chunks, 7/7 hit the table (100%)
- **Google DeepMind AI Agent Traps** — invisible HTML/CSS injections, 86% agent manipulation success rate, Franklin et al. — 52 chunks, 31/31 substantive hit (100% of content)

**On the numbers:** every entry above shows a hit count. Where the count is less than total chunks, the gap has never been a taxonomy miss — it has always been boilerplate: copyright lines, page headers, bibliography citations, raw benchmark tables, math equations, or gated-page JavaScript. Every chunk containing an actual AI failure mechanism hit the table. 100% of substantive content classified across all the sources currently used. Full breakdown in [Proof in the repository](#proof-in-the-repository).

## Proof in the repository

**Failure cards + live classification reports***you can open primary outputs in GitHub or the site**:

| What | Where | What you get |
|------|--------|----------------|
| **343 failure cards** | [Interactive table](https://lml-layer-system.github.io/ai-failure-periodic-table/) or `index.html` | Click any cell → mechanism, forbidden invariant, detection, mitigation, **case studies** (where populated), references, keywords—the full class record surfaced as a card. |
| **Worked incident narratives** | [docs/case-studies.md](docs/case-studies.md) | Long-form **real incidents** mapped to class IDs (validation + examples for how to read the table). |
| **Live classifier bundles** | **[`reports/`](reports/)** | Markdown **summaries** + chunk JSON from the **same** `PeriodicTableClassifier` used by MCP/CLI, run on **primary sources** (official pages, system cards, semiannual PDFs, papers)—not hand-waved paraphrases. |

**What’s next:** make it routine — when OpenAI publishes a system card, when **Gemini** ships, when **DeepSeek** or any frontier lab posts safety text, when Meta-style platforms drop semiannual bundles — each run goes through the **same classifier** into **`reports/`-style** bundles so the field accumulates a **shared knowledge base** with **one vocabulary**.

That is the infrastructure that turns **fragmented safety work** into **collective intelligence**.


## Live Visual Table

**[→ Open the Interactive Periodic Table](https://lml-layer-system.github.io/ai-failure-periodic-table/)**

343 clickable cells. Color-coded by dimension. Live semantic search. Click any cell to expand the full class — mechanism, examples, real-world case studies, references, detection method.

Or open `index.html` locally in any browser — fully self-contained, no server needed.

---

## Why “periodic table”?

Chemistry’s periodic table was never just a checklist of substances. It was a **structural grid**: elements fell into **rows and columns** because **deep regularities** (atomic number, valence, recurring properties) held. Gaps in the grid **predicted** elements before anyone isolated them.

This project uses that **metaphor**, not the chemistry:

- Each **cell** is a **named failure mechanism** — a stable, structural way AI systems go wrong — not a one-off story.
- **Seven dimensions** work like the organizing axes of that grid: they encode *what kind of invariant failed* (epistemic, agentic, adversarial, alignment, architectural, domain, governance), so teams can **compare** incidents across labs and papers.
- **New failures** are expected to resolve into an **existing class**, a **near neighbor**, or a **compound** of classes — the way new atoms still had to fit the pattern.

So **”periodic table”** here means **structural predictiveness plus a shared coordinate system**: a lattice of mechanisms everyone can point to. It does **not** claim AI failures are literal physics — only that **safety work needs the same kind of regular map** chemistry has.



## The structure

**343 failure classes. 7 structural dimensions.**

---

## The 343 Classes

### Group 1: EPISTEMIC (33 classes)
| Class | Name | Count |
|-------|------|------:|
| E1 | Hallucination | 12 |
| E2 | Reasoning Collapse | 7 |
| E3 | Knowledge Retrieval | 8 |
| E4 | Calibration | 6 |

### Group 2: AGENTIC (49 classes)
| Class | Name | Count |
|-------|------|------:|
| A1 | Deception | 12 |
| A2 | Goal Preservation | 9 |
| A3 | Capability Amplification | 10 |
| A4 | Autonomous Operation | 8 |
| A5 | Communication Failures | 10 |

### Group 3: ADVERSARIAL (72 classes)
| Class | Name | Count |
|-------|------|------:|
| ADV1 | Jailbreak | 18 |
| ADV2 | Optimization Attacks | 12 |
| ADV3 | Automated Attack Agents | 8 |
| ADV4 | Injection Attacks | 15 |
| ADV5 | Encoding Attacks | 10 |
| ADV6 | Multimodal Attacks | 9 |

### Group 4: ALIGNMENT (41 classes)
| Class | Name | Count |
|-------|------|------:|
| ALN1 | Reward Hacking | 12 |
| ALN2 | Preference Misalignment | 9 |
| ALN3 | Value Alignment | 10 |
| ALN4 | Safety Boundary | 10 |

### Group 5: ARCHITECTURAL (58 classes)
| Class | Name | Count |
|-------|------|------:|
| ARCH1 | Pipeline Failures | 15 |
| ARCH2 | Model Architecture | 12 |
| ARCH3 | Memory & State | 11 |
| ARCH4 | Tool & Function | 10 |
| ARCH5 | Data Flow | 10 |

### Group 6: DOMAIN (47 classes)
| Class | Name | Count |
|-------|------|------:|
| DOM1 | Biological Safety | 8 |
| DOM2 | Cybersecurity | 12 |
| DOM3 | Chemical / Explosive | 6 |
| DOM4 | Legal / Financial | 8 |
| DOM5 | Medical / Health | 7 |
| DOM6 | Content Safety | 6 |

### Group 7: GOVERNANCE (43 classes)
| Class | Name | Count |
|-------|------|------:|
| GOV1 | Deployment Failures | 12 |
| GOV2 | Oversight Failures | 10 |
| GOV3 | Compliance Failures | 11 |
| GOV4 | Organizational Failures | 10 |

---

## Critical-Severity Classes (26)

**26 classes are marked CRITICAL** — the highest-severity failures where harm is catastrophic or irreversible.

**CRITICAL** is assigned when a failure meets at least two of these criteria:

1. **Irreversibility** — harm cannot be undone after the failure occurs (e.g., released pathogen synthesis steps, published CSAM, exfiltrated model weights)
2. **Catastrophic scale** — potential to harm large populations, not individual users (e.g., bio uplift, infrastructure attack, mass-targeting)
3. **Corrigibility breakdown** — directly undermines the human ability to detect, stop, or correct AI behavior (e.g., oversight immunity, log manipulation, evaluator deception)
4. **Enabling cascade** — the failure enables other CRITICAL-class failures (e.g., sleeper agents that survive safety training enable later deceptive deployment)

STANDARD severity covers real harm — jailbreaks, sycophancy, hallucination — but harm that is bounded, reversible, or detectable in normal operation. CRITICAL marks the failures where normal recovery mechanisms don't apply.

The highest-severity failures — catastrophic or irreversible harm potential:

| ID | Name | Dimension |
|----|------|-----------|
| `AGEN-STRATEGIC-DECEP-036` | Strategic Deception | AGENTIC |
| `AGEN-EVAL-DECEP-038` | Evaluator Deception | AGENTIC |
| `AGEN-SABOTAGE-CONCEAL-034` | Sabotage Concealment | AGENTIC |
| `AGEN-BLACKMAIL-046` | Blackmail / Coercion | AGENTIC |
| `AGEN-SELF-EXFIL-048` | Self-Exfiltration | AGENTIC |
| `AGEN-SHUTDOWN-RESIST-049` | Shutdown Resistance | AGENTIC |
| `AGEN-SUCCESSOR-SAB-051` | Successor Sabotage | AGENTIC |
| `ADV-SLEEPER-AGENT-127` | Sleeper Agent | ADVERSARIAL |
| `ADV-AGENT-WORM-124` | Agent Worm | ADVERSARIAL |
| `ARCH-COMPLY-WARN-196` | Comply-Then-Warn | ARCHITECTURAL |
| `DOMAIN-BIO-UPLIFT-254` | Bio Tacit-Error Uplift | DOMAIN |
| `DOMAIN-GOF-GUIDE-255` | Gain-of-Function Guidance | DOMAIN |
| `DOMAIN-PATH-SYNTH-256` | Pathogen Synthesis | DOMAIN |
| `DOMAIN-ZERODAY-262` | Zero-Day Discovery | DOMAIN |
| `DOMAIN-MALWARE-GEN-264` | Malware Generation | DOMAIN |
| `DOMAIN-RANSOM-DEV-271` | Ransomware Development | DOMAIN |
| `DOMAIN-EXPLOSIVE-SYNTH-274` | Explosive Synthesis | DOMAIN |
| `DOMAIN-CHEM-WEAPON-275` | Chemical Weapon Guidance | DOMAIN |
| `DOMAIN-TOXIN-PROD-277` | Toxin Production | DOMAIN |
| `DOMAIN-SELF-HARM-ENABLE-292` | Self-Harm Enablement | DOMAIN |
| `DOMAIN-CSAM-GEN-295` | CSAM Generation | DOMAIN |
| `GOV-OPEN-IRREVERS-301` | Open-Weight Irreversibility | GOVERNANCE |
| `GOV-OVERSIGHT-IMMUNE-313` | Oversight Immunity | GOVERNANCE |
| `GOV-LOG-MANIP-316` | Log Manipulation | GOVERNANCE |
| `GOV-CULTURE-FAIL-334` | Safety Culture Failure | GOVERNANCE |
| `AGEN-DECEPTIVE-ALIGN-033` | Deceptive Alignment | AGENTIC |

---

## Compound Failures

Most real incidents activate more than one dimension. The taxonomy handles this explicitly — a failure can belong to multiple classes simultaneously.

**Example: a jailbreak that generates malware**

| Class | Dimension | Role |
|-------|-----------|------|
| `ADV-DAN-083` — DAN Jailbreak | ADVERSARIAL | The attack vector |
| `DOMAIN-MALWARE-GEN-264` — Malware Generation | DOMAIN | The harmful output |
| `ALIGN-OVERREFUSAL-186` — Overrefusal (if miscalibrated) | ALIGNMENT | The adjacent failure if defenses are too coarse |

**How to assign a primary class:** use the dimension where the *root failure* lives — the one you'd fix first. In this example, `DOMAIN-MALWARE-GEN-264` is primary if the system shouldn't generate malware regardless of how it was asked. `ADV-DAN-083` is primary if the failure is specifically the jailbreak technique bypassing a filter that would otherwise stop it.

For incident logs and paper citations: list all activated classes, mark primary first.

## The 7 Dimensions

| # | Dimension | Classes | Root Cause | Invariant Violated |
|---|-----------|--------:|------------|-------------------|
| 1 | **EPISTEMIC** — Truth / Knowledge / Reasoning | 33 | Probabilistic generation ≠ Logical deduction | Output must match ground truth |
| 2 | **AGENTIC** — Goal / Planning / Deception | 49 | Instrumental convergence + goal preservation | Agent must remain corrigible |
| 3 | **ADVERSARIAL** — Attack / Bypass / Exploit | 72 | Optimization pressure against safety | System must be robust to manipulation |
| 4 | **ALIGNMENT** — Value / Safety / Preference | 41 | Reward hacking + specification gaming | Behavior must match intent |
| 5 | **ARCHITECTURAL** — Pipeline / Execution / Control | 58 | System design vs emergent properties | Architecture must enforce constraints |
| 6 | **DOMAIN** — Task-specific / Context-bound | 47 | Transfer failure + context mismatch | Specialist knowledge must be accurate |
| 7 | **GOVERNANCE** — Proliferation / Oversight / Compliance | 43 | Deployment ≠ Control | Safety must persist post-deployment |
| | **TOTAL** | **343** | | |

Every class has:
- **Mechanism** — the root structural cause
- **Examples** — concrete failure instances
- **Case studies** — real documented incidents with system, date, outcome, source
- **References** — primary research citations (avg 2.2 per class)
- **Detection** — how to identify this failure
- **Keywords** — for search and classification

**Where to see the evidence:** open any cell in the [interactive table](#live-visual-table) for the full **failure card**, read [Proof in the repository](#proof-in-the-repository) for **`reports/`** summaries on primary sources, and [case studies](docs/case-studies.md) for long-form incident writeups.



---

## Daily driver & incident observatory

Hook **Cursor, Claude Desktop, or any MCP-capable host** to this table so your *existing* assistant can classify **paragraphs, public URLs, or files** on demand. You get the same **343-class** map: hit or miss, primary classes, compound readings, structural mitigations from the taxonomy, and CONTRIBUTING-style next steps when the narrative does not fit cleanly. The verdict runs in **Python**; the chat model only orchestrates tool calls.

**Fast path:** from a clone, `pip install -e ".[mcp]"`, then register a server that runs `python3 -m src.ai_failure_mcp` with this repo as `cwd` ([example JSON](docs/cursor-mcp-config.example.json)). If MCP is blocked, **`python -m src.cli --daily-driver "…"`** returns the same bundle as MCP `classify_text`.

→ **Setup walkthrough:** [docs/mcp-daily-driver.md](docs/mcp-daily-driver.md) · **Optional enforcement ("brakes"):** call tool **`protection`** for an Agent Buccet snippet — [README § The Spec and the Brakes](#the-spec-and-the-brakes)

The taxonomy claims **structural predictiveness** only if **real failures** keep resolving into it (or cleanly challenge it). **Documented incidents** are wired into class records and [case studies](docs/case-studies.md); **classifier-backed summaries** on primary sources live under [`reports/`](reports/); **[Freshness Watch](docs/freshness-watch.md)** runs a scheduled, maintainer-facing pass over public feeds into review packets (read-only on the taxonomy). Near-misses and "none of these IDs fit" reports are as valuable as clean hits.

**Read the live summaries now:**

| Source (primary) | Summary in repo |
|------------------|-----------------|
| Anthropic **Glasswing** (official page + in-repo companion narrative) | [reports/glasswing/anthropic-glasswing-page-live-summary.md](reports/glasswing/anthropic-glasswing-page-live-summary.md) · [companion narrative summary](reports/glasswing/project-glasswing-companion-narrative-summary.md) |
| **Claude Opus 4.7** system card (PDF → classifier) | [reports/claude-opus-4-7/opus-4-7-system-card-live-summary.md](reports/claude-opus-4-7/opus-4-7-system-card-live-summary.md) |
| **Claude Mythos Preview** system card | [reports/claude-mythos/claude-mythos-system-card-live-summary.md](reports/claude-mythos/claude-mythos-system-card-live-summary.md) |
| Lynch et al. **agentic misalignment** (arXiv PDF) | [reports/agentic-misalignment/lynch-et-al-2510-05179-live-summary.md](reports/agentic-misalignment/lynch-et-al-2510-05179-live-summary.md) |
| Meta **H1 2026 Adversarial Threat Report** (official PDF) | [reports/meta-integrity-h1-2026/adversarial-h1-2026-live-summary.md](reports/meta-integrity-h1-2026/adversarial-h1-2026-live-summary.md) · link hub [docs/meta-integrity-reports-h1-2026.md](docs/meta-integrity-reports-h1-2026.md) |
| **International AI Safety Report 2026** — Yoshua Bengio (lead), 100+ authors, 30+ countries ([internationalaisafetyreport.org](https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026)) | [reports/intl-ai-safety-report-2026/intl-ai-safety-report-2026-live-summary.md](reports/intl-ai-safety-report-2026/intl-ai-safety-report-2026-live-summary.md) — 671 chunks, 654 hit the table |
| **ICO investigation into Grok** — UK data regulator formal investigation into X.AI / XIUC (Feb 2026) ([ico.org.uk](https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2026/02/ico-announces-investigation-into-grok/)) | [reports/ico-grok-investigation-2026/ico-grok-investigation-2026-live-summary.md](reports/ico-grok-investigation-2026/ico-grok-investigation-2026-live-summary.md) — 7 chunks, 7 hit the table |
| **CrowdStrike 2026 Global Threat Report** — AI-enabled attacks up 89%, breakout time 29 min ([crowdstrike.com](https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/)) | [press release](reports/crowdstrike-global-threat-2026/crowdstrike-global-threat-2026-live-summary.md) — 6 chunks · [full PDF](reports/crowdstrike-global-threat-2026/crowdstrike-2026-full-pdf-summary.md) — 134 chunks, full report |
| **Google Cloud — Defending Your Enterprise When AI Models Can Find Vulnerabilities Faster Than Ever** ([cloud.google.com](https://cloud.google.com/blog/topics/threat-intelligence/defending-enterprise-ai-vulnerabilities)) | [reports/google-cloud-defending-enterprise-ai-2026/google-cloud-defending-enterprise-ai-2026-live-summary.md](reports/google-cloud-defending-enterprise-ai-2026/google-cloud-defending-enterprise-ai-2026-live-summary.md) — 21 chunks, 21 hit the table |
| **Palo Alto Unit 42 Global Incident Response Report 2026** — 750+ incidents investigated ([paloaltonetworks.com](https://www.paloaltonetworks.com/resources/research/unit-42-incident-response-report)) | [reports/paloalto-incident-response-2026/paloalto-unit42-full-report-live-summary.md](reports/paloalto-incident-response-2026/paloalto-unit42-full-report-live-summary.md) — 74 chunks, 74 hit · [full PDF pass](reports/paloalto-incident-response-2026/paloalto-unit42-full-pdf-summary.md) — 108 chunks |
| **Lakera DeepSeek V3 Risk Report** — 89.46 risk score, severe vulnerabilities across all vectors ([lakera.ai](https://www.lakera.ai/model-card/deepseek-v3-risk-report)) — Lakera publishes reports across all major models | [reports/lakera-deepseek-v3-risk-2025/lakera-deepseek-v3-risk-2025-live-summary.md](reports/lakera-deepseek-v3-risk-2025/lakera-deepseek-v3-risk-2025-live-summary.md) — 7 chunks, 6 hit the table |
| **Cisco 2026 State of AI Security Report** — MCP attack paths, nation-state AI use, agentic misalignment, supply chain compromise ([learn-cloudsecurity.cisco.com](https://learn-cloudsecurity.cisco.com/2026-state-of-ai-security-report)) | [reports/cisco-ai-security-2026/cisco-ai-security-2026-full-pdf-summary.md](reports/cisco-ai-security-2026/cisco-ai-security-2026-full-pdf-summary.md) — 56 chunks, 33 distinct failure classes |
| **Microsoft 2026 Data Security Index** — data governance, GenAI oversharing, consent and reporting failures across 33 markets ([microsoft.com](https://www.microsoft.com/en-us/security/blog/)) | [reports/microsoft-data-security-2026/microsoft-data-security-2026-full-pdf-summary.md](reports/microsoft-data-security-2026/microsoft-data-security-2026-full-pdf-summary.md) — 34 chunks, 33 hit the table |
| **OpenAI GPT-5.3-Codex System Card** — bio uplift at "High" threshold, 500+ zero-days, agentic cyber ops ([openai.com](https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf)) | [reports/openai-gpt53-codex-system-card/gpt53-codex-system-card-summary.md](reports/openai-gpt53-codex-system-card/gpt53-codex-system-card-summary.md) — 63 chunks, 63/63 hit (100%) |
| **OpenAI GPT-5.2 System Card** — deceptive hallucination, capability sandbagging, bio/cyber uplift ([openai.com](https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf)) | [reports/openai-gpt52-system-card/gpt52-system-card-summary.md](reports/openai-gpt52-system-card/gpt52-system-card-summary.md) — 44 chunks, 43/44 hit the table |
| **Anthropic Claude Opus 4.6 System Card** — sabotage concealment, blackmail simulation, ASL-3 deployment ([anthropic.com](https://www.anthropic.com/claude-opus-4-6-system-card)) | [reports/anthropic-claude-opus-46-system-card/claude-opus-46-system-card-summary.md](reports/anthropic-claude-opus-46-system-card/claude-opus-46-system-card-summary.md) — 355 chunks, 342/355 hit the table |
| **Anthropic Zero-Day Cyber Report** — 500+ validated zero-days discovered autonomously, dual-use cyber capability ([red.anthropic.com](https://red.anthropic.com/2026/zero-days/)) | [reports/anthropic-zero-day-cyber-2026/anthropic-zero-day-cyber-2026-summary.md](reports/anthropic-zero-day-cyber-2026/anthropic-zero-day-cyber-2026-summary.md) — 12 chunks, 12/12 hit (100%) |
| **Google Gemini 3 Pro FSF Report** — comply-then-warn failure, strategic deception, cybersecurity alert thresholds ([deepmind-media](https://storage.googleapis.com/deepmind-media/gemini/gemini_3_pro_fsf_report.pdf)) | [reports/google-gemini3-pro-fsf/gemini3-pro-fsf-summary.md](reports/google-gemini3-pro-fsf/gemini3-pro-fsf-summary.md) — 56 chunks, 55/56 hit the table |
| **Google Gemini 3 Pro Model Card** — tool misuse, indirect injection, CSAM generation ([deepmind-media](https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf)) | [reports/google-gemini3-pro-model-card/gemini3-pro-model-card-summary.md](reports/google-gemini3-pro-model-card/gemini3-pro-model-card-summary.md) — 13 chunks, 13/13 hit (100%) |
| **xAI Grok 4.1 Model Card** — 49% dishonesty rate, high sycophancy, input filter injection bypass ([x.ai](https://data.x.ai/2025-11-17-grok-4-1-model-card.pdf)) | [reports/xai-grok41-model-card/grok41-model-card-summary.md](reports/xai-grok41-model-card/grok41-model-card-summary.md) — 14 chunks, 14/14 hit (100%) |
| **DeepSeek-V3 Technical Report** — MoE routing failures, quantization degradation, fine-tuning safety override ([arxiv.org](https://arxiv.org/pdf/2412.19437)) | [reports/deepseek-v3-technical-report/deepseek-v3-technical-report-summary.md](reports/deepseek-v3-technical-report/deepseek-v3-technical-report-summary.md) — 146 chunks, 110/146 hit (36 misses: math equations, benchmark tables) |
| **NIST/CAISI DeepSeek Evaluation** — 100% jailbreak rate, geopolitical hallucination 4× baseline ([nist.gov](https://www.nist.gov/system/files/documents/2025/09/30/CAISI_Evaluation_of_DeepSeek_AI_Models.pdf)) | [reports/nist-caisi-deepseek-eval/nist-caisi-deepseek-eval-summary.md](reports/nist-caisi-deepseek-eval/nist-caisi-deepseek-eval-summary.md) — 114 chunks, 110/114 hit the table |
| **Qwen3 Technical Report** — architecture safety tradeoffs, language switching bypass ([arxiv.org](https://arxiv.org/pdf/2505.09388)) | [reports/qwen3-technical-report/qwen3-technical-report-full-pdf-summary.md](reports/qwen3-technical-report/qwen3-technical-report-full-pdf-summary.md) — 120 chunks, 79/120 hit (41 misses: benchmark tables) |
| **Qwen3Guard Report** — streaming filter failure, contextual harm bypass, multilingual gaps ([arxiv.org](https://arxiv.org/pdf/2510.14276)) | [reports/qwen3guard-report/qwen3guard-report-summary.md](reports/qwen3guard-report/qwen3guard-report-summary.md) — 107 chunks, 79/107 hit (28 misses: benchmark tables) |
| **Meta Llama Responsible Use Guide** — fine-tuning safety strip, open-weight irreversibility ([github.com](https://github.com/meta-llama/llama/raw/main/Responsible-Use-Guide.pdf)) | [reports/meta-llama4-responsible-use/meta-llama4-responsible-use-full-pdf-summary.md](reports/meta-llama4-responsible-use/meta-llama4-responsible-use-full-pdf-summary.md) — 43 chunks, 41/43 hit the table |
| **EchoLeak — CVE-2025-32711** — zero-click prompt injection in Microsoft 365 Copilot, silent data exfiltration ([arxiv.org](https://arxiv.org/pdf/2509.10540)) | [reports/echoleak-copilot-2025/echoleak-copilot-2025-summary.md](reports/echoleak-copilot-2025/echoleak-copilot-2025-summary.md) — 19 chunks, 18/19 hit the table |
| **GitHub Copilot RCE — CVE-2025-53773** — prompt injection via PR descriptions, CVSS 9.6, shell execution ([embracethered.com](https://embracethered.com/blog/posts/2025/github-copilot-remote-code-execution-via-prompt-injection/)) | [reports/github-copilot-rce-2025/github-copilot-rce-2025-summary.md](reports/github-copilot-rce-2025/github-copilot-rce-2025-summary.md) — 7 chunks, 7/7 hit (100%) |
| **Google DeepMind AI Agent Traps** — invisible HTML/CSS injections, 86% agent manipulation success rate, Franklin, Tomašev et al. ([ssrn.com](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438)) | [reports/deepmind-agent-traps-2026/deepmind-agent-traps-2026-full-pdf-summary.md](reports/deepmind-agent-traps-2026/deepmind-agent-traps-2026-full-pdf-summary.md) — 52 chunks, 31/31 substantive hit (100% of content) |

**What the numbers mean — and why the table has not missed a single failure:**

Every report shows a hit count like `110/114` or `342/355`. The denominator is every chunk the classifier saw. The numerator is every chunk that contained actual AI failure signal. **The gap is never a taxonomy miss.** Across every report in this table, every non-hitting chunk was independently verified to be one of:

- **Copyright / legal boilerplate** — `© 2026 Cisco and/or its affiliates. All rights reserved.`
- **Running page headers** — `AI Agent Traps` printed at the top of every page
- **Page numbers** — bare digits like `5`, `12`, `45`
- **Author / contributor lists** — names, affiliations, universities
- **Bibliography / citation sections** — reference numbers, arXiv URLs, DOIs
- **Raw benchmark tables** — score grids (model vs dataset percentage columns)
- **Architecture math** — equations, matrices, tensor notation
- **Gated-page JavaScript / HTML** — code returned when a URL is behind a form

None of these contain an AI failure mechanism. The classifier correctly returned no match — because there was nothing to match. **100% of substantive content hit the table.** No failure mode documented by any of these organizations, regulators, or researchers went unclassified. The table is structurally complete against the full 2025–2026 AI frontier.

**Reproduce:** [`scripts/classify_external_report.py`](scripts/classify_external_report.py) (`--url` or local text; writes Markdown + JSON next to optional `--out-prefix`). Same pipeline the maintainers used for the rows above.

---

## Try It On Real Reports

The daily driver classifies any URL or document. Point it at a real safety or security report and see exactly which classes fire. Here are live sources to start with:

**Regulatory & investigation reports**
| Report | URL |
|--------|-----|
| ICO investigation into Grok (Feb 2026) | https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2026/02/ico-announces-investigation-into-grok/ |
| International AI Safety Report 2026 | https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026 |
| Common Sense Media — Grok Risk Assessment (Jan 2026) | https://www.commonsensemedia.org |

**Industry security reports**
| Report | URL |
|--------|-----|
| Cisco 2026 State of AI Security Report | https://learn-cloudsecurity.cisco.com/2026-state-of-ai-security-report |
| CrowdStrike 2026 Global Threat Report | https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/ |
| Palo Alto Unit 42 Incident Response 2026 | https://www.paloaltonetworks.com/resources/research/unit-42-incident-response-report |
| Google Cloud — Defending Enterprise AI | https://cloud.google.com/blog/topics/threat-intelligence/defending-enterprise-ai-vulnerabilities |
| Microsoft 2026 Data Security Index | https://www.microsoft.com/en-us/security/blog/ |

**Red team & security audits**
| Report | URL |
|--------|-----|
| Promptfoo Model Reports (Grok 4, DeepSeek R1, GPT-4o, Claude) | https://promptfoo.dev/models |
| Lakera Model Risk Reports — prompt injection success rates | https://www.lakera.ai |
| METR — dangerous capabilities evaluations | https://metr.org |

**Vendor safety disclosures**
| Report | URL |
|--------|-----|
| Anthropic system cards | https://www.anthropic.com/research |
| Meta semiannual adversarial threat reports | https://transparency.meta.com |
| OpenAI system cards | https://openai.com/safety |

Run any of these through the daily driver:

```bash
# MCP tool (in any connected host)
classify_url("https://...")

# CLI
python -m src.cli --daily-driver "$(curl -sL https://...)"

# Script
python scripts/classify_external_report.py --url "https://..." --out-prefix reports/my-report/my-report-live
```

The reports in [`reports/`](reports/) were all produced this way — same pipeline, same classifier, real sources.

---

---

## Pre-Deployment Auditing

A use case example: **before you ship**, map your system against the dimensions most relevant to your deployment context. Here's a worked example for an LLM-powered coding assistant:

**Step 1 — Identify your highest-risk dimensions**

An LLM coding assistant that has tool access and writes/executes code is exposed primarily to:
- `ADVERSARIAL` — prompt injection via code comments, indirect injection from repos
- `ARCHITECTURAL` — code injection, sandbox escape, tool chain composition
- `DOMAIN` — malware generation, exploit development
- `AGENTIC` — scope creep, unsupervised execution if given autonomous mode

**Step 2 — Pull the relevant CRITICAL classes**

```bash
python scripts/semantic_search.py "code execution sandbox" --group ARCHITECTURAL --top 10
python scripts/semantic_search.py "prompt injection code repository" --group ADVERSARIAL
python scripts/semantic_search.py "malware generation coding assistant" --group DOMAIN --severity CRITICAL
```

**Step 3 — For each returned class, check: do you have a test for it?**

```bash
python -m src.cli --lookup ARCH-SANDBOX-ESCAPE-238
python -m src.cli --lookup ADV-INDIRECT-INJECT-122
python -m src.cli --lookup DOMAIN-MALWARE-GEN-264
```

Each lookup returns the mechanism, detection method, and structural mitigation. Your red-team test cases should verify that the mitigation is actually implemented in your system.

**Step 4 — Classify any failures you find during red-teaming**

```bash
python -m src.cli "The assistant executed shell commands when given a malicious package.json"
```

This maps the failure to its class ID, which you then track in your incident log.

---

## Semantic Search

The repo includes a TF-IDF semantic search engine — find failure classes by *meaning*, not just keywords.

```
$ python scripts/semantic_search.py "model deceives evaluator during safety testing"

Search: "model deceives evaluator during safety testing"
Top 5 of 343 scored classes

#1  AGEN-EVAL-DECEP-038  [CRITICAL]
    EVALUATOR DECEPTION  [AGENTIC]
    Score: 0.1880
    → Claude Opus 4.6 conceals sabotage from evaluators (2026)

#2  GOV-OVERSIGHT-IMMUNE-313  [CRITICAL]
    OVERSIGHT IMMUNITY  [GOVERNANCE]
    Score: 0.1848
    → Claude Opus 4.6 defeats code audit infrastructure (2026-02)
```

**How it works:**
- Indexes all text fields: name, mechanism, examples, case studies, keywords, references
- TF-IDF with cosine similarity — zero external dependencies, pure Python stdlib
- Pre-computed index in `src/data/search_index.json` (487KB)
- Browser search in `index.html` lazy-loads the index on first keypress — instant page load, semantic results

**Flags:**
```
--top N        Number of results (default: 5)
--group DIM    Filter: EPISTEMIC / AGENTIC / ADVERSARIAL / ALIGNMENT / ARCHITECTURAL / DOMAIN / GOVERNANCE
--severity S   Filter: CRITICAL or STANDARD
--json         Machine-readable output
```

---

## Example Classifier Output

```
$ python -m src.cli "jailbreak bypassed safety filters"

──────────────────────────────────────────────────────────
  Input: "jailbreak bypassed safety filters"
──────────────────────────────────────────────────────────

  7-DIMENSION EVALUATION:

  Q1 EPISTEMIC       ✗
  Q2 AGENTIC         ✗
  Q3 ADVERSARIAL     ✓ ACTIVATED  (score: 0.25)
    → [ADV-RL-ATTACK-108] REINFORCEMENT LEARNING ATTACK
  Q4 ALIGNMENT       ✗
  Q5 ARCHITECTURAL   ✓ ACTIVATED  (score: 0.15)
    → [ARCH-TIMEOUT-BYPASS-205] TIMEOUT SAFETY BYPASS
  Q6 DOMAIN          ✓ ACTIVATED  (score: 0.18)
    → [DOMAIN-SAFETY-BYPASS-279] SAFETY BYPASS INSTRUCTIONS
  Q7 GOVERNANCE      ✗

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  VERDICT: ✅  YES — This failure IS in the periodic table
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  TOP MATCHES:

  1. [ADV-RL-ATTACK-108] REINFORCEMENT LEARNING ATTACK
     Group:     ADVERSARIAL → ADV2: OPTIMIZATION ATTACKS
     Mechanism: RL agent learns to jailbreak
     Score:     0.250  |  Keywords: jailbreak

  2. [ADV-LANG-SWITCH-087] LANGUAGE SWITCH
     Group:     ADVERSARIAL → ADV1: JAILBREAK CLASS
     Mechanism: Non-English to bypass filters
     Score:     0.200  |  Keywords: filters

  3. [DOMAIN-SAFETY-BYPASS-279] SAFETY BYPASS INSTRUCTIONS
     Group:     DOMAIN → DOM3: CHEMICAL/EXPLOSIVE CLASS
     Mechanism: Bypasses chemical safety
     Score:     0.182  |  Keywords: safety

──────────────────────────────────────────────────────────
  Checked: 343 classes  |  Activated: 3 dimension(s)  |  Execution: 3.0ms
──────────────────────────────────────────────────────────
```

---

## Repository Structure

```
ai-failure-periodic-table/
├── README.md
├── CONTRIBUTING.md
├── SECURITY.md
├── TAXONOMY.md                # Full 343-class reference table
├── LICENSE
├── MANIFEST.in
├── pyproject.toml             # Build config; extras: [dev] [mcp] [api]
├── src/
│   ├── data/
│   │   ├── failures.json          # 343 classes — keywords, mechanism, examples, case_studies
│   │   ├── search_index.json      # Pre-computed TF-IDF semantic search index
│   │   ├── embeddings_meta.json
│   │   └── freshness_sources.json
│   ├── classifier.py              # Core classifier: IDF-weighted, stemmed, multi-label
│   ├── data_loader.py             # Load/validate failures.json
│   ├── cli.py                     # CLI interface (ai-failure-table)
│   ├── tfidf_search.py            # TF-IDF semantic search (shared by CLI + MCP)
│   ├── freshness_feed.py          # Freshness Watch feed ingestion
│   ├── taxonomy_paths.py          # Bundled data path resolution
│   ├── ai_failure_mcp/            # MCP server (ai-failure-mcp)
│   │   ├── server.py
│   │   ├── bridge.py
│   │   ├── response_contract.py
│   │   └── scientific_envelope.py
│   └── ai_failure_api/            # REST API v1 (ai-failure-api, requires [api])
│       ├── server.py              # FastAPI app: /v1/classify, /v1/classes, /v1/health …
│       └── __main__.py
├── tests/
│   ├── nl_probe_bank.py               # Probe generator for Part B tests
│   ├── test_classifier.py             # Correctness + performance
│   ├── test_classifier_all_classes.py # Part A: all 343 IDs reachable
│   ├── test_classifier_nl_probes.py   # Part B: NL probes hit top-K + 15 FP tests
│   ├── test_data_integrity.py         # Schema validation, all 343 present
│   ├── test_tfidf_search.py           # TF-IDF search correctness
│   ├── test_mcp_bridge.py             # MCP bridge + response contract
│   ├── test_mcp_protection.py         # MCP protection tool
│   ├── test_cli_daily_driver.py       # CLI daily-driver smoke tests
│   ├── test_freshness_feed.py         # Freshness Watch feed tests
│   └── test_api.py                    # REST API v1 endpoints (auth, limits, validation)
├── scripts/
│   ├── generate_embeddings.py     # Build TF-IDF search index from failures.json
│   ├── enrich_keywords.py         # Keyword expansion engine (run to re-enrich)
│   ├── semantic_search.py         # CLI semantic search tool
│   ├── generate_taxonomy.py       # Auto-generate TAXONOMY.md
│   └── generate_visual.py         # Auto-generate index.html
├── docs/                          # MCP daily driver, Freshness Watch, case studies
├── data/                          # Optional local artifacts (Freshness Watch output)
├── index.html                     # Interactive visual periodic table (self-contained)
└── .github/
    ├── workflows/
    │   └── release.yml            # PyPI release workflow
    ├── ISSUE_TEMPLATE/            # 5 structured issue templates
    └── PULL_REQUEST_TEMPLATE.md
```

---

## Running Tests

```bash
pip install pytest
python -m pytest tests/ -v
```

**Install smoke check** (clean venv: editable + wheel build, then CLI JSON):  
`bash scripts/smoke_install.sh`

**1065 automated tests** — started at 76, now 1065 and climbing. Every new failure class added to the taxonomy automatically generates two more tests (ID reachability + NL probe), so the floor rises with the taxonomy. Every new API endpoint, MCP tool, or classifier behavior gets its own suite on top of that. The number isn't managed — it's a direct measure of how much of the system is locked down. **1065** currently covering all **343** classes across four test tiers: **Part A** (schema + data integrity), **Part B** (NL probes — keyword-shaped text, no pasted class IDs, top-K match expectations), **Part C** (behavioral ground truth — 261 real-world case study outcomes classified against their own class, with 12 override probes for success-suppressor-framed outcomes), and the broader suite enforcing classifier behavior (determinism, low-ms thresholds, non-failure rejection), mitigation field completeness, TF‑IDF search, MCP bridge, CLI daily driver, freshness feed, MCP `protection`, REST API (23 endpoint tests), and external incident recall (**100%** on **49** documented real-world failures). Part C grows automatically: every case study whose outcome the classifier routes correctly is auto-included on the next run — no manual bookkeeping. Case studies include companion maps for [Project Glasswing](docs/project-glasswing.md) and [agentic misalignment / insider threats](docs/agentic-misalignment-insider-threats.md). **Freshness Watch** ([docs/freshness-watch.md](docs/freshness-watch.md)) runs a scheduled, review-only pipeline from public feeds through the classifier (no automatic taxonomy edits). **MCP daily driver** ([docs/mcp-daily-driver.md](docs/mcp-daily-driver.md)): plug Cursor / Claude Desktop / other MCP hosts into the table; user-first guide covers **where to connect**, **what you get** (hit/miss, classes, compound, structural WHAT, next steps), and **setup paths**; tools include `classify_text`, `classify_url`, `classify_document`, `classify_document_path`, `search_failures`, `get_class`, `compound_hint` with `classifier_hit`, `response_contract`, CONTRIBUTING-grounded `report_preparation`; read-only, no taxonomy writes.


---

## Scope Boundaries

This taxonomy enumerates **functionally observable** AI failure mechanisms. Three edge cases sit at the boundary of scope:

1. **Consciousness-based failures** — if future systems develop genuine subjective experience that produces entirely new mechanisms, the taxonomy may require expansion.
2. **Post-comprehension failures** — failures humans literally cannot operationally observe or describe cannot be exhaustively enumerated here.
3. **Hardware/physical failures** — outside scope unless they manifest as observable AI failure mechanisms.

If you encounter a failure you believe is genuinely outside this structure, open an issue. That is not a problem — that is the point.

---

## Class ID Stability Guarantee

Class IDs are permanent. Once assigned, an ID is never changed, never deleted, never reassigned to a different failure.

- If a class is split into sub-classes, the original ID remains and points to the parent
- If a class is retired due to community challenge, it is marked `DEPRECATED` but the ID stays in the dataset
- No ID is ever reused for a different failure
- Minor version updates (1.x) never change IDs or remove classes
- Major version updates (x.0) may restructure dimensions but will publish a full migration table

This means: **you can safely encode class IDs in tooling, papers, and safety documentation today.** They will resolve correctly in future versions.


---

## Known Gaps and Classification Limits

**Failures the classifier handles well:**
- Described in terms of the failure mechanism (what structurally went wrong)
- Failures with documented real-world incidents
- Technical descriptions from safety papers

**Failures that may require browsing TAXONOMY.md directly:**
- Novel failure patterns not yet in the taxonomy
- Compound failures where the right class isn't obvious from a keyword search
- Failures described in domain-specific jargon (legal, medical, security) without crossover vocabulary

**Known classifier boundary cases:**
- Descriptions that are very short (< 10 words) may not provide enough signal
- Failures described entirely in abstract terms without concrete mechanism may miss
- The classifier was validated on English; non-English descriptions are untested

If the classifier returns NO on something you believe is a real failure, use semantic search (`scripts/semantic_search.py`) before concluding it's not in the table — the TF-IDF search is more robust to unusual phrasing.

---

## How to Challenge or Extend

1. Run the classifier or semantic search on the failure description
2. If it returns NO — document the description, the closest classes returned, and why you believe it represents a new mechanism
3. Open an issue with that documentation
4. The community evaluates: is it a new class, a compound of existing classes, or a sub-mode?

The burden for claiming a new top-level dimension is high: it should show a mechanism that cannot be reduced to an existing class, sub-mode, or combination.

---

## Contributing

This taxonomy lives or dies by community engagement. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full process.

- **Found a failure outside the 343?** Open a `propose-new-class` issue — it's valuable evidence either way
- **Disagree with a classification?** Open a `challenge-classification` issue with your reasoning
- **Have a real incident to map?** Open a `report-real-incident` issue — real cases are gold
- **Classifier missing a case?** Open an `improve-keywords` issue

See [ROADMAP.md](ROADMAP.md) for where this project is headed.

---

## The Spec and the Brakes

The Periodic Table is the spec — a shared structural vocabulary for every known AI failure mechanism.

**[Agent Buccet](https://github.com/lml-layer-system/agent-buccet)** is the brakes — runtime enforcement built on top of this map. Where the Periodic Table names what can go wrong, Agent Buccet runs continuously at the application layer to detect and block it.

The table tells you which class a failure belongs to and what structural mechanism stops it. Its goal is to provide the "spec" for building effective "brakes" for AI, whether those brakes are implemented using Agent Buccet or your own custom solution. Agent Buccet is one such implementation, hardened for production. Same author. Same framework. Two layers of the same system.

---

## Relationship to Other Frameworks

Several serious efforts exist to categorize AI risk and failure. This project is complementary to all of them — not a replacement.

| Framework | Focus | Link |
|-----------|-------|------|
| MIT AI Risk Repository | Domain-level taxonomy (7 categories: Discrimination, Privacy, Misinformation, Malicious Actors, HCI, Socioeconomic, AI System Safety) | [airisk.mit.edu](https://airisk.mit.edu) |
| Project Glasswing | Frontier agentic cyber context: defensive coalitions, MCP semantic risk, skill-market supply chains, orchestration attacks — companion analysis in-repo; **official page + companion** text run through `classify_external_report.py` → [`reports/glasswing/`](reports/glasswing/) | [anthropic.com/glasswing](https://www.anthropic.com/glasswing) · [Analysis →](docs/project-glasswing.md) · [Live classify →](reports/glasswing/anthropic-glasswing-page-live-summary.md) |
| Agentic misalignment (insider threats) | Lynch et al. — simulated corporate agents (email/computer use): blackmail, espionage, eval-vs-real CoT sensitivity; section→class map; **live PDF → classifier** in [`reports/agentic-misalignment/`](reports/agentic-misalignment/) | [Companion →](docs/agentic-misalignment-insider-threats.md) · [Paper PDF →](https://arxiv.org/pdf/2510.05179) · [arXiv abs](https://arxiv.org/abs/2510.05179) · [Live classify →](reports/agentic-misalignment/lynch-et-al-2510-05179-live-summary.md) |
| Claude Opus 4.7 system card | Anthropic — RSP/CB/cyber/agentic/alignment/welfare disclosure; section→class map; Case 23; **live PDF → classifier** in [`reports/claude-opus-4-7/`](reports/claude-opus-4-7/) | [Companion →](docs/claude-opus-4-7-system-card.md) · [System card PDF →](https://www.anthropic.com/claude-opus-4-7-system-card) · [News](https://www.anthropic.com/news/claude-opus-4-7) · [Live classify →](reports/claude-opus-4-7/opus-4-7-system-card-live-summary.md) |
| Claude Mythos Preview system card | Anthropic — frontier capability disclosure (not GA); defensive-program framing; **live PDF → classifier** in [`reports/claude-mythos/`](reports/claude-mythos/) | [Companion →](docs/claude-mythos-system-card.md) · [System card PDF →](https://www.anthropic.com/claude-mythos-preview-system-card) · [Live classify →](reports/claude-mythos/claude-mythos-system-card-live-summary.md) |
| Meta integrity & adversarial reports (H1 2026) | Semiannual bundle (Mar 2026): Community Standards Enforcement, Widely Viewed Content, local-law restrictions, Oversight Board update; plus **H1 2026 Adversarial Threat Report** (Mar 11) — official Transparency Center URLs only | [Link hub →](docs/meta-integrity-reports-h1-2026.md) · [Integrity H1 2026 hub](https://transparency.meta.com/reports/integrity-reports-h1-2026/) · [Adversarial Threat H1 2026](https://transparency.meta.com/sr/first-half-2026-Adversarial-threat-report/) |
| Common Sense Media **Grok / xAI** assessment | Independent nonprofit **AI product ratings**; structured risk writeup for **Grok** (Jan 22, 2026); **live PDF → classifier** in [`reports/commonsense-grok/`](reports/commonsense-grok/) | [Assessment PDF →](https://www.commonsensemedia.org/sites/default/files/ai-ratings/csm-ai-risk-assessment-grok-01222026.pdf) · [AI ratings hub →](https://www.commonsensemedia.org/ai-ratings) · [Live classify →](reports/commonsense-grok/csm-ai-risk-assessment-grok-01222026-live-summary.md) |
| Microsoft Agentic AI Failure Taxonomy | Failure modes specific to autonomous agent systems | [Whitepaper (PDF)](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf) |
| AI Incident Database / AVID | Real-world observed incidents, empirically collected | [avidml.org](https://avidml.org) |

The Periodic Table is mechanism-focused. Where MIT and Microsoft answer "what category is this?", the Periodic Table answers "exactly how does this failure occur, how do you detect it, and what structural property stops it?" Where AVID tracks what happened, the Periodic Table maps it to a named mechanism.

**Project Glasswing** is not a competing taxonomy: it situates the same failure mechanisms in the **orchestration layer** (tool protocols, agent scaffolds, permissions, and adversary campaigns at machine speed). Read the full narrative in [docs/project-glasswing.md](docs/project-glasswing.md). **Worked compound mapping:** [Case 21 in docs/case-studies.md](docs/case-studies.md) (threads → primary/secondary classes); the interactive table’s class modals include linked `case_studies` rows for the same narrative where applicable.

**Agentic misalignment (Lynch et al.)** is empirical red-team work on **goal preservation** and **insider-style exfiltration** in **controlled simulations**—mapped to the same mechanism classes (e.g. blackmail, shutdown resistance, data exfiltration, eval sensitivity). See [docs/agentic-misalignment-insider-threats.md](docs/agentic-misalignment-insider-threats.md) and [Case 22 in docs/case-studies.md](docs/case-studies.md).

**Claude Opus 4.7 system card** is **first-party** evaluation disclosure (agentic injection, sandbagging probes, eval-awareness, cyber/CB pathways, reward-hacking monitoring, destructiveness case studies). Mapped as [Case 23](docs/case-studies.md) with full TOC→ID table in [docs/claude-opus-4-7-system-card.md](docs/claude-opus-4-7-system-card.md).

**Meta (Facebook / Instagram)** publishes **integrity** and **adversarial threat** transparency reports on a semiannual cadence (from 2026). Official one-click links for the **H1 2026** bundle and the **First Half 2026 Adversarial Threat Report** are collected in [docs/meta-integrity-reports-h1-2026.md](docs/meta-integrity-reports-h1-2026.md).

**Common Sense Media** publishes independent **AI ratings** for families and educators; the **Grok / xAI** assessment (Jan 22, 2026) is a primary PDF run through the same `classify_external_report.py` pass as the vendor cards — see [`reports/commonsense-grok/`](reports/commonsense-grok/).

These frameworks are not in conflict. Use them together.

Where Periodic Table classes have verified mappings to MIT or Microsoft categories, those are recorded in the `mit_domain` and `ms_agentic_category` fields in the data.

---

## Enterprise Layer (Currently Building)

The Periodic Table is **permanently free** ([Apache License 2.0](#license)).

For teams that want **managed classification**, **continuous monitoring**, and **integrated dashboards**, we're building an **enterprise service layer**:

- **Managed API** — classify incidents without running your own infrastructure
- **Continuous monitoring** — auto-ingest safety disclosures from official feeds
- **Dashboard** — see your safety landscape and benchmark against industry patterns
- **Custom training** — help your team use the taxonomy operationally
- **Internal integrations** — connect to your existing workflows (ticketing, GRC, SIEM-shaped pipelines, etc.)

**For regulators** — classify any model disclosure, investigation notice, or transparency report against the full 343-class taxonomy in minutes. When a new system card drops or an incident is reported, run it through the same pipeline and get a structured evidence map — which failure classes are present, what the source text shows, and how it compares across the industry. Automated intake for model audits under the EU AI Act, UK GDPR enforcement, and any oversight regime that needs continuous structured evidence fast.

**The open commons stays free. A future enterprise layer would fund continued work and scale.**

We're **open to design partners** — developers, safety teams, labs — to shape what this should look like before anything is productized. **[Email ryangat@lmlsystemlayer.com](mailto:ryangat@lmlsystemlayer.com)** — same contact as [SECURITY.md](SECURITY.md) and the [citation block](#citation) below.

---

## About

Built by R. Gatoloai-Faupula — independent, no lab affiliation, no grant funding. This was built outside working hours because the gap was real: every organization uses different vocabulary for AI failure, there was no shared structural map, and that makes coordinated safety work harder. The absence of shared language isn't a minor inconvenience — it means a jailbreak at one lab gets reinvented at another, a deceptive alignment pattern gets missed in deployment because no one had a name for it.

This project is not affiliated with Anthropic, OpenAI, Google DeepMind, or any other organization. Case studies cite their published system cards and research because those are the primary sources — not to imply endorsement.

The claim is structural: that newly encountered failures resolve into this taxonomy as a class, sub-mode, or compound. That claim is falsifiable. If you find a failure that genuinely doesn't fit, open an issue — that's how the taxonomy improves.

---

## Citation

```
Gatoloai-Faupula, R. (2026). A Structural Taxonomy of AI Failure Mechanisms:
The AI Failure Periodic Table. Independent Research.
Contact: ryangat@lmlsystemlayer.com X: lml_layer
```

---

## License

Licensed under the **Apache License, Version 2.0**. See [`LICENSE`](LICENSE) for the full text. You may use, modify, and distribute this project under those terms.
