How It Works

4-Layer Deep Defense

What gets past one wall gets caught by the next. Independent layers — no single point of failure.
User Input
Wall 1
Pattern Matching
165+ regex rules across 25 threat categories
SQLiXSSPIIJailbreak
Wall 2
Semantic Similarity
Compares against corpus of known attack phrases
ParaphrasedNovel phrasing
Wall 3
Encoded Payload
Decodes Base64, hex, URL, ROT13, then re-scans
ObfuscatedEncodedUnicode
Wall 4
Multi-Turn Analysis
Tracks escalation across conversation turns
Slow-burnEscalation
Safe Output
L4
Capability Access Control
CaMeL-inspired taint tracking. Untrusted data can't trigger privileged tools.
L5
Atomic Execution Pipeline
Run agent actions in a sealed sandbox, destroy all traces after.
L6
Safety Specification Verifier
Formal safety specs with proof-certificate verification.