Skip to content

Guardrails

Every agent input and output passes through a guardrail stack before reaching the LLM or the caller — this is enforced at the MeshFlow kernel level, not bolted on after the fact.

from meshflow import Agent
from meshflow.security.guardrails import PIIBlockGuardrail, ConfidenceGuardrail, LengthGuardrail

agent = Agent(
    name="hipaa_agent",
    role="researcher",
    input_guardrails=[PIIBlockGuardrail(action="block")],
    output_guardrails=[
        ConfidenceGuardrail(min_confidence=0.70),
        LengthGuardrail(max_chars=4000),
    ],
)

Core Types

GuardrailResult

@dataclass
class GuardrailResult:
    passed: bool
    guardrail_name: str
    reason: str = ""
    modified_text: str | None = None   # set when action="modify"
    severity: Literal["block", "warn", "modify"] = "block"
    metadata: dict[str, Any] = field(default_factory=dict)

GuardrailViolation

Raised by GuardrailStack in strict mode when a blocking guardrail fails:

class GuardrailViolation(Exception):
    result: GuardrailResult   # access via .result.guardrail_name, .result.reason

GuardrailStack

stack = GuardrailStack(
    guardrails=[PIIBlockGuardrail(), ToxicityGuardrail()],
    mode="strict",   # "strict" | "collect"
)
passed, final_text, results = stack.run("Hello, SSN is 123-45-6789")
# passed=False, results[0].guardrail_name="pii_block"

stack.run() returns (all_passed: bool, final_text: str, results: list[GuardrailResult]).

In collect mode all guardrails run and failures are returned in results without raising. In strict mode the first blocking failure raises GuardrailViolation.


Built-in Guardrails

PIIBlockGuardrail

Detects PHI/PII via SensitiveDataDetector. Use action="modify" to mask instead of block.

PIIBlockGuardrail(
    action="block",          # "block" | "warn" | "modify"
    categories=["ssn", "email"],   # None = all categories
    min_confidence=0.5,
    name="pii_block",
)

With action="modify", GuardrailResult.modified_text contains the masked text with values replaced by [REDACTED].

ConfidenceGuardrail

Blocks outputs whose CONFIDENCE:0.XX marker falls below a threshold.

ConfidenceGuardrail(
    min_confidence=0.70,
    missing_ok=True,   # pass if no CONFIDENCE marker found
    action="block",
)

LengthGuardrail

Enforces minimum/maximum text length.

LengthGuardrail(
    min_chars=10,
    max_chars=4000,
    unit="chars",   # "chars" | "words"
    action="block",
)

ToxicityGuardrail

Blocks profanity, violence, self-harm, and hate content. Built-in categories: profanity, violence, self_harm, hate.

ToxicityGuardrail(
    categories=["violence", "self_harm"],   # None = all
    extra_patterns=[r"\bforbidden_term\b"],
    case_sensitive=False,
    action="block",
)

JSONSchemaGuardrail

Validates that the output is parseable JSON, optionally conforming to a schema. Extracts JSON from markdown fences automatically.

JSONSchemaGuardrail(
    schema={
        "required": ["diagnosis", "icd_code"],
        "properties": {"icd_code": {"type": "string"}},
    },
    extract_json=True,   # strip ```json ... ``` fences
    action="block",
)

Full JSON Schema draft-7 validation requires jsonschema; otherwise falls back to required-key presence checking.

RegexGuardrail

Requires or forbids a regex pattern.

RegexGuardrail(pattern=r"ICD-\d{2}", mode="require", action="block")
RegexGuardrail(pattern=r"\bpassword\b", mode="forbid", action="block")

mode="require" fails if pattern is absent. mode="forbid" fails if pattern is present.

KeywordBlockGuardrail

Blocks any text containing forbidden keywords or phrases.

KeywordBlockGuardrail(
    keywords=["internal only", "confidential", "trade secret"],
    whole_word=True,       # match whole words only
    case_sensitive=False,
    action="block",
)

CostCapGuardrail

Rejects input tasks whose estimated token cost exceeds a budget. Apply to input_guardrails to catch runaway prompts before they hit the LLM.

CostCapGuardrail(
    max_cost_usd=0.10,
    input_rate_per_1k=0.003,   # USD per 1k tokens (claude-sonnet default)
    chars_per_token=4,
    action="block",
)

CustomGuardrail

Wraps any callable as a guardrail.

def check_disclaimer(text: str) -> tuple[bool, str]:
    if "NOT MEDICAL ADVICE" not in text:
        return False, "missing required disclaimer"
    return True, ""

agent = Agent(
    name="medical_agent",
    output_guardrails=[CustomGuardrail(fn=check_disclaimer, name="disclaimer_check")],
)

Callable signatures accepted: bool, tuple[bool, str], or tuple[bool, str, str] (the third element is modified text for action="modify").


Stack Modes

Mode Behavior
strict Raises GuardrailViolation on first blocking failure
collect Runs all guardrails; accumulates failures in results; never raises

input_guardrails vs output_guardrails

input_guardrails run on the task string before the LLM is called. Use them to block PII ingestion and catch cost overruns early.

output_guardrails run on the LLM response before it is returned to the caller. Use them to enforce confidence, schema, and content requirements.

Both accept the same list of Guardrail instances.