Sensitive Data Detection
SensitiveDataDetector scans text for PHI, PII, and credentials, returning rich match objects so callers can audit what was found rather than just receiving a scrubbed string.
```python
from meshflow.security.sensitive_data import SensitiveDataDetector, get_detector

detector = SensitiveDataDetector()
matches = detector.detect("Patient John Smith SSN: 123-45-6789")  # list of SensitiveMatch
masked = detector.mask("John Smith 123-45-6789")                  # text with matches replaced
report = detector.audit_report("John Smith SSN: 123-45-6789")     # summary dict
```
SensitiveMatch
Each match returned by detect() is a SensitiveMatch dataclass:
| Field | Type | Description |
|---|---|---|
| `kind` | `str` | Pattern label, e.g. `"SSN"`, `"EMAIL"`, `"JWT"` |
| `category` | `str` | `"phi"`, `"pii"`, or `"credential"` |
| `value_preview` | `str` | First 6 characters followed by `"…"`; never the full value |
| `start` | `int` | Character offset in the source text |
| `end` | `int` | Exclusive end offset |
| `confidence` | `float` | `1.0` for regex patterns; lower for heuristics (`0.7` for `NAME`, `0.6` for `HIGH_ENTROPY_HEX`) |
```python
for m in matches:
    print(m.kind, m.category, m.value_preview, m.confidence)
    d = m.to_dict()  # JSON-serialisable dict
```
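To make the field semantics concrete, here is a minimal stand-alone sketch of a dataclass with the same fields. `MatchSketch` and `make_preview` are hypothetical names used only for illustration, and the preview logic is an assumption based on the "first 6 characters" description, not the library's implementation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MatchSketch:
    """Hypothetical stand-in for SensitiveMatch, using the documented fields."""

    kind: str
    category: str
    value_preview: str
    start: int
    end: int
    confidence: float

    def to_dict(self) -> dict:
        # All fields are str/int/float, so the result is accepted by json.dumps.
        return asdict(self)


def make_preview(value: str, keep: int = 6) -> str:
    """Keep the first `keep` characters and append an ellipsis (assumed behaviour)."""
    return value[:keep] + "…"


text = "SSN: 123-45-6789"
start, end = 5, 16  # span of the SSN inside `text`; `end` is exclusive
match = MatchSketch("SSN", "phi", make_preview(text[start:end]), start, end, 1.0)
```

Because `end` is exclusive, `text[match.start:match.end]` recovers the matched value directly. Note that this naive preview would expose the whole value when it is shorter than six characters; how the real class treats that case is not stated here.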
SensitiveDataDetector
```python
SensitiveDataDetector(
    phi_enabled: bool = True,
    credential_enabled: bool = True,
    min_confidence: float = 0.6,
)
```
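The effect of `min_confidence` can be sketched as a simple threshold filter. This is an assumption about how the parameter is applied: the comparison is taken to be inclusive, since the default of 0.6 would otherwise exclude a pattern documented at confidence 0.6. `Hit` and `filter_by_confidence` are hypothetical names.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical minimal match record: just a label and a confidence."""

    kind: str
    confidence: float


def filter_by_confidence(hits: list[Hit], min_confidence: float = 0.6) -> list[Hit]:
    # Inclusive comparison (>=) is an assumption, not confirmed library behaviour.
    return [h for h in hits if h.confidence >= min_confidence]


hits = [Hit("SSN", 1.0), Hit("NAME", 0.7), Hit("HIGH_ENTROPY_HEX", 0.6)]
default_kinds = [h.kind for h in filter_by_confidence(hits)]
strict_kinds = [h.kind for h in filter_by_confidence(hits, min_confidence=0.9)]
```

Under this reading, raising `min_confidence` to 0.9 would keep only the regex-backed matches and drop the heuristic ones.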
Pattern coverage
PHI / PII (11 patterns)
| Kind | Category | Example match |
|---|---|---|
| `SSN` | `phi` | `123-45-6789` |
| `EMAIL` | `pii` | `user@example.com` |
| `PHONE` | `pii` | `(555) 867-5309` |
| `DATE` | `pii` | `Jan 15, 2024` |
| `ZIP` | `pii` | `94105-1234` |
| `IP` | `pii` | `192.168.1.1` |
| `URL` | `pii` | `https://internal.example.com` |
| `MRN` | `phi` | `MRN: A-12345` |
| `NPI` | `phi` | NPI number (10 digits) |
| `CREDIT_CARD` | `pii` | `4111 1111 1111 1111` |
| `NAME` | `pii` | `John Smith` (confidence 0.7) |
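As a rough illustration of how regex-backed kinds can yield labelled spans, the sketch below scans for three of the kinds in the table using only the standard library. The expressions are simplified assumptions written for this example; they are not the patterns the detector ships with.

```python
import re

# Illustrative patterns only; not the library's actual expressions.
PATTERNS = {
    "SSN": ("phi", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    "EMAIL": ("pii", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    "ZIP": ("pii", re.compile(r"\b\d{5}-\d{4}\b")),
}


def find_all(text: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, category, start, end) tuples ordered by start offset."""
    found = []
    for kind, (category, pattern) in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((kind, category, m.start(), m.end()))
    return sorted(found, key=lambda item: item[2])


sample = "Mail user@example.com, SSN 123-45-6789, ZIP 94105-1234"
results = find_all(sample)
for kind, category, start, end in results:
    print(kind, category, sample[start:end])
```

Sorting by start offset mirrors the documented ordering of `detect()` results by position.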
Credentials (12 patterns)
| Kind | Example prefix |
|---|---|
| `API_KEY_ANTHROPIC` | `sk-ant-…` |
| `API_KEY_OPENAI` | `sk-` (48 chars) |
| `API_KEY_GENERIC` | `api_key=…` |
| `AWS_ACCESS_KEY` | `AKIA…` |
| `AWS_SECRET_KEY` | `aws_secret=…` |
| `GITHUB_TOKEN` | `ghp_…`, `gho_…`, `ghs_…` |
| `JWT` | `eyJ…` |
| `PRIVATE_KEY` | `-----BEGIN RSA…` |
| `DB_CONN_STRING` | `postgresql://user:pass@` |
| `HIGH_ENTROPY_HEX` | 40+ hex chars (confidence 0.6) |
| `GCP_KEY` | `"type": "service_account"` |
| `BEARER_TOKEN` | `Bearer <token>` |
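Prefix-based credential detection of the kind listed above might look like the following sketch. The token lengths and character classes are assumptions chosen for illustration (for example, 16 characters after `AKIA` and 36 after the GitHub prefix); the detector's real patterns may differ.

```python
import re

# Illustrative expressions keyed by kinds from the table; lengths and
# character classes are assumptions, not the library's patterns.
CREDENTIAL_PATTERNS = {
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bgh[pos]_[A-Za-z0-9]{36}\b"),
    "HIGH_ENTROPY_HEX": re.compile(r"\b[0-9a-fA-F]{40,}\b"),
}


def credential_kinds(text: str) -> set[str]:
    """Return the set of credential kinds whose pattern occurs in `text`."""
    return {kind for kind, rx in CREDENTIAL_PATTERNS.items() if rx.search(text)}


# Synthetic placeholder values built at runtime; these are not real secrets.
aws_line = "aws_access_key_id = AKIA" + "A" * 16
github_line = "token: ghp_" + "a1" * 18
hex_line = "digest " + "0f" * 20

for line in (aws_line, github_line, hex_line):
    print(sorted(credential_kinds(line)))
```

A long hex string carries no identifying prefix, so it can only be flagged heuristically, which is consistent with its lower documented confidence.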
Methods
```python
# Returns list[SensitiveMatch], ordered by position
matches = detector.detect(text)

# Returns text with all matches replaced:
# PHI/PII → "[REDACTED]"; credentials → "[CREDENTIAL-REDACTED]"
safe = detector.mask(text)

# Quick boolean checks
detector.has_credentials(text)  # True if any credential pattern fires
detector.has_phi(text)          # True if any PHI/PII pattern fires

# Compliance-ready summary dict
report = detector.audit_report(text)
# {
#   "total_matches": 2,
#   "has_phi": True, "has_pii": True, "has_credentials": False,
#   "kinds_found": ["EMAIL", "SSN"],
#   "by_category": {"phi": ["SSN"], "pii": ["EMAIL"]},
#   "high_confidence_matches": [...]
# }
```
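The relationship between match offsets, masking, and the summary dict can be sketched without the library. Here `Span`, `mask_spans`, and `summarize` are hypothetical helpers, the spans are assumed not to overlap, and only a subset of the report keys is reproduced.

```python
from collections import defaultdict

# (kind, category, start, end); a hypothetical stand-in for SensitiveMatch.
Span = tuple[str, str, int, int]

PLACEHOLDERS = {"credential": "[CREDENTIAL-REDACTED]"}


def mask_spans(text: str, spans: list[Span]) -> str:
    """Replace each span with its placeholder, assuming spans do not overlap."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for _kind, category, start, end in sorted(spans, key=lambda s: s[2], reverse=True):
        text = text[:start] + PLACEHOLDERS.get(category, "[REDACTED]") + text[end:]
    return text


def summarize(spans: list[Span]) -> dict:
    """Build a small summary dict grouped by category."""
    by_category = defaultdict(set)
    for kind, category, _start, _end in spans:
        by_category[category].add(kind)
    return {
        "total_matches": len(spans),
        "has_phi": "phi" in by_category,
        "has_pii": "pii" in by_category,
        "has_credentials": "credential" in by_category,
        "kinds_found": sorted({s[0] for s in spans}),
        "by_category": {c: sorted(k) for c, k in by_category.items()},
    }


text = "SSN 123-45-6789 sent to user@example.com"
spans = [("SSN", "phi", 4, 15), ("EMAIL", "pii", 24, 40)]
masked = mask_spans(text, spans)
summary = summarize(spans)
```

Replacing from the last span backwards avoids recomputing offsets, since each placeholder may differ in length from the text it replaces.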
Global singleton
```python
from meshflow.security.sensitive_data import get_detector

detector = get_detector()  # cached SensitiveDataDetector()
```
reset_detector() clears the singleton, which is useful for isolating tests from one another.
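One common way to build a cached accessor with a reset hook is `functools.lru_cache`. The sketch below shows that pattern with hypothetical names (`DetectorSketch`, `get_detector_sketch`, `reset_detector_sketch`); it is not necessarily how `get_detector()` and `reset_detector()` are implemented.

```python
from functools import lru_cache


class DetectorSketch:
    """Hypothetical placeholder for SensitiveDataDetector."""


@lru_cache(maxsize=1)
def get_detector_sketch() -> DetectorSketch:
    # The first call constructs the instance; later calls return the cached one.
    return DetectorSketch()


def reset_detector_sketch() -> None:
    # Dropping the cache forces the next call to build a fresh instance.
    get_detector_sketch.cache_clear()


first = get_detector_sketch()
second = get_detector_sketch()
reset_detector_sketch()
third = get_detector_sketch()
print(first is second, third is first)
```

In a test suite, calling the reset function in a fixture teardown keeps state configured by one test from leaking into the next.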
PIIBlockGuardrail
PIIBlockGuardrail wraps SensitiveDataDetector so that it can run as part of the GuardrailStack:
```python
from meshflow.security.guardrails import PIIBlockGuardrail
from meshflow.agents.builder import Agent

agent = Agent(
    name="hipaa-agent",
    role="researcher",
    input_guardrails=[PIIBlockGuardrail(action="block")],
    output_guardrails=[PIIBlockGuardrail(action="modify")],  # mask instead of block
)
```
| Action | Behaviour |
|---|---|
| `"block"` | `GuardrailResult(passed=False)` when PII found |
| `"modify"` | Returns masked text; call succeeds |
| `"warn"` | Passes with match count in metadata |
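The three actions in the table above can be read as a small dispatch over the detection result. The following stand-alone sketch illustrates that reading with a single SSN pattern; `ResultSketch` and `apply_action` are hypothetical, and only the `passed` field is taken from the documented `GuardrailResult`. The `text` and `metadata` fields and the `"match_count"` key are assumptions.

```python
import re
from dataclasses import dataclass, field

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # single illustrative pattern
ACTIONS = {"block", "modify", "warn"}


@dataclass
class ResultSketch:
    """Hypothetical result type; only `passed` comes from the documentation."""

    passed: bool
    text: str
    metadata: dict = field(default_factory=dict)


def apply_action(text: str, action: str) -> ResultSketch:
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    count = len(SSN_RE.findall(text))
    if count == 0:
        return ResultSketch(passed=True, text=text)  # nothing sensitive found
    if action == "block":
        return ResultSketch(passed=False, text=text)  # reject the call
    if action == "modify":
        return ResultSketch(passed=True, text=SSN_RE.sub("[REDACTED]", text))
    # "warn": let the text through and record how many matches were seen
    return ResultSketch(passed=True, text=text, metadata={"match_count": count})


blocked = apply_action("SSN 123-45-6789", "block")
modified = apply_action("SSN 123-45-6789", "modify")
warned = apply_action("SSN 123-45-6789", "warn")
```

Pairing `"block"` on input with `"modify"` on output, as in the agent example, rejects prompts that carry sensitive data while still letting responses through in masked form.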