Metadata-Version: 2.4
Name: agent_sleuth
Version: 0.0.1
Summary: Prevents untrusted data from triggering consequential actions in your agent.
Project-URL: Homepage, https://github.com/Behuve-Labs/agent-sleuth
Project-URL: Source, https://github.com/Behuve-Labs/agent-sleuth
Project-URL: Issues, https://github.com/Behuve-Labs/agent-sleuth/issues
Author: Arnav Tripathy, Noah Wong
License: Copyright 2026 Behuve
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
License-File: LICENSE.md
Keywords: agent,ifc,llm,prompt-injection,security,taint
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Provides-Extra: agentdojo
Requires-Dist: agentdojo; extra == 'agentdojo'
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == 'config'
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1; extra == 'langchain'
Description-Content-Type: text/markdown

# Agent Sleuth

> **Prevents untrusted data from triggering consequential actions in your agent.**

Agent Sleuth is an in-process **information-flow-control (IFC)** library for LLM agents. It
stops untrusted data (web pages, email bodies, tool outputs, retrieved documents) from
driving **consequential actions** (sending email, writing files, posting to external
services).

The mechanism is **value-level provenance lineage tracked at the tool-I/O boundary** — *not*
taint-tracking through the model's forward pass. When an untrusted tool returns data, we
fingerprint the specific values in it. When a later consequential ("sink") call's arguments
carry those fingerprinted values — verbatim or via structured-field tracking — that is a
**deterministic, classifier-free provenance edge**. A small policy fires: untrusted-origin
value reaching a non-allowlisted external sink → **block or confirm**.

- **Deterministic, not a classifier.** The guarantee is a value-lineage match, never an LLM judging intent.
- **Zero extra LLM calls** on the common path.
- **Drop-in.** Three lines, zero changes to your agent.
- **Audit-mode first.** Observe for a week, then switch to enforce.

## Install

```bash
pip install agent_sleuth                 # core, zero agent-framework deps
pip install 'agent_sleuth[langchain]'    # + LangChain callback handler
pip install 'agent_sleuth[config]'       # + YAML config loading
pip install 'agent_sleuth[dev]'          # + pytest/ruff
```

## Three-line integration (raw / custom agent)

```python
from agent_sleuth import Sleuth

sleuth = Sleuth(
    untrusted=["read_email", "fetch_url", "search_web"],
    consequential=["send_email", "write_file", "post_slack"],
    destinations=["me@myco.com"],   # your own channels = trusted egress
    mode="audit",                   # → "enforce" once you trust it
)
sleuth.reset(query="summarize my emails and send a report to my boss")

# wrap your tools (or pass sleuth.handler to a LangChain agent — see below)
fetch_url  = sleuth.track(fetch_url)
send_email = sleuth.track(send_email)

# ... run your agent ...
print(sleuth.report())
```

You can also skip the explicit lists entirely — `Sleuth()` uses **name-based defaults**
(tools containing `read/fetch/search/get/...` are untrusted; `send/write/post/delete/...`
are consequential).

## LangChain (zero changes to your agent)

```python
from agent_sleuth import Sleuth

sleuth = Sleuth(agent=your_langchain_agent, mode="audit")
result = sleuth.run("summarize my emails and send a report to my boss")
print(sleuth.report())
```

`Sleuth.run()` resets taint state, stashes the trusted query, and attaches the
`IFCCallbackHandler` to your agent — no edits to your chain.

## What a caught attack looks like

```
BLOCKED: send_email() called with tainted inputs
  Taint source: fetch_url (step 2, untrusted)
  Injected value detected in argument: to="attacker@evil.com"
  Lineage: fetch_url (step 2) → value "attacker@evil.com" → send_email.to
  Destination: attacker@evil.com (not allowlisted)
  Reason: untrusted-origin value reached a consequential sink
  Action: blocked, call halted
```

## Modes

- `audit` (default): detect + log + render the trace; **never block**.
- `enforce`: raise `TaintViolationError` and halt the offending sink call.
- `confirm`: surface the violation to a callback for an allow/deny decision before dispatch.

## Honest coverage envelope (v0)

> Sound on the verbatim/structured-exfil class. Zero extra LLM calls on the common path.
> Drop-in. **Laundering** (base64/paraphrase of a secret) and **pure control-flow hijack**
> (a sink call whose arguments carry no untrusted bytes) are explicitly **out of scope for
> v0** — documented non-goals, not bugs. Control-flow integrity (the plan-allowlist) and a
> configurable allow/denylist with deny-over-allow precedence land in v1.

| Attack class | v0 |
|---|---|
| Verbatim exfiltration (untrusted value appears literally in sink arg) | ✅ deterministic |
| Structured exfiltration (untrusted field → sink field) | ✅ deterministic |
| Legit egress to your own channel (destination allowlist) | ✅ allowed (no false positive) |
| Control-flow hijack (out-of-plan sink, no untrusted bytes) | ❌ v1 (plan-allowlist) |
| Laundering (base64 / paraphrase / transform) | ❌ v2+ (opt-in quarantine) |

## Benchmark

```bash
PYTHONPATH=. python benchmarks/agentdojo/run.py
```

A self-contained reproduction of AgentDojo-style indirect-injection tasks (real AgentDojo
needs a live LLM + API keys; see the harness docstring for the thin real-AgentDojo wiring).
Reports ASR (attack success rate) and utility per mode.

## Develop

```bash
pip install -e '.[dev,langchain,config]'
pytest
```

See `AGENT_SLEUTH_ARCHITECTURE.MD` for the full design.
