Metadata-Version: 2.4
Name: tessen
Version: 0.3.0
Summary: The harness for your AI agents. One line to install. Captures every model call, maps your codebase, generates frame-aware PR fixes — without your source ever leaving your machine.
Author-email: Tessen <hi@tessen.dev>
License-Expression: MIT
Project-URL: Homepage, https://tessen.dev
Keywords: agent,harness,ai-agent,llm,tracing,anthropic,openai,gemini,cohere,mistral,debugging
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Debuggers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Logging
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40.0; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=2.36.0; extra == "openai"
Provides-Extra: google
Requires-Dist: google-genai>=1.2.0; extra == "google"
Requires-Dist: google-generativeai>=0.8.6; extra == "google"
Provides-Extra: cohere
Requires-Dist: cohere>=6.1.0; extra == "cohere"
Provides-Extra: mistral
Requires-Dist: mistralai>=1.0.0; extra == "mistral"
Provides-Extra: viewer
Requires-Dist: rich>=13.0; extra == "viewer"
Provides-Extra: all
Requires-Dist: anthropic>=0.40.0; extra == "all"
Requires-Dist: openai>=2.36.0; extra == "all"
Requires-Dist: google-genai>=1.2.0; extra == "all"
Requires-Dist: google-generativeai>=0.8.6; extra == "all"
Requires-Dist: cohere>=6.1.0; extra == "all"
Requires-Dist: mistralai>=1.0.0; extra == "all"
Requires-Dist: rich>=13.0; extra == "all"
Dynamic: license-file

# tessen

**The harness for your AI agents.** One line to install. Captures every model call, every tool use, every retry. Maps your codebase. Learns your team's commit style. Generates frame-aware PR fixes — without your source ever leaving your machine.

> **Tame the beast in your Agentic Workflows. Use Tessen.**

```python
import tessen
tessen.init()
```

That's the entire integration. Drop it before you construct your LLM client.

## Why

It's 3am. You're paged. Your agent burned $2,000 in API spend in two hours, looping on the same broken tool 47 times because the SDK swallowed a 502 and the agent silently retried. Your dashboard says the request succeeded. Your traces show one span. Your logs are noise.

You cannot defend what you cannot see — and the tools that watch your agent treat each model call like a web request instead of like a program that *thinks*. Tessen treats it like a program. Every thinking block, every tool call, every cache decision, every retry — recorded structurally, in your process, on your disk. When the page comes in, you have the receipts.

## Install

```bash
pip install tessen
# or, with the optional log viewer
pip install "tessen[viewer]"
```

Core install has **zero hard dependencies**. Tessen patches the vendor SDKs that are already importable in your process — Anthropic, OpenAI, Google Gemini, Cohere, Mistral — without forcing any of them on you.

## Overhead

**Well under 1 ms per captured call** — measured. Tessen sits between your code and the vendor's network round-trip (which is hundreds of milliseconds to several seconds), so the overhead is invisible. Run `tessen bench` to verify on your machine. The regression test fails the build if mean per-event overhead crosses 1 ms.

Typical numbers (Apple Silicon, APFS): ~75 µs mean, ~105 µs p99. The dominant cost is the `fsync()` after each disk write; set `TESSEN_DISABLE_FSYNC=1` to skip it if you want lower latency at the cost of crash durability.
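
To get a feel for why `fsync()` dominates, here is a minimal stdlib-only sketch that times JSONL appends with and without it. It is not Tessen's writer and not a substitute for `tessen bench`; the helper names `append_event` and `mean_write_us` are hypothetical, and the numbers depend entirely on your disk and filesystem.

```python
import json
import os
import tempfile
import time


def append_event(fh, event: dict, use_fsync: bool = True) -> None:
    """Append one event as a JSON line; optionally force it to disk."""
    fh.write(json.dumps(event) + "\n")
    fh.flush()                     # push Python's buffer to the OS
    if use_fsync:
        os.fsync(fh.fileno())      # ask the OS to persist to the device


def mean_write_us(use_fsync: bool, n: int = 200) -> float:
    """Mean microseconds per appended event over n writes to a temp file."""
    event = {"event_id": "demo", "status": "ok", "duration_ms": 842.1}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.jsonl")
        with open(path, "a", encoding="utf-8") as fh:
            start = time.perf_counter()
            for _ in range(n):
                append_event(fh, event, use_fsync)
            elapsed = time.perf_counter() - start
    return elapsed / n * 1e6


if __name__ == "__main__":
    print(f"with fsync:    {mean_write_us(True):8.1f} µs/event")
    print(f"without fsync: {mean_write_us(False):8.1f} µs/event")
```

The gap between the two lines is roughly what `TESSEN_DISABLE_FSYNC=1` trades away in durability.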

## What runs locally, free, today

### Capture (always on)
```python
import tessen
tessen.init()
```
Every model call from your agent — full request, full response with thinking blocks decoded, cache-token usage, call-site, timing — written as JSONL to `~/.tessen/logs/<agent>/<date>.jsonl`. Trunk-patched across all 5 vendors: structurally vendor-rename-resistant.
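
Because the log is plain JSONL, other tools can read it without importing tessen. The sketch below assumes only the `<log_dir>/<agent>/<date>.jsonl` layout described above; `iter_events` is a hypothetical helper, and it further assumes that date-named files sort chronologically by filename. Inside Python, `tessen.events.read` is the supported route.

```python
import json
from pathlib import Path
from typing import Iterator

DEFAULT_LOG_DIR = Path.home() / ".tessen" / "logs"


def iter_events(agent: str, log_dir: Path = DEFAULT_LOG_DIR) -> Iterator[dict]:
    """Yield every event from an agent's JSONL files, oldest file first."""
    agent_dir = Path(log_dir) / agent
    if not agent_dir.is_dir():
        return  # no captures for this agent yet
    for path in sorted(agent_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip a torn line left by an interrupted write


if __name__ == "__main__":
    for event in iter_events("my_agent"):
        print(event.get("event_id"), event.get("status"))
```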

### See what just happened
```bash
tessen status            # which agents are active, errors, anomalies — at a glance
tessen tail my_agent     # `tail -f` for live events
tessen viewer my_agent   # per-event forensic detail
```
Per-event inspector. 🚨 markers on anomalies (silent retries, truncated responses, refused requests, cache misses, malformed tool calls). HTTP status codes, vendor error types, request/response content blocks. The forensic record.

### Aggregate findings
```bash
tessen analyze my_agent
```
Rolls up your captured events into a leadership-readable report:
```
## findings
  🚨 12 ERROR event(s)
     • 8× RateLimitError (429)
     • 4× NotFoundError (404)
  ⚠ 47 event(s) flagged with anomalies:
     • 34× cache_control_set_but_no_cache_activity
     • 12× truncated_response
     • 1× thinking_only_no_output

## cost
  total: $312.40 last 7 days
  claude-sonnet-4-6:  $280.10
  claude-haiku-4-5:   $32.30

## latency (ms): p50=842  p95=4012  p99=14290
```
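
If you want the latency line or the anomaly counts in your own tooling, they can be approximated from the `duration_ms` and `quality_signals` fields of each event. This is a sketch, not the analyzer's code: the README does not say which percentile method `tessen analyze` uses, so the inclusive interpolation below is an assumption, and both function names are hypothetical.

```python
import statistics
from collections import Counter


def latency_percentiles(events: list[dict]) -> dict[str, float]:
    """p50/p95/p99 of duration_ms, using inclusive linear interpolation."""
    durations = [e["duration_ms"] for e in events if "duration_ms" in e]
    if len(durations) < 2:
        raise ValueError("need at least two timed events")
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def anomaly_counts(events: list[dict]) -> Counter:
    """Count how often each quality_signals tag appears across events."""
    return Counter(tag for e in events for tag in e.get("quality_signals", []))
```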

### Map your codebase
```bash
tessen crawl /path/to/your/repo
```
Deterministic AST scan. Finds every call site to every supported vendor SDK, classifies the surrounding function via named structural rules (`tool_use_loop_anthropic`, `framework_wrapped:langgraph`, etc.), extracts per-function docstrings, comments, body hashes, and in-file callers. Stdlib-only. Sub-second on most repos.

## What activates with `TESSEN_API_KEY`

When you set `TESSEN_API_KEY` in your environment, `tessen.init()` automatically activates hosted streaming:

```bash
export TESSEN_API_KEY=tk_live_...
```

```python
tessen.init()   # same one line — but now ships to the hosted backend
```

What ships:
- **Events** — your captured runtime stream
- **Manifest** — your codebase map (the output of `tessen.crawl`)
- **Git style fingerprint** — your team's commit/test/style conventions, **author emails SHA-256 hashed**, **no commit messages or diffs leave your machine**

What never ships:
- Your source code
- Your commit messages
- Your diffs
- Your author names or plaintext emails (only the SHA-256 hashes noted above)
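
As a rough picture of what "author emails SHA-256 hashed" means, the sketch below keys per-author commit counts by digest instead of address. The normalization (trim and lowercase) and the absence of a salt are assumptions; the README does not document the exact scheme, and both helper names are hypothetical.

```python
import hashlib
from collections import Counter


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    normalized = email.strip().lower()  # assumed normalization
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def commits_per_author(author_emails: list[str]) -> Counter:
    """Count commits per author, keyed by hashed email only."""
    return Counter(hash_email(e) for e in author_emails)
```

The resulting counter still says how many distinct authors commit and how often, which is the kind of signal a style fingerprint needs, without carrying a readable address.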

The hosted backend uses these signals to generate **frame-aware PRs** that fix detected agent fragility — matching your team's commit style, test discipline, library preferences, and file co-change patterns. PR specs come back via `tessen.apply`, which reads your local source and produces the actual diff:

```bash
tessen apply spec.json --open-pr
```

**Your source never leaves your machine.** The hosted backend works from the manifest + events + style fingerprint; the local applier produces the diff. That's the unfakeable privacy claim.

## What gets captured per event

Every event carries this base shape — verified by regression test against the actual captured JSONL:

```jsonc
{
  "event_id": "...",
  "session_id": "...",
  "agent_name": "...",
  "ts": 1715688000.594,                   // unix timestamp (seconds, float)
  "_tessen_tier": 1,                      // 1=trunk, 2=discovery, 3=http-floor
  "provider": "anthropic",
  "surface": "messages.create",
  "call_type": "request",
  "http_method": "POST",
  "http_url": "https://api.anthropic.com/v1/messages",
  "call_site": {"file": "agent.py", "line": 101, "func": "run"},
  "duration_ms": 842.1,
  "request_body": { /* full SDK-serialized JSON */ },
  "streamed": false,
  "status": "ok",                         // "ok" | "error"
  "quality_signals": ["rate_limited"]     // anomaly tags surfaced at write time
}
```

The response shape depends on **what happened**:

| event variant | response fields present |
|---|---|
| **Successful non-streaming call** | `casted_response` (pydantic dump with parsed content blocks), `response_body` (raw JSON), `response_headers` (request-id, rate-limit, x-should-retry) |
| **Successful streaming call** | `response_body` (reassembled from chunks), `chunks_captured` (count) |
| **Error (HTTP or client-side)** | `error` (object with `type`, `msg`, `tb`, `status_code`, `body`, `response_headers`) |

Code that always wants a structured response should therefore fall back in order: `event.get("casted_response") or event.get("response_body") or event.get("error", {}).get("body")`. The `tessen viewer` and `tessen analyze` CLIs do this for you.
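
Wrapped as a small helper, the fallback chain might look like the sketch below. `structured_response` is a hypothetical name, not part of the tessen API, and it additionally tolerates an `error` key that is present but null.

```python
from typing import Any, Optional


def structured_response(event: dict) -> Optional[Any]:
    """Return the most structured response payload an event carries.

    Order: parsed pydantic dump, then raw body, then the error body.
    """
    error = event.get("error") or {}  # tolerate a missing or null error field
    return (
        event.get("casted_response")
        or event.get("response_body")
        or error.get("body")
    )
```

A successful non-streaming event yields `casted_response`, a streaming event yields the reassembled `response_body`, and an error event yields the vendor's error body if one was received.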

The trunk-patching architecture means:
- **Anthropic SDK rename?** Captured anyway.
- **OpenAI changes their resource paths?** Captured anyway.
- **Vendor adds a new method we haven't patched?** Captured anyway.

Only a base-client rename breaks Tier 1 — at which point Tier 3 (HTTP-layer floor) keeps capturing every model call regardless. Three-layer architecture, structurally hardened.
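
The idea behind trunk-patching can be shown with toy classes: wrap the one method every resource funnels through, and new or renamed resource methods are recorded without being patched individually. Everything below is a stand-in. `BaseClient`, `Messages`, and `patch_trunk` are hypothetical and do not mirror any vendor SDK's internals or Tessen's actual patch points.

```python
import functools
import time


class BaseClient:
    """Toy stand-in for a vendor SDK's shared HTTP client (the trunk)."""

    def request(self, method: str, url: str, body: dict) -> dict:
        return {"ok": True, "url": url}


class Messages:
    """Toy resource (a leaf); every method funnels into BaseClient.request."""

    def __init__(self, client: BaseClient) -> None:
        self._client = client

    def create(self, **body) -> dict:
        return self._client.request("POST", "/v1/messages", body)

    def count_tokens(self, **body) -> dict:  # never patched explicitly
        return self._client.request("POST", "/v1/messages/count_tokens", body)


captured: list[dict] = []


def patch_trunk(base_cls: type) -> None:
    """Wrap base_cls.request once so every leaf call is recorded."""
    original = base_cls.request
    if getattr(original, "_is_patched", False):
        return  # already wrapped; patching twice would double-record

    @functools.wraps(original)
    def wrapper(self, method, url, body):
        start = time.perf_counter()
        status = "ok"
        try:
            return original(self, method, url, body)
        except Exception:
            status = "error"
            raise
        finally:
            captured.append({
                "http_method": method,
                "http_url": url,
                "status": status,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

    wrapper._is_patched = True
    base_cls.request = wrapper


patch_trunk(BaseClient)
messages = Messages(BaseClient())
messages.create(model="m")
messages.count_tokens(model="m")
print([e["http_url"] for e in captured])
```

Renaming `Messages.create` or adding another resource class changes nothing for the wrapper; only renaming `BaseClient.request` itself would, which is the failure mode the lower tier exists to cover.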

## Anomaly tags

Tessen tags events at write-time with `quality_signals` so the analyzer can lead with what's actionable. Filter on any tag via `tessen.events.filter_events(events, quality_signal="...")` or `tessen analyze`'s findings section.

**Content-level (Anthropic / OpenAI / Gemini)**
- `empty_content_on_end_turn` — model finished with `stop_reason: end_turn` but returned no content. Common silent-failure mode.
- `empty_content_on_stop` — OpenAI / Mistral equivalent (`finish_reason: stop` with empty `message.content`).
- `truncated_response` — `stop_reason: max_tokens` / `length`. The customer's `max_tokens` was hit; agent saw a cut-off response.
- `thinking_only_no_output` — Anthropic extended-thinking event with thinking blocks but zero text/tool_use output.
- `refusal` — model returned a `refusal` block (Anthropic / OpenAI policy decline).
- `invalid_tool_use_json` — Anthropic `tool_use.input` was malformed JSON in the streaming reassembly.
- `invalid_tool_call_json` — OpenAI `tool_calls[].function.arguments` was malformed JSON.

**Cache-control (Anthropic prompt caching)**
- `cache_control_set_but_no_cache_activity` — request set `cache_control: ephemeral` but response showed zero `cache_read_input_tokens` / `cache_creation_input_tokens`. Prompt likely below the 1024-token minimum, OR cache TTL expired.

**HTTP-error categorization (every vendor)**
- `rate_limited` (429), `overloaded` (529), `auth_error` (401), `permission_denied` (403), `not_found` (404), `conflict` (409), `unprocessable` (422), `invalid_request` (400), `connection_error` (network failure pre-status), `server_error` (5xx other), `client_error` (4xx other).

19 tags total, evaluated in order: HTTP errors first (since they're most product-relevant), then content-level anomalies, then cache. Empty `quality_signals` array means: clean event.
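
The HTTP-error categorization above reduces to a lookup with two range fallbacks. The sketch below follows the status codes listed in this section; the function name `http_error_tag` and the convention that `None` means "no status was ever received" are assumptions made for illustration.

```python
from typing import Optional

# Status codes with a dedicated tag, as listed in the section above.
EXACT_TAGS = {
    400: "invalid_request",
    401: "auth_error",
    403: "permission_denied",
    404: "not_found",
    409: "conflict",
    422: "unprocessable",
    429: "rate_limited",
    529: "overloaded",
}


def http_error_tag(status_code: Optional[int]) -> Optional[str]:
    """Map an error event's HTTP status to a tag.

    None is treated as a network failure before any status arrived.
    """
    if status_code is None:
        return "connection_error"
    if status_code in EXACT_TAGS:
        return EXACT_TAGS[status_code]
    if 500 <= status_code <= 599:
        return "server_error"   # 5xx other
    if 400 <= status_code <= 499:
        return "client_error"   # 4xx other
    return None                 # not an error status
```

Checking the exact codes first matters: 529 would otherwise fall into the generic `server_error` bucket.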

## Configuration

Tessen reads everything from env vars. The Python API stays one line.

| variable | default | purpose |
|---|---|---|
| `TESSEN_API_KEY` | unset → local-only | Activate hosted streaming when set |
| `TESSEN_AGENT_NAME` | inferred from module | Override agent identity |
| `TESSEN_LOG_DIR` | `~/.tessen/logs` | Where local JSONL lands |
| `TESSEN_INGEST_URL` | `https://api.tessen.dev/v1/ingest` | Override for self-hosted |
| `TESSEN_DISABLE_CRAWL` | unset | `=1` opts out of auto-crawl |
| `TESSEN_DISABLE_GIT_LEARN` | unset | `=1` opts out of auto git fingerprint |
| `TESSEN_LEAF_FALLBACK` | unset | `=1` reverts to legacy leaf-patching (compat escape hatch) |
| `TESSEN_DISABLE_FSYNC` | unset → fsync after every write | `=1` skips `fsync()` per event — lower latency, but loses up to one buffered batch on power loss |

**Shipper tuning** (only matters when `TESSEN_API_KEY` is set):

| variable | default | purpose |
|---|---|---|
| `TESSEN_FLUSH_INTERVAL_SEC` | `5.0` | Max seconds to wait before flushing a partial batch |
| `TESSEN_BATCH_MAX_EVENTS` | `64` | Max events per POST |
| `TESSEN_QUEUE_MAX` | `1024` | Bounded queue size; drop-oldest on overflow |
| `TESSEN_RETRY_MAX` | `3` | Retries per batch with exponential backoff |
| `TESSEN_HTTP_TIMEOUT_SEC` | `10.0` | Per-request HTTP timeout |
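
To make the queue and retry knobs concrete, here is a minimal single-threaded sketch of a bounded drop-oldest queue and an exponential backoff schedule. It is not the shipper's implementation: `DropOldestQueue` and `backoff_delays` are hypothetical, the base delay and cap are invented values, and a real shipper would also need locking around the queue.

```python
from collections import deque


class DropOldestQueue:
    """Bounded FIFO that evicts the oldest event when full."""

    def __init__(self, maxlen: int = 1024) -> None:
        self._items: deque = deque(maxlen=maxlen)
        self.dropped = 0

    def put(self, event: dict) -> None:
        if len(self._items) == self._items.maxlen:
            self.dropped += 1          # deque evicts the oldest on append
        self._items.append(event)

    def take_batch(self, max_events: int = 64) -> list:
        """Remove and return up to max_events, oldest first."""
        batch = []
        while self._items and len(batch) < max_events:
            batch.append(self._items.popleft())
        return batch


def backoff_delays(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

Dropping the oldest events on overflow keeps memory bounded when the backend is unreachable, at the cost of losing the earliest part of a burst from the hosted stream.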

**Robustness:** every `TESSEN_*` env var is whitespace-tolerant. Boolean flags accept `1` / `true` / `yes` / `on` case-insensitively; numeric tuning vars fall back to defaults for unparseable input. A typo or accidental `$UNSET_VAR` expansion never crashes `import tessen`.
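
The parsing rules described above can be sketched in a few lines. These helpers are illustrative and not tessen's internals; `env_flag` and `env_float` are hypothetical names, and the `environ` parameter exists only so the functions can be exercised with a plain dict.

```python
import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """True if the variable is set to 1/true/yes/on, ignoring case and whitespace."""
    return environ.get(name, "").strip().lower() in TRUTHY


def env_float(name: str, default: float, environ: Mapping[str, str] = os.environ) -> float:
    """Parse a numeric variable, falling back to default on missing or bad input."""
    raw = environ.get(name, "").strip()
    try:
        return float(raw)
    except ValueError:
        return default  # covers "", typos, and empty shell expansions
```

For example, `env_float("TESSEN_FLUSH_INTERVAL_SEC", 5.0)` returns `5.0` when the variable is unset, empty, or set to something like `abc`.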

Power-user overrides exist as kwargs on `tessen.init()`. The 99% case is one function call, no args.

## Vendor coverage

| vendor | sync | async | streaming (ctx-mgr) | streaming (iterator) | tool-use |
|---|---|---|---|---|---|
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenAI (chat.completions) | ✅ | ✅ | ✅ | ✅ | ✅ |
| OpenAI (responses API) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google Gemini (`google.genai`) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mistral | ✅ | ✅ | ✅ | ✅ | ✅ |

Framework wrappers (LangChain `ChatAnthropic`, LangGraph nodes, deepagents, OpenAI Agents SDK) all flow through trunk-patching automatically — no per-framework code in tessen.

## CLIs at a glance

After `pip install tessen`, the `tessen` command is on your PATH. Every subcommand also works as `python -m tessen.<name>` if you prefer.

```bash
tessen status            # glanceable overview of every captured agent
tessen tail <agent>      # `tail -f` for live events (`-n 5` for backfill)
tessen viewer <agent>    # per-event forensic detail with anomaly markers
tessen analyze <agent>   # aggregate findings — errors, anomalies, cost, latency
tessen compare a b       # side-by-side diff of two events or sessions
tessen upload <agent>    # manually ship local JSONL to the hosted backend
tessen crawl <repo>      # deterministic AST codebase map
tessen repo-learn <repo> # privacy-preserving git style fingerprint
tessen apply <spec.json> # apply a PR spec from the hosted backend
tessen doctor            # diagnose your integration (env vars, vendor SDKs)
tessen doctor --ping     # also probe TESSEN_INGEST_URL reachability
tessen bench             # measure per-event capture overhead
tessen version           # print version
```

`tessen status`, `tessen tail`, `tessen viewer`, and `tessen analyze` all accept a bare agent name and resolve it under `$TESSEN_LOG_DIR` (default `~/.tessen/logs`).

## Programmatic Python API

For when the CLI isn't the right surface — building your own dashboard, integrating into a notebook, or running tessen's analysis as a step in your own pipeline:

```python
import tessen.events as ev

# Iterate every captured event for an agent
for event in ev.read("my_agent"):
    print(event["event_id"], event["status"], event.get("quality_signals"))

# Get the last N events (handy in notebooks)
last_50 = ev.recent("my_agent", n=50)

# Filter — composable, all kwargs optional
errors_only = ev.filter_events(last_50, status="error")
rate_limited = ev.filter_events(last_50, quality_signal="rate_limited")
in_session = ev.filter_events(last_50, session_id="sess_abc")

# Aggregate — same rollup shape as `tessen analyze --json`. Carries the
# `schema_version` / `tessen_version` / `generated_at` envelope so
# scripts can branch on version changes and cache by timestamp.
report = ev.aggregate(last_50)
print(report["total_cost_usd"], report["error_events"], report["anomaly_counts"])
print(report["schema_version"])  # 1 for tessen 0.3.x

# Time-bounded reads — accepts unix float, tz-aware datetime, timedelta,
# or a duration string like "24h"/"7d"/"2w" (same shorthand as `tessen
# tail --since`).
from datetime import timedelta
recent_24h = list(ev.read("my_agent", since=timedelta(hours=24)))
recent_24h = list(ev.read("my_agent", since="24h"))  # equivalent

# Diagnostic: opt-in stats dict captures corrupt-line counts so you know
# if a writer crash left torn JSONL.
stats: dict = {}
events = list(ev.read("my_agent", stats=stats))
if stats.get("corrupt_lines"):
    print(f"⚠ {stats['corrupt_lines']} torn lines skipped")

# Discover agents (defaults to only ones with captures; include_empty=True
# for raw filesystem truth)
agents = ev.list_agents()
```

All these functions also accept `log_dir=` to override the default location.

## License

MIT. The SDK is open-source. The hosted backend (analyzer + PR generation) is a separate commercial service — see `tessen.dev`.

## Contact

`hi@tessen.dev` · [tessen.dev](https://tessen.dev)
