Metadata-Version: 2.4
Name: looptrip
Version: 0.1.1
Summary: Deterministic, framework-agnostic detector of multi-agent coordination pathologies that trips at iteration 2
Author-email: Edward Kubiak <edward.kubiak.dev@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/ek33450505/looptrip
Project-URL: Repository, https://github.com/ek33450505/looptrip
Project-URL: Issues, https://github.com/ek33450505/looptrip/issues
Project-URL: Changelog, https://github.com/ek33450505/looptrip/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/ek33450505/looptrip/tree/main/docs
Keywords: multi-agent,observability,opentelemetry,agents,llm,deadlock,livelock,coordination,detection
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: otel
Requires-Dist: opentelemetry-api; extra == "otel"
Requires-Dist: opentelemetry-sdk; extra == "otel"
Requires-Dist: opentelemetry-semantic-conventions; extra == "otel"
Dynamic: license-file

# looptrip

**Deterministic, framework-agnostic detection of multi-agent coordination pathologies — caught at iteration 2, not on the invoice.**

looptrip watches a multi-agent run as a stream of normalized events and flags the coordination pathologies that make agent systems burn money and spin: duplicate-work loops, ping-pong / livelock, deadlock, and non-termination. It is **detection-first** — it works over data you already have (OpenTelemetry GenAI spans, or a CAST `cast.db`) — and **deterministic / zero-LLM**: the same event stream always yields the same verdict. looptrip is an **observer, never a gate**; it reports, it never blocks.

> **This release (0.1.1)** ships full pathology coverage (duplicate-work, ping-pong / livelock, deadlock, non-termination), configurable sensitivity controls, counterfactual-replay attribution (via the `attribute` subcommand), and the `cast.db` adapter with reproducible proof on real data. It also adds **OpenTelemetry support** (Phase 4) in the `looptrip[otel]` extra: an offline adapter (`OTelSpanAdapter`, which ingests flat span dicts and OTLP/JSON exports) and a live `LooptripSpanProcessor` for in-flight detection. (0.1.0 was published before the Phase 4 code merged and shipped without the OTel modules; 0.1.1 is the first artifact to actually include them.)

## The headline

On two **real** recorded multi-agent runaway sessions, a single `workflow-subagent` dispatch recurred 54 and 49 times with no progress between repeats. Tripping at the *second* dispatch — the first repeat — instead of letting the loop run to exhaustion would have saved:

| session | runaway loop | dispatches | trip point | saved |
|---|---|---|---|---|
| `2e6c0288` | `workflow-subagent` | 54 | dispatch #2 | **$320.16** |
| `da27b414` | `workflow-subagent` | 49 | dispatch #2 | **$472.80** |
| | | | **total** | **$792.96** |

Reproduce it yourself from a clone of the repository. No database is required; the data is a committed fixture:

```sh
pip install -e .
looptrip proof
```

## Why "iteration 2"

Native runaway guards are blunt total-step counters that trip at N=10–25, *after* the waste has compounded. looptrip's trip is a **safety predicate keyed on the pathology signature**: *no signature `(agent, tool, args_hash)` may recur without an intervening progress delta.* The instant a signature is seen a second time (within a configurable input-token tolerance, with no progress marker between), it fires, before the third wasted turn and the O(N²) context-cost compounding. "2" is the default threshold, not a magic number; what matters is the approach, signature-keyed detection with configurable thresholds. The detector itself is not the moat. The durable asset is standards *engagement*: adopting the upstream OpenTelemetry GenAI `gen_ai.agent.handoff.*` convention and contributing the agent-loop pathology layer (pending-wait and loop-termination semantics) that looptrip uniquely detects.
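
The predicate can be sketched in a few lines of stdlib Python. This is a minimal illustration, not looptrip's actual implementation: the `Event` class, its `progress` flag, and the `first_trip` function are hypothetical names, and the input-token tolerance is left out for brevity.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; field names are illustrative only."""
    agent: str
    tool: str
    args_hash: Optional[str]
    progress: bool = False  # True when this event marks a progress delta


def first_trip(events: Iterable[Event], threshold: int = 2) -> Optional[int]:
    """Return the index of the event that trips the predicate, or None."""
    counts: dict[Hashable, int] = {}
    for index, event in enumerate(events):
        if event.progress:
            # Assumption: a progress delta resets every signature's repeat count.
            counts.clear()
            continue
        signature = (event.agent, event.tool, event.args_hash)
        counts[signature] = counts.get(signature, 0) + 1
        if counts[signature] >= threshold:
            return index
    return None


loop = [Event("workflow-subagent", "dispatch", "abc")] * 5
print(first_trip(loop))  # 1, i.e. the second occurrence
```

Because the check is a pure function of the ordered event stream, the same input always produces the same trip index, which is the property the "deterministic" claim rests on.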

The worst real runaways are the hardest to catch: a `workflow-subagent` loop emits no structured handoff contract at all. So looptrip detects from the `(agent, ts)` repeat signal plus input-token variance alone; any handoff metadata only *enriches* the signal — it is never required.
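
To illustrate detection without any args or handoff metadata, the sketch below flags an agent's repeat when its input-token count stays within a relative tolerance of that agent's previous run. The tuple layout, the `trip_without_args` name, the 5% default, and the relative-difference formula are all assumptions made for this example; looptrip's real tolerance semantics may differ.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical run record: (agent, ts, input_tokens). ts is used for ordering only.
Run = Tuple[str, float, int]


def trip_without_args(runs: Sequence[Run], tolerance: float = 0.05) -> Optional[Run]:
    """Return the first run that repeats an agent's previous run with a
    near-identical input-token count, or None if there is no such repeat."""
    last_tokens: dict[str, int] = {}
    for run in sorted(runs, key=lambda r: r[1]):
        agent, _ts, tokens = run
        previous = last_tokens.get(agent)
        if previous is not None:
            # Relative drift against the larger count; two zero counts are
            # treated as identical to avoid dividing by zero.
            scale = max(previous, tokens)
            drift = abs(tokens - previous) / scale if scale else 0.0
            if drift <= tolerance:
                return run
        last_tokens[agent] = tokens
    return None


runs = [
    ("workflow-subagent", 1.0, 10_000),
    ("workflow-subagent", 2.0, 10_100),  # about 1% drift: treated as a repeat
    ("workflow-subagent", 3.0, 10_150),
]
print(trip_without_args(runs))
```

A large jump in input tokens between two runs of the same agent is read here as a sign that the work changed, so it does not trip.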

## Usage

```sh
looptrip proof                                  # reproduce the $792.96 headline on the committed fixture
looptrip scan fixture:<session_id>              # scan a session from the packaged fixture
looptrip scan cast-db:<session_id>              # scan a live cast.db session (CAST hosts only)
looptrip scan --all fixture:<session_id>        # run all four detectors (adds a 'kind' column)
looptrip attribute fixture:<session_id>         # attribute pathologies to decisive handoffs (overdetermined = no single one)
looptrip --version
```

## How it works

1. **One normalized event** — `(agent, tool, args_hash, ts, handoff_state)` plus optional cost/token metadata. An **adapter** maps each source's fields onto this schema, so detection logic never touches source-specific span-attribute renames.
2. **Detection-first** — Phase 1 ships a `cast.db` adapter. Because `agent_runs` carries no per-dispatch args, that adapter sets `args_hash=None` and detection leans on the token-variance signal. Phase 4 (now shipped) adds an offline OTel adapter (`OTelSpanAdapter` ingesting flat span dicts, OTLP/JSON, and JSONL exports) and a live `LooptripSpanProcessor` for in-flight pathology detection in the `looptrip[otel]` extra.
3. **Stdlib state machine** — the detector groups events by signature and trips on the 2nd same-signature occurrence with no progress delta. The core is **stdlib-only**; OpenTelemetry is an optional `[otel]` extra, never imported by the detector.
4. **False-positive control is first-class** — a configurable input-token tolerance, a progress-delta marker, and an `idempotent_agents` allowlist keep legitimately-repeatable work (commits, reviews) from tripping. looptrip is meant to be run detect-then-print and dogfooded before any signal is trusted.
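
The normalization and allowlist steps above might look roughly like the following. This is a hedged sketch, not looptrip's API: `NormalizedEvent`, `hash_args`, `from_row`, `is_candidate`, the source column names (`started_at`, `args`), the `"dispatch"` default, and the allowlist entries are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class NormalizedEvent:
    """Illustrative version of the (agent, tool, args_hash, ts, handoff_state) schema."""
    agent: str
    tool: str
    args_hash: Optional[str]
    ts: float
    handoff_state: Optional[str] = None
    input_tokens: Optional[int] = None


def hash_args(args: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Hash args in a key-order-independent way; None when the source has no args."""
    if args is None:
        return None
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def from_row(row: Mapping[str, Any]) -> NormalizedEvent:
    """Map one source row (column names assumed) onto the normalized schema."""
    return NormalizedEvent(
        agent=row["agent"],
        tool=row.get("tool", "dispatch"),
        args_hash=hash_args(row.get("args")),
        ts=float(row["started_at"]),
        handoff_state=row.get("handoff_state"),
        input_tokens=row.get("input_tokens"),
    )


# Assumed allowlist of agents whose repeats are legitimate.
IDEMPOTENT_AGENTS = frozenset({"commit", "code-reviewer"})


def is_candidate(event: NormalizedEvent) -> bool:
    """Only events from agents outside the allowlist are eligible to trip."""
    return event.agent not in IDEMPOTENT_AGENTS


event = from_row({"agent": "workflow-subagent", "started_at": 1.0, "input_tokens": 10_000})
print(event.args_hash, is_candidate(event))  # None True
```

Keeping source-specific field names inside `from_row` is the point of the adapter layer: a detector written against `NormalizedEvent` would not need to change when a source renames its attributes.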

## Honest framing

This project tries hard not to oversell:

- **Attribution numbers.** Published LLM-prompting baselines for "which handoff broke the run" sit around 14%, but that is the *prompting* baseline; structured / deterministic methods reach 29–52%. Adding structure is the lever, and looptrip's deterministic replay (Phase 3) is the limit case of that frontier, not a fix for a permanent ceiling. We don't anchor to "14%."
- **Cost numbers.** The $792.96 here is verifiable from the committed fixture. Larger figures circulate — e.g. a widely-reported "$47K" agent-loop bill — but those are **unverified**, and we label them as such.
- **Prior art.** The market gap is real, but the durable asset is the *standard*, not the ~200-line detector. A direct competitor exists — **Watchtower** (MIT, LangGraph-only, trips at 3+ repeats, no handoff contract, no attribution). looptrip differentiates on framework-agnosticism, speed, and standards engagement with the OpenTelemetry GenAI agent-observability conventions — adopting the upstream `gen_ai.agent.handoff.*` handoff identity and contributing the agent-loop pathology layer it uniquely detects.

## Roadmap

- **Phase 1** — `cast.db` adapter + duplicate-work / iteration-2 detector + reproducible proof.
- **Phase 2** — full pathology coverage (ping-pong / livelock, deadlock, non-termination) + sensitivity controls.
- **Phase 3** — counterfactual replay attribution ("which handoff was decisive").
- **Phase 4** — **(SHIPPED)** OpenTelemetry support: offline adapter (`OTelSpanAdapter` ingesting flat span dicts, OTLP/JSON, JSONL exports) and live `LooptripSpanProcessor` (in-flight pathology detection via `on_start` hooks, thread-safe, deduped) in the `looptrip[otel]` extra. Unit and synthetic testing complete; production multi-agent validation pending.
- **Phase 5** — packaging (Claude Code plugin, Homebrew).
- **Phase 6** — documentation (reference deep-dives, examples, architecture notes).
- **Phase 7** — OpenTelemetry GenAI agent-observability semantic-convention engagement: adopt the upstream `gen_ai.agent.handoff.*` handoff identity (`semantic-conventions-genai`) and contribute the pathology layer — a pending/blocking wait-for state and loop-termination (`gen_ai.agent.finish_reason`) semantics — with looptrip as the deterministic reference implementation.
- **Phase 8** — launch.

## Documentation

- **[Proof](docs/proof.md)** — Reproduce the $792.96 headline. Evidence that the fixture is real and reproducible.
- **[Usage](docs/usage.md)** — CLI and library API reference, adapters, and configuration.
- **[Architecture](docs/architecture.md)** — Detector design, event normalization, signature matching, and phase-by-phase roadmap.
- **[Adapters](docs/adapters.md)** — Implementing a custom adapter for your event source (OTel spans, custom JSON, etc.).
- **[Testing](docs/testing.md)** — Test structure, mutation sanity, fixture integrity, and independent re-derivation.
- **[Framing](docs/framing.md)** — Attribution, cost baselines, related work (Watchtower), and the role of standards.
- **[Case Studies](docs/cases/)** — Real runaways: `workflow-subagent` loops, deadlock scenarios, and non-termination traces.
- **[Contributing](CONTRIBUTING.md)** — How to contribute, issue triage, and development setup.

## License

Apache-2.0. See [LICENSE](LICENSE).
