Metadata-Version: 2.4
Name: reasonblocks
Version: 0.2.4
Summary: Drop-in agent observability + steering for LangChain, LangGraph, OpenAI Agents SDK, Claude Agent SDK, and the Anthropic Messages API.
Project-URL: Homepage, https://reasonblocks.com
Project-URL: Documentation, https://reasonblocks.mintlify.app/introduction
Project-URL: Repository, https://github.com/ReasonBlocks/sdk
Project-URL: Issues, https://github.com/ReasonBlocks/sdk/issues
Author-email: ReasonBlocks <rohan@reasonblocks.com>
License: MIT License
        
        Copyright (c) 2026 ReasonBlocks
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: agent-observability,agents,ai,anthropic,claude,langchain,langgraph,llm,monitoring,openai-agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Provides-Extra: all
Requires-Dist: anthropic>=0.39; extra == 'all'
Requires-Dist: browser-use>=0.12.6; extra == 'all'
Requires-Dist: claude-agent-sdk>=0.0.1; extra == 'all'
Requires-Dist: langchain-anthropic>=0.3; extra == 'all'
Requires-Dist: langchain>=1.0; extra == 'all'
Requires-Dist: openai-agents>=0.0.1; extra == 'all'
Provides-Extra: browser-use
Requires-Dist: anthropic>=0.39; extra == 'browser-use'
Requires-Dist: browser-use>=0.12.6; extra == 'browser-use'
Provides-Extra: claude
Requires-Dist: anthropic>=0.39; extra == 'claude'
Requires-Dist: claude-agent-sdk>=0.0.1; extra == 'claude'
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain-anthropic>=0.3; extra == 'langchain'
Requires-Dist: langchain>=1.0; extra == 'langchain'
Provides-Extra: openai-agents
Requires-Dist: openai-agents>=0.0.1; extra == 'openai-agents'
Description-Content-Type: text/markdown

# ReasonBlocks SDK

Python SDK that integrates ReasonBlocks into LangChain 1.0 agents as middleware. Scores each agent step, injects steering signals and E-traces, and optionally switches models based on difficulty.

## Installation

```bash
pip install reasonblocks
```

## Quick Start

```python
from reasonblocks import ReasonBlocks

rb = ReasonBlocks(api_key="rb_live_...")

agent = create_agent(
    model="anthropic:claude-sonnet-4-20250514",
    tools=[...],
    system_prompt="...",
    middleware=[rb.middleware()],
)
```

One import. One init. One middleware addition.

## Tagging runs for the dashboard

`rb.middleware()` accepts identifying metadata that lands on the dashboard runs row:

```python
agent = create_agent(
    ...,
    middleware=[rb.middleware(
        org_id="6d3f...",         # uuid; "default" if omitted
        project_id="a91b...",     # uuid; "default" if omitted
        run_id="my-run-1",        # auto-generated if omitted
        agent_name="bugfixer",    # free-form filter key
        task="fix the TypeError",
        model="claude-sonnet-4-20250514",
        framework="langchain",
        codebase_id="myrepo@sha:abc123",
    )],
)
```

The dashboard's **Quickstart** page (`/platform/dashboard/quickstart`) shows the user's actual `org_id` / `project_id` next to a copy-pasteable snippet — the easiest way to get the values without hardcoding.

## Configuration

```python
rb = ReasonBlocks(
    api_key="rb-...",
    token_budget=100_000,
    monitor_names=["loop", "confidence", "evidence", "budget", "strategy_exhaustion"],
    fsm_thresholds={
        "fast_threshold": 0.2,
        "slow_threshold": 0.6,
        "skip_threshold": 0.85,
    },
    model_routing={
        "FAST": "anthropic:claude-haiku-4-5-20251001",
        "SLOW": "anthropic:claude-sonnet-4-20250514",
    },
    e_traces_enabled=True,
)
```

## Validated modes

Presets that wire one of the configurations measured on a real benchmark.
Each mode is a one-liner that mounts the exact middleware stack that
produced the published number — same thresholds, same rule pack, same
priority order.

### `mode="code_review"`

The SWE-bench Pro **D-arm** stack — code-review reactive monitor (7
rules), tool-output compression (head+tail at 1800 chars, keep most-recent
2), early-exit nudge.

**Validated headline** (paired n=75, `claude-sonnet-4-6`, real Docker
grading; see [`swebench-pro-bench/results/compare_a_cv1_d.json`](../swebench-pro-bench/results/compare_a_cv1_d.json)):

| arm | pass rate | mean input tokens | vs baseline |
|---|---|---|---|
| baseline | 25.3% | 1,257,316 | — |
| `mode="code_review"` (D arm) | 25.4% | **606,212** | **−51.8% tokens, flat accuracy** |
| (alt) `enable_general_monitor=True` (C_v1) | **36.0%** | 1,136,946 | +10.7pp accuracy, −9.6% tokens |

Use `mode="code_review"` when you want the maximum cost cut at unchanged
success; use the C_v1 monitor when you want the accuracy lift.

**One-liner install** — drop into any LangChain agent:

```python
from reasonblocks import for_code_review

agent = create_agent(
    model="claude-sonnet-4-6",
    tools=[bash_tool, ...],
    middleware=for_code_review(
        fail_to_pass_tests=task.fail_to_pass,  # SWE-bench Pro metadata
        max_tool_calls=50,                     # the budget the monitor reasons about
    ),
)
```

**Or via the unified config** (composes with E-traces, routing, etc.):

```python
from reasonblocks import ReasonBlocksConfig, build_middleware

cfg = ReasonBlocksConfig.from_mode(
    "code_review",
    cr_fail_to_pass_tests=task.fail_to_pass,
    cr_max_tool_calls=50,
)
middleware = build_middleware(cfg)
```

**What the stack does on each step**:

1. Detects the v1 failure patterns (semantic loop, repeated error, edits
   without test, edits inside `site-packages`, missed FAIL_TO_PASS tests,
   half-budget no-edits, final-10% over-editing) and injects a short
   corrective hint when one fires.
2. Compresses any `ToolMessage` whose content exceeds 1800 characters
   using head+tail truncation; leaves the most-recent 2 tool messages
   untouched so the agent keeps full visibility into the step it's
   actively reasoning about.
3. Once past call index 40, if the monitor signals overthinking
   (rules 1/3/4), prepends a "stop investigating, submit your best
   answer" nudge.

Bash-tool name autodetection covers `bash`, `shell`, `run_command`,
`execute`, and `run_bash` out of the box; pass
`bash_tool_names=("your_tool",)` to override.

## Architecture

The middleware hooks into two points in the LangChain agent loop:

- **before_model** -- scores the agent's last reasoning step, updates the FSM state, runs all monitors, retrieves E-traces, and injects steering signals as a system message.
- **wrap_model_call** -- overrides the model based on FSM state (if routing is configured) and tracks token usage.

## Development

```bash
pip install -e ".[dev]"
pytest
```
