Metadata-Version: 2.4
Name: rlm-code
Version: 0.1.8
Summary: RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems
Project-URL: Homepage, https://github.com/SuperagenticAI/rlm-code
Project-URL: Documentation, https://superagenticai.github.io/rlm-code/
Project-URL: Repository, https://github.com/SuperagenticAI/rlm-code
Project-URL: Bug Tracker, https://github.com/SuperagenticAI/rlm-code/issues
Project-URL: Changelog, https://github.com/SuperagenticAI/rlm-code/blob/main/CHANGELOG.md
Author-email: Shashi Jagtap <shashi@super-agentic.ai>
Maintainer-email: Shashi Jagtap <shashi@super-agentic.ai>
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: ai,claude,code,dspy,interactive,language-models,nlp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.11
Requires-Dist: anyio
Requires-Dist: click
Requires-Dist: dspy
Requires-Dist: httpx
Requires-Dist: httpx-sse
Requires-Dist: jsonschema
Requires-Dist: mcp
Requires-Dist: packaging
Requires-Dist: pydantic
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: rich
Provides-Extra: adk
Requires-Dist: google-adk; extra == 'adk'
Requires-Dist: google-genai; extra == 'adk'
Requires-Dist: python-dotenv; extra == 'adk'
Provides-Extra: anthropic
Requires-Dist: anthropic; extra == 'anthropic'
Provides-Extra: deepagents
Requires-Dist: deepagents; extra == 'deepagents'
Provides-Extra: dev
Requires-Dist: hypothesis; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-xdist; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: types-pyyaml; extra == 'dev'
Requires-Dist: types-requests; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs; extra == 'docs'
Requires-Dist: mkdocs-material; extra == 'docs'
Requires-Dist: mkdocs-minify-plugin; extra == 'docs'
Requires-Dist: mkdocstrings[python]; extra == 'docs'
Provides-Extra: frameworks
Requires-Dist: deepagents; extra == 'frameworks'
Requires-Dist: google-adk; extra == 'frameworks'
Requires-Dist: google-genai; extra == 'frameworks'
Requires-Dist: pydantic-ai; extra == 'frameworks'
Requires-Dist: python-dotenv; extra == 'frameworks'
Provides-Extra: gemini
Requires-Dist: google-genai; extra == 'gemini'
Provides-Extra: llm-all
Requires-Dist: anthropic; extra == 'llm-all'
Requires-Dist: google-genai; extra == 'llm-all'
Requires-Dist: openai; extra == 'llm-all'
Provides-Extra: mcp-ws
Requires-Dist: websockets; extra == 'mcp-ws'
Provides-Extra: mlflow
Requires-Dist: mlflow; extra == 'mlflow'
Provides-Extra: openai
Requires-Dist: openai; extra == 'openai'
Provides-Extra: pydantic
Requires-Dist: pydantic-ai; extra == 'pydantic'
Provides-Extra: test
Requires-Dist: hypothesis; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest-xdist; extra == 'test'
Provides-Extra: tui
Requires-Dist: textual; extra == 'tui'
Description-Content-Type: text/markdown

# RLM Code

<p align="center">
  <a href="https://github.com/SuperagenticAI/rlm-code">
    <img src="https://github.com/SuperagenticAI/rlm-code/raw/main/assets/rlm-code-logo.png" alt="RLM Code logo" width="320">
  </a>
</p>

[![PyPI Version](https://img.shields.io/pypi/v/rlm-code.svg)](https://pypi.org/project/rlm-code/)
[![Python Versions](https://img.shields.io/pypi/pyversions/rlm-code.svg)](https://pypi.org/project/rlm-code/)
[![PyPI Wheel](https://img.shields.io/pypi/wheel/rlm-code.svg)](https://pypi.org/project/rlm-code/)
[![License](https://img.shields.io/pypi/l/rlm-code.svg)](https://pypi.org/project/rlm-code/)
[![CI](https://github.com/SuperagenticAI/rlm-code/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/SuperagenticAI/rlm-code/actions/workflows/ci.yml)
[![Pre-commit](https://github.com/SuperagenticAI/rlm-code/actions/workflows/pre-commit.yml/badge.svg?branch=main)](https://github.com/SuperagenticAI/rlm-code/actions/workflows/pre-commit.yml)
[![Docs Deploy](https://github.com/SuperagenticAI/rlm-code/actions/workflows/deploy-docs.yml/badge.svg?branch=main)](https://github.com/SuperagenticAI/rlm-code/actions/workflows/deploy-docs.yml)
[![Release](https://github.com/SuperagenticAI/rlm-code/actions/workflows/release.yml/badge.svg?branch=main)](https://github.com/SuperagenticAI/rlm-code/actions/workflows/release.yml)
[![Docs](https://img.shields.io/badge/Docs-RLM%20Code-ff7a18.svg?logo=readthedocs&logoColor=white)](https://superagenticai.github.io/rlm-code/)
[![GitHub Stars](https://img.shields.io/github/stars/SuperagenticAI/rlm-code.svg)](https://github.com/SuperagenticAI/rlm-code/stargazers)
[![GitHub Issues](https://img.shields.io/github/issues/SuperagenticAI/rlm-code.svg)](https://github.com/SuperagenticAI/rlm-code/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/SuperagenticAI/rlm-code.svg)](https://github.com/SuperagenticAI/rlm-code/pulls)

**Run LLM-powered agents in a REPL loop, benchmark them, and compare results.**

RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.07503) (RLM) approach described in the 2025 paper. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it, chunk by chunk, iteration by iteration. For large inputs, this can use far fewer tokens than pasting the whole document into the prompt.

RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.

## Release v0.1.8

This release extends HALO/AHE-style trace analysis with layered evidence export.

- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
- AHE-style evidence corpus export with `overview.md`, per-trace detail reports, `index.json`, and optional processed raw JSONL spans
- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
- `/rlm` help/docs updated for `env=trace_analysis`
- Dedicated trace analysis docs under the Core Engine section

Example:

```text
/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
```

## Documentation

<p align="center">
  <a href="https://superagenticai.github.io/rlm-code/">
    <img alt="Read the RLM Code Docs" src="https://img.shields.io/badge/Read%20the%20Docs-RLM%20Code-ff7a18?style=for-the-badge&logo=readthedocs&logoColor=white">
  </a>
</p>

<p align="center">
  <a href="https://superagenticai.github.io/rlm-code/"><strong>Open the full documentation</strong></a>
</p>

## Install

```bash
uv tool install "rlm-code[tui,llm-all]"
```

This installs `rlm-code` as a globally available command with its own isolated environment. You get the TUI and all LLM provider clients (OpenAI, Anthropic, Gemini).

Requirements:

- Python 3.11+
- `uv` (recommended) or `pip`
- one model route (BYOK API key or local server like Ollama)
- one secure execution backend (Docker recommended; Monty optional)

Don't have uv? Install it first:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

<details>
<summary>Alternative: install with pip</summary>

```bash
pip install "rlm-code[tui,llm-all]"
```
</details>

<p align="center">
  <img src="https://github.com/SuperagenticAI/rlm-code/raw/main/assets/rlm-lab.png" alt="RLM Research Lab view" width="980">
</p>

## Quick Start

### 1. Launch

```bash
mkdir -p ~/my-project && cd ~/my-project
rlm-code
```

This opens the terminal UI. You'll see a chat input at the bottom and tabs across the top.

### 2. Connect to an LLM

Type one of these in the chat input:

```
/connect anthropic claude-opus-4-6
```

or

```
/connect openai gpt-5.3-codex
```

or

```
/connect gemini gemini-2.5-flash
```

or for a free local model via [Ollama](https://ollama.com/):

```
/connect ollama llama3.2
```

> You need the matching API key in your environment (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`) or in a `.env` file in your project directory. Ollama needs no key, just a running Ollama server.

Alternatively, run `/connect` on its own to follow the interactive path.

Check that it worked:

```
/status
```

### 3. Run your first RLM task

```
/rlm run "Write a Python function that finds the longest common subsequence of two strings"
```

This starts the RLM loop: the LLM writes code in a sandboxed REPL, executes it, sees the output, writes more code, and iterates until it calls `FINAL(answer)` with the result.

### 4. Run a benchmark

Benchmarks let you measure how well a model performs on a set of tasks:

```
/rlm bench preset=pure_rlm_smoke
```

This runs 3 test cases through the RLM loop and scores the results.

See all available benchmarks:

```
/rlm bench list
```

### 5. View results

Use the **Research** tab (`Ctrl+5`) for live benchmark and trajectory views.
After at least two benchmark runs, export a compare report:

```
/rlm bench report candidate=latest baseline=previous format=markdown
```

### 6. Replay a session step-by-step

```
/rlm status
/rlm replay <run_id>
```

Walk through the last run one step at a time: see what code the LLM wrote, what output it got, and what it did next.

### 7. Use RLM Code as a coding agent (local/BYOK/ACP)

RLM Code can also be used as a coding-agent harness in the TUI, much like Claude Code or Codex. It provides a minimal harness that steers the model to write code.

```text
/harness tools
/harness run "fix failing tests and add regression test" steps=8 mcp=on
```

ACP is supported too:

```text
/connect acp
/harness run "implement feature X with tests" steps=8 mcp=on
```

Notes:

- In Local/BYOK connection modes, chat prompts that look like coding requests can auto-route to the harness.
- In ACP mode, auto-routing is intentionally off; use `/harness run ...` explicitly.

### 8. CodeMode with UTCP and Cloudflare MCP

Use these server entries in your project `rlm_config.yaml`:

```yaml
mcp_servers:
  utcp-codemode:
    name: utcp-codemode
    description: "Local CodeMode MCP bridge"
    enabled: true
    auto_connect: false
    timeout_seconds: 30
    retry_attempts: 3
    transport:
      type: stdio
      command: npx
      args:
        - "@utcp/code-mode-mcp"

  cloudflare-codemode:
    name: cloudflare-codemode
    description: "Cloudflare MCP via remote bridge"
    enabled: true
    auto_connect: false
    timeout_seconds: 30
    retry_attempts: 3
    transport:
      type: stdio
      command: npx
      args:
        - "mcp-remote"
        - "https://mcp.cloudflare.com/mcp"
```

UTCP path (native CodeMode in current release):

```text
/mcp-connect utcp-codemode
/mcp-tools utcp-codemode
/harness run "analyze this repo, find TODO/FIXME, and create report.json" steps=3 mcp=on strategy=codemode mcp_server=utcp-codemode
```

Cloudflare path (recommended strategy today):

```text
/mcp-connect cloudflare-codemode
/mcp-tools cloudflare-codemode
/harness run "list available tools and run one safe read-only action, then summarize in 3 bullets" steps=3 mcp=on strategy=tool_call mcp_server=cloudflare-codemode
```

Notes:

- On first Cloudflare connect, `mcp-remote` may ask for interactive authentication.
- In this release, `strategy=codemode` expects the `search_tools` + `call_tool_chain` bridge contract.
- If a remote MCP server exposes a different tool contract, use `strategy=tool_call`.

## How the RLM Loop Works

Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.

RLM approach:

1. Your document is stored as a Python variable `context` in a REPL
2. The LLM writes code to process it (e.g., `len(context)`, `context[:5000]`, `context.split('\n')`)
3. The code runs, and the LLM sees the output
4. The LLM writes more code based on what it learned
5. Repeat until the LLM calls `FINAL("here is my answer")`

This means the LLM can handle documents much larger than its context window, because it reads them in chunks through code rather than all at once through the prompt.
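
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not RLM Code's actual runner: `run_rlm_loop`, `make_scripted_policy`, and the `Final` exception are hypothetical names, the "LLM" is a scripted stand-in that replays fixed snippets, and the code runs with plain `exec` rather than in a sandbox.

```python
import contextlib
import io


class Final(Exception):
    """Raised by FINAL() to stop the loop and carry the answer out."""

    def __init__(self, answer):
        super().__init__(answer)
        self.answer = answer


def run_rlm_loop(context, policy, max_steps=8):
    """Run code produced by `policy` against `context` until FINAL() is called."""

    def FINAL(answer):
        raise Final(answer)

    # The document lives in the REPL namespace, not in the prompt.
    namespace = {"context": context, "FINAL": FINAL}
    observation = ""
    for _ in range(max_steps):
        code = policy(observation)  # a real policy would call the LLM here
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # unsandboxed: for illustration only
        except Final as done:
            return done.answer
        observation = buffer.getvalue()  # the model only sees printed output
    return None  # step budget exhausted without a final answer


def make_scripted_policy(snippets):
    """Hypothetical stand-in for an LLM: replays fixed snippets in order."""
    remaining = iter(snippets)
    return lambda observation: next(remaining)


document = "\n".join(["INFO boot", "ERROR disk full", "INFO done"])
policy = make_scripted_policy([
    "print(len(context))",
    "hits = [line for line in context.splitlines() if line.startswith('ERROR')]\nprint(hits)",
    "FINAL(hits[0])",
])
answer = run_rlm_loop(document, policy)
print(answer)  # ERROR disk full
```

Variables such as `hits` persist in the namespace between steps, which is what lets later snippets build on earlier ones. The `max_steps` bound plays the same role as the `steps=` argument shown elsewhere in this README.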

## What This Is (and Is Not)

RLM Code is:

- a research playground for recursive/model-assisted coding workflows
- a benchmarking and replay tool for reproducible experiments

RLM Code is not:

- a no-config consumer chat app
- guaranteed cheap (recursive runs can be expensive)
- safe to run with unrestricted execution settings

Use secure backend defaults (`/sandbox profile secure`) for normal use.

## Key Commands

| Command | What it does |
|---------|-------------|
| `/connect <provider> <model>` | Connect to an LLM |
| `/model` | Interactive model picker |
| `/status` | Show connection status |
| `/sandbox profile secure` | Apply secure sandbox defaults (Docker-first + strict pure RLM) |
| `/rlm run "<task>"` | Run a task through the RLM loop |
| `/rlm bench preset=<name>` | Run a benchmark preset |
| `/rlm bench list` | List available benchmarks |
| `/rlm bench compare` | Compare latest benchmark run with previous run |
| `/rlm abort [run_id\|all]` | Cancel active run(s) cooperatively |
| `/harness run "<task>"` | Run tool-using coding harness loop |
| `/rlm replay` | Step through the last run |
| `/rlm chat "<question>"` | Ask the LLM a question about your project |
| `/help` | Show all available commands |

## Cost and Safety Guardrails

Start bounded:

```text
/rlm run "small scoped task" steps=4 timeout=30 budget=60
```

For benchmarks, start with small limits:

```text
/rlm bench preset=dspy_quick limit=1
```

If a run is getting out of hand:

```text
/rlm abort all
```

## What You Can Do With It

- **Analyze large documents**: Feed in a 500-page PDF and ask questions; the LLM reads it in chunks via code
- **Compare models**: Run the same benchmark with different providers and see which scores higher
- **Compare paradigms**: Test Pure RLM vs CodeAct vs Traditional approaches on the same task
- **Debug agent behavior**: Replay any run step-by-step to see exactly what the agent did
- **Track experiments**: Every run is logged with metrics, tokens used, and trajectory

## Supported LLM Providers

| Provider | Latest Models | Setup |
|----------|--------------|-------|
| **Anthropic** | `claude-opus-4-6`, `claude-sonnet-4-5-20250929` | `ANTHROPIC_API_KEY` env var |
| **OpenAI** | `gpt-5.3-codex`, `gpt-5.2-pro` | `OPENAI_API_KEY` env var |
| **Google** | `gemini-2.5-pro`, `gemini-2.5-flash` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` env var |
| **Ollama** | `llama3.2`, `qwen2.5-coder:7b` | Running Ollama server at `localhost:11434` |
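
Before running `/connect`, it can help to check that the expected key is visible to your shell. The helper below is a small sketch, not part of RLM Code: `PROVIDER_KEYS` and `missing_keys` are hypothetical names, and the variable names are copied from the table above. It only inspects the process environment, so keys that live in a `.env` file will not be detected.

```python
import os

# Environment variables taken from the table above; Ollama needs no key.
PROVIDER_KEYS = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "ollama": (),
}


def missing_keys(provider, environ=None):
    """Return the env vars that would satisfy `provider` if none of them is set."""
    environ = os.environ if environ is None else environ
    names = PROVIDER_KEYS[provider]
    # Any one of the listed variables is enough; an empty tuple means no key needed.
    if not names or any(environ.get(name) for name in names):
        return ()
    return names


for provider in PROVIDER_KEYS:
    missing = missing_keys(provider)
    status = "ok" if not missing else "set one of: " + ", ".join(missing)
    print(f"{provider}: {status}")
```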

## Configuration

Create an `rlm_config.yaml` in your project directory to customize settings:

```yaml
name: my-project

models:
  openai_api_key: null
  openai_model: gpt-5.3-codex

default_model: gpt-5.3-codex

sandbox:
  runtime: docker
  superbox_profile: secure
  superbox_auto_fallback: true
  superbox_fallback_runtimes: [docker, daytona, e2b]
  pure_rlm_backend: docker
  pure_rlm_strict: true
  pure_rlm_allow_unsafe_exec: false

rlm:
  default_benchmark_preset: dspy_quick
  benchmark_pack_paths: []
```

Or generate a full sample config:

```
/init
```
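
If you edit the `sandbox` section by hand, a quick check against the sample values above can catch an accidentally loosened setting. The function below is an illustrative sketch only: `sandbox_warnings` is a hypothetical helper, it assumes the YAML has already been parsed into a dict (for example with `yaml.safe_load`), and it treats a missing `pure_rlm_strict` or `superbox_profile` as worth flagging because it does not know the tool's built-in defaults.

```python
def sandbox_warnings(config):
    """Return warnings for sandbox settings that differ from the sample config."""
    sandbox = config.get("sandbox", {})
    warnings = []
    if sandbox.get("pure_rlm_allow_unsafe_exec", False):
        warnings.append("pure_rlm_allow_unsafe_exec is enabled")
    if not sandbox.get("pure_rlm_strict", False):
        warnings.append("pure_rlm_strict is not enabled")
    if sandbox.get("superbox_profile") != "secure":
        warnings.append("superbox_profile is not 'secure'")
    return warnings


# Mirrors the relevant keys of the sample config shown above.
sample = {
    "sandbox": {
        "runtime": "docker",
        "superbox_profile": "secure",
        "pure_rlm_strict": True,
        "pure_rlm_allow_unsafe_exec": False,
    }
}
print(sandbox_warnings(sample))  # []
```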

## Development Setup

```bash
git clone https://github.com/SuperagenticAI/rlm-code.git
cd rlm-code
uv sync --all-extras
uv run pytest
```

## Project Structure

```
rlm_code/
  rlm/              # Core RLM engine (runner, environments, policies)
  ui/               # Terminal UI (Textual-based TUI)
  mcp/              # MCP server for tool integration
  models/           # LLM provider adapters
  sandbox/          # Sandboxed code execution
  harness/          # Tool-using coding harness (/harness)
```

## Resources

Full docs: https://superagenticai.github.io/rlm-code/

## Contributing

See `CONTRIBUTING.md`.

## License

Apache-2.0

---

**Brought to You by [Superagentic AI](https://super-agentic.ai)**
