Metadata-Version: 2.4
Name: rlbridge
Version: 0.1.1
Summary: Reinforcement Learning Bridge - connects LLMs to RL environments, automates language-based problem construction, and derives instructions for long-term goals without user supervision
Project-URL: Homepage, https://github.com/pdfosborne/RL-IP
Project-URL: Documentation, https://github.com/pdfosborne/RL-IP#readme
Project-URL: Repository, https://github.com/pdfosborne/RL-IP
Project-URL: Issues, https://github.com/pdfosborne/RL-IP/issues
Project-URL: Changelog, https://github.com/pdfosborne/RL-IP/releases
Author: Philip Osborne
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai-agent,claude,gymnasium,instruction-following,llm,mcp,reinforcement-learning,rl,rlbridge
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: fastapi>=0.110.0
Requires-Dist: gymnasium>=0.29.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.12.0
Requires-Dist: uvicorn[standard]>=0.29.0
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: hatchling>=1.24.0; extra == 'dev'
Requires-Dist: mypy>=1.9.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Provides-Extra: envs-all
Requires-Dist: ale-py>=0.8.0; extra == 'envs-all'
Requires-Dist: gymnasium[all]>=0.29.0; extra == 'envs-all'
Provides-Extra: envs-atari
Requires-Dist: ale-py>=0.8.0; extra == 'envs-atari'
Requires-Dist: gymnasium[atari]>=0.29.0; extra == 'envs-atari'
Provides-Extra: envs-box2d
Requires-Dist: gymnasium[box2d]>=0.29.0; extra == 'envs-box2d'
Provides-Extra: envs-classic
Requires-Dist: gymnasium[classic-control]>=0.29.0; extra == 'envs-classic'
Provides-Extra: envs-mujoco
Requires-Dist: gymnasium[mujoco]>=0.29.0; extra == 'envs-mujoco'
Provides-Extra: examples
Requires-Dist: gymnasium[classic-control]>=0.29.0; extra == 'examples'
Provides-Extra: openai-sdk
Requires-Dist: openai>=1.30.0; extra == 'openai-sdk'
Provides-Extra: sentence-transformers
Requires-Dist: sentence-transformers>=2.6.0; extra == 'sentence-transformers'
Provides-Extra: torch
Requires-Dist: torch>=2.0.0; extra == 'torch'
Description-Content-Type: text/markdown

# Reinforcement Learning Bridge (RL Bridge)

**Reinforcement Learning Bridge** connects LLMs to reinforcement learning environments.
It ships as an **MCP plugin** compatible with Claude Code, Claude Desktop, LM Studio, Cursor, Windsurf, Codex CLI, and OpenCode. It automates how LLMs construct RL problems in language: building environments, translating observations, matching goals to sub-goals, and training agents to complete instructions without user supervision.

---

## Architecture

```
┌──────────────────────────────────────────────────────┐
│  Claude Code                                         │
│                                                      │
│  "Run 100 steps of CartPole with a random policy"    │
│         │                                            │
│   MCP layer (stdio)                                  │
└──────────┼───────────────────────────────────────────┘
           │
┌──────────▼───────────────────────────────────────────┐
│  RL Bridge MCP Plugin  (rlip.mcp_plugin)             │
│  FastMCP tools:  rl_create · rl_reset · rl_step      │
│                 rl_render · rl_close · rl_run_episode│
│         │                                            │
│   In-process dispatcher                              │
└──────────┼───────────────────────────────────────────┘
           │           ╌╌ OR ╌╌ (RLIP_SERVER_URL)
┌──────────▼───────────────────────────────────────────┐
│  RL Bridge Server (HTTP/JSON-RPC 2.0)                │
│  POST /rpc  ·  GET /environments  ·  GET /health     │
└──────────┬───────────────────────────────────────────┘
           │
┌──────────▼───────────────────────────────────────────┐
│  Environment Registry + Session Manager              │
│  ┌───────────────┐  ┌───────────────┐                │
│  │ CartPole-v1   │  │ LunarLander   │  + any custom  │
│  │ (Gymnasium)   │  │ (Gymnasium)   │    env …       │
│  └───────────────┘  └───────────────┘                │
└──────────────────────────────────────────────────────┘
```

---

## Quick Start

### 1. Install

From PyPI (recommended):

```bash
pip install rlbridge

# Classic Gymnasium envs for examples (CartPole, etc.)
pip install "rlbridge[examples]"
```

With [uv](https://docs.astral.sh/uv/):

```bash
uv pip install rlbridge
# or, in a uv-managed project:
uv add rlbridge
uv add "rlbridge[examples]"
```

From source (development):

```bash
git clone https://github.com/pdfosborne/RL-IP
cd RL-IP
pip install -e ".[dev,examples]"
# or: uv sync --extra dev --extra examples
```

### 2. Add to your AI tool

Run the command for whichever tool(s) you use, then restart the client:

```bash
# Claude Code (CLI)  → ~/.claude.json
rlip install-claude

# Claude Desktop (GUI app)
#   macOS   → ~/Library/Application Support/Claude/claude_desktop_config.json
#   Windows → %APPDATA%\Claude\claude_desktop_config.json
#   Linux   → ~/.config/Claude/claude_desktop_config.json
rlip install-claude-desktop

# LM Studio  → ~/.lmstudio/mcp.json  (Linux)
#              ~/Library/Application Support/LM Studio/mcp.json  (macOS)
#              %APPDATA%\LM Studio\mcp.json  (Windows)
rlip install-lmstudio

# Cursor  → ~/.cursor/mcp.json
rlip install-cursor

# Windsurf  → ~/.codeium/windsurf/mcp_config.json
rlip install-windsurf

# Codex CLI  → ~/.codex/config.toml
rlip install-codex

# OpenCode   → ~/.config/opencode/config.json
rlip install-opencode
```

All commands accept `--use-script` (uses the `rlip-mcp` console script instead
of `python -m`) and `--config-path` to override the default config location.

### 3. Use it

Open Claude Code and ask:

> *"Run a CartPole episode with a random policy and show me the total reward."*

Claude will call `rl_create`, `rl_reset`, `rl_step` (in a loop), and `rl_close` automatically.
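
For orientation, that tool sequence corresponds roughly to the loop below. This is a minimal sketch that runs without Gymnasium installed: `ToyCartEnv` is a hypothetical stand-in environment, not part of `rlip`, and it only mirrors the `(obs, reward, terminated, truncated, info)` step contract.

```python
import random


class ToyCartEnv:
    """Hypothetical stand-in env: the episode ends when |position| exceeds 3."""

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.rng = random.Random()
        self.position = 0
        self.steps = 0

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.position = 0
        self.steps = 0
        return self.position, {}

    def sample_action(self):
        return self.rng.choice([0, 1])  # 0 = push left, 1 = push right

    def step(self, action):
        self.position += 1 if action == 1 else -1
        self.steps += 1
        terminated = abs(self.position) > 3
        truncated = not terminated and self.steps >= self.max_steps
        return self.position, 1.0, terminated, truncated, {}

    def close(self):
        pass


def run_random_episode(env, seed=0):
    obs, info = env.reset(seed=seed)                                 # rl_reset
    total_reward, done = 0.0, False
    while not done:
        action = env.sample_action()                                 # rl_sample_action
        obs, reward, terminated, truncated, info = env.step(action)  # rl_step
        total_reward += reward
        done = terminated or truncated
    env.close()                                                      # rl_close
    return total_reward


env = ToyCartEnv()                                                   # rl_create
total = run_random_episode(env)
print(f"total reward: {total}")
```

An episode ends when either `terminated` or `truncated` is true, which is why the loop checks both flags.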

---

## Available MCP Tools

### Environment control

| Tool | Description |
|------|-------------|
| `rl_list_environments` | Browse all registered RL environments |
| `rl_create` | Create an environment instance |
| `rl_reset` | Reset an instance, get the initial observation |
| `rl_step` | Execute one action, get `(obs, reward, terminated, truncated, info)` |
| `rl_sample_action` | Sample a random valid action |
| `rl_spaces` | Inspect observation and action space details |
| `rl_render` | Render current state (PNG or ASCII) |
| `rl_close` | Destroy an instance |
| `rl_list_instances` | List all active instances |
| `rl_run_episode` | Run a complete episode in one call |

### Custom environment builder

Build and register new environments that wrap any Gymnasium env, with custom
metadata, local caching, and language translation.

| Tool | Description |
|------|-------------|
| `rl_build_environment` | Wrap a Gymnasium env with custom ID, description, and tags; cache to `~/.rlip/envs/` and write to user catalog |
| `rl_list_cached_environments` | Browse previously built environments stored in the local cache |
| `rl_load_cached_environments` | Re-register all cached environments at session start |

**Example workflow in Claude Code:**

> *"Build me a FrozenLake environment called FrozenLake-Custom-v0 with tags grid and discrete."*

```
rl_build_environment("FrozenLake-Custom-v0", "FrozenLake-v1",
                     description="Custom FrozenLake.", tags="grid,discrete")
```

### Language translation

Map raw environment observations to natural-language descriptions. A translator
is required for instruction-following and sub-goal reward shaping.

| Tool | Description |
|------|-------------|
| `rl_sample_states_for_translation` | Randomly explore an environment and display raw observed states so a `translate()` function can be written |
| `rl_set_translator_code` | Compile, validate, and install a Python `translate()` function; saves it to `~/.rlip/envs/<env_id>/translator.py` so it reloads automatically |
| `rl_translate_state` | Test a single state → natural-language description round-trip |

**Example workflow in Claude Code:**

> *"Sample some states from FrozenLake-Custom-v0 so I can write a translator."*

```
rl_sample_states_for_translation("FrozenLake-Custom-v0", n_samples=20)
```

> *"Install this translator:"*

```python
def translate(state, *, legal_moves=None, action_history=None):
    row, col = divmod(int(state), 4)
    labels = {0: "start", 15: "goal"}
    # Look up with int(state) so string inputs such as "15" also get a label
    return f"Agent at row {row}, column {col}. {labels.get(int(state), '')}"
```

```
rl_set_translator_code("FrozenLake-Custom-v0", python_code="...")
rl_translate_state("FrozenLake-Custom-v0", state="15")
```

The installed translator is automatically used by `rl_match_instruction` and
`rl_train_agent` with sub-goal shaping.
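
As a reference point, here is a slightly fuller translator for the default 4×4 FrozenLake map, written so it can be tested locally before it is installed with `rl_set_translator_code`. The hole positions (5, 7, 11, 12) follow Gymnasium's default `FrozenLake-v1` 4×4 layout. The wording of the descriptions and the way `legal_moves` is appended are illustrative choices, not something `rlip` is known to require.

```python
HOLES = {5, 7, 11, 12}   # holes in the default 4x4 FrozenLake-v1 map
START, GOAL = 0, 15


def translate(state, *, legal_moves=None, action_history=None):
    """Turn a FrozenLake state index (0-15) into a short sentence."""
    index = int(state)                 # accepts 15, "15", or a NumPy integer
    row, col = divmod(index, 4)
    if index == GOAL:
        cell = "the goal"
    elif index in HOLES:
        cell = "a hole"
    elif index == START:
        cell = "the start"
    else:
        cell = "frozen ice"
    text = f"Agent at row {row}, column {col}, standing on {cell}."
    if legal_moves:
        text += " Legal moves: " + ", ".join(str(m) for m in legal_moves) + "."
    return text


for s in (0, "5", 15):
    print(s, "->", translate(s))
```

Converting the state with `int()` once, up front, keeps the function consistent whether the state arrives as an integer or as a string.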

### Instruction-following

| Tool | Description |
|------|-------------|
| `rl_match_instruction` | Explore an environment, translate states to language, and find the state best matching a natural-language goal |
| `rl_instruction_run_episode` | Run a shaped episode where the matched state provides a bonus reward signal |

For semantic matching with Hugging Face sentence-transformers models:

```bash
pip install -e ".[sentence-transformers]"
```

Then choose a model directly in the tool call:

```python
# Uses the default model for sentence-transformers
rl_match_instruction("Sailing-v0", "sail toward the beach", encoder="sentence-transformers")

# Uses a specific Hugging Face model id
rl_match_instruction(
    "Sailing-v0",
    "sail toward the beach",
    encoder="sentence",
    encoder_model="BAAI/bge-small-en-v1.5",
)
```
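
To make the matching step less opaque, the sketch below scores translated state descriptions against an instruction using TF-IDF weights and cosine similarity, with only the standard library. It illustrates the general idea behind a TF-IDF encoder and is not the code in `rlip.instruction_matching`. The function names (`tfidf_vectors`, `best_match`) and the example descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using smoothed IDF."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def best_match(instruction, descriptions):
    # Simplification: the instruction is included in the corpus used for IDF.
    vectors = tfidf_vectors(list(descriptions) + [instruction])
    query = vectors[-1]
    scores = [cosine(query, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


descriptions = [
    "Agent at row 0, column 0, standing on the start.",
    "Agent at row 1, column 1, standing on a hole.",
    "Agent at row 3, column 3, standing on the goal.",
]
index, score = best_match("reach the goal", descriptions)
print(index, round(score, 3))
```

Lexical scoring like this only rewards shared words, which is the gap a sentence-transformers encoder is meant to close for paraphrased instructions.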

### RL agent training

| Tool | Description |
|------|-------------|
| `rl_list_agents` | See available agent types (`tabular_q`, `dqn`, `ppo`) with guidance on when to use each |
| `rl_train_agent` | Train an agent; optionally combine with a `match_id` for instruction-shaped rewards |
| `rl_run_agent_episode` | Evaluate a trained agent for one greedy episode |
| `rl_render_policy` | Render the best training episode as an animated GIF |
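
The `match_id` option can be pictured as adding a bonus whenever the agent reaches the matched state. The sketch below shows that idea with tabular Q-learning on a five-cell corridor, using only the standard library. The environment, the bonus size, and the hyperparameters are illustrative assumptions, and the way `rl_train_agent` actually shapes rewards may differ.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4; the episode ends at cell 4
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic corridor dynamics with a sparse reward at the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def train(matched_state, bonus=0.5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):                       # step cap per episode
            # Random behaviour policy; Q-learning is off-policy, so this still
            # learns greedy action values.
            action = rng.choice((LEFT, RIGHT))
            nxt, reward, done = step(state, action)
            if nxt == matched_state:
                reward += bonus                   # instruction-shaped bonus
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train(matched_state=GOAL)
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

Here the matched state coincides with the terminal goal, so the bonus cannot be collected repeatedly. A bonus on a non-terminal sub-goal would need more care to avoid rewarding loops.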

### Local LLM policy agent (Python API)

RL Bridge also includes a direct `local_llm` policy agent for action selection
without gradient training.

```python
from rlip.environments.registry import registry
from rlip.language_translation import get_translator
from rlip.rl_agents import LocalLLMAgent

env = registry.get("Sailing-v0").create()
translator = get_translator("Sailing-v0")

agent = LocalLLMAgent(base_url="http://localhost:11434/v1", model="llama3.1")

obs = env.reset().observation
obs_text = translator.translate(obs) if translator else str(obs)
action = agent.choose_action(obs_text, action_space=env.action_space)
step = env.step(action)
```

Use this agent primarily with language-translated observations. Calling a model
for every environment action can be expensive in long episodes, so account for
per-step latency and token cost.

---

## Full Example: New Environment from Scratch

```python
# 1. Build and cache the environment
rl_build_environment(
    env_id="FrozenLake-Custom-v0",
    gym_env_id="FrozenLake-v1",
    description="4×4 frozen lake grid-world.",
    tags="grid,discrete",
    namespace="custom",
)

# 2. Sample states to understand the observation format
rl_sample_states_for_translation("FrozenLake-Custom-v0", n_samples=16)
# → STATE 1: 0  STATE 2: 1  STATE 3: 5  …

# 3. Install a translator
rl_set_translator_code("FrozenLake-Custom-v0", python_code="""
def translate(state, *, legal_moves=None, action_history=None):
    index = int(state)
    row, col = divmod(index, 4)
    cell = {0: "start (S)", 15: "goal (G)"}.get(index, "frozen (F)")
    if index in {5, 7, 11, 12}:  # holes in the default 4x4 FrozenLake-v1 map
        cell = "hole (H)"
    return f"Agent is at row {row}, column {col} - {cell}."
""")

# 4. Match an instruction to a goal state
rl_match_instruction("FrozenLake-Custom-v0", "reach the goal")
# → match_id: abc123

# 5. Train an agent with sub-goal shaping
rl_train_agent("dqn", "FrozenLake-Custom-v0", n_episodes=500, match_id="abc123")
# → agent_id: def456

# 6. Render the result
rl_render_policy("FrozenLake-Custom-v0", agent_id="def456")
```

---

## Standalone HTTP Server

Run RL Bridge as a standalone service (useful for multi-process workflows or
connecting non-Python agents):

```bash
rlip server --port 8765
```

Then send JSON-RPC requests:

```bash
curl -X POST http://localhost:8765/rpc \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "1",
    "method": "rlip/environment/create",
    "params": {"env_id": "CartPole-v1"}
  }'
```

Browse the auto-generated API docs at `http://localhost:8765/docs`.
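
The same request can be built from Python with only the standard library. In this sketch `make_request` and `post_rpc` are hypothetical helper names. The method string and the `/rpc` path are taken from the `curl` example above, and the reply is returned undecoded beyond JSON parsing because its shape is defined in the protocol spec.

```python
import itertools
import json
import urllib.request

_ids = itertools.count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh string id."""
    request = {"jsonrpc": "2.0", "id": str(next(_ids)), "method": method}
    if params is not None:
        request["params"] = params
    return request


def post_rpc(base_url, request, timeout=10.0):
    """POST the request to <base_url>/rpc and return the parsed JSON reply."""
    http_request = urllib.request.Request(
        base_url.rstrip("/") + "/rpc",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(http_request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


create = make_request("rlip/environment/create", {"env_id": "CartPole-v1"})
print(json.dumps(create, indent=2))

# With a server running locally:
# reply = post_rpc("http://localhost:8765", create)
```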

---

## Remote RL Bridge Server from the MCP Plugin

```bash
# Point the plugin at a remote server instead of running envs in-process
RLIP_SERVER_URL=http://my-gpu-machine:8765 rlip mcp
```

---

## Custom Environments (Python API)

For programmatic use, the `EnvironmentBuilder` Python API mirrors the MCP tools:

```python
from rlip.environments.builder import EnvironmentBuilder, load_cached_environments

# Build, cache, and register in one call
built = (
    EnvironmentBuilder("FrozenLake-Custom-v0")
    .from_gymnasium("FrozenLake-v1")
    .with_metadata(description="Custom FrozenLake.", tags=["grid", "discrete"])
    .build()   # auto_register=True, update_catalog=True by default
)

# Attach a hand-written translator
built.translator = ...   # any LanguageTranslator instance
built.register(register_translator=True)

# Reload all cached environments at startup
load_cached_environments()
```

For a fully custom (non-Gymnasium) environment, subclass the ABCs directly:

```python
from rlip.environments.base import RLIPEnvironment, RLIPEnvironmentFactory
from rlip.environments.registry import registry
from rlip.protocol.messages import DiscreteSpace, EnvironmentInfo, ResetResult, StepResult, RenderResult

class MyEnv(RLIPEnvironment):
    def reset(self, seed=None, options=None) -> ResetResult: ...
    def step(self, action) -> StepResult: ...
    def close(self): ...
    @property
    def observation_space(self): return DiscreteSpace(n=10)
    @property
    def action_space(self): return DiscreteSpace(n=4)
    def render(self) -> RenderResult: ...

class MyFactory(RLIPEnvironmentFactory):
    @property
    def env_info(self) -> EnvironmentInfo:
        return EnvironmentInfo(env_id="MyEnv-v0", description="My custom env", namespace="custom")
    def create(self, render_mode=None, **kwargs) -> MyEnv:
        return MyEnv()

registry.register(MyFactory())
```

---

## Third-Party Environment Plugins

RL Bridge can load additional environments from separately installed pip packages.
Packages register factories via the `rlip.environments` entry-point group and
optional MCP tools via `rlip.environment_mcp_tools`.

Example: [Flesh and Blood](https://fabtcg.com/) TCG environments live in the
[`flesh-and-blood-rlip`](https://github.com/pdfosborne/flesh-and-blood-rlip) package (not bundled with RL Bridge):

```bash
# Install RL Bridge from PyPI
pip install rlbridge

# Then install the FaB plugin from GitHub
pip install git+https://github.com/pdfosborne/flesh-and-blood-rlip.git
```

After installation, environments appear in the registry automatically:

```python
from rlip.environments.registry import registry

env = registry.create("FleshAndBlood-Talishar-v0", format="silver_age")
```

To publish your own plugin, add the following to `pyproject.toml`:

```toml
[project.entry-points."rlip.environments"]
my-env = "my_package:register_environments"

[project.entry-points."rlip.environment_mcp_tools"]
my-env = "my_package:register_mcp_tools"
```

Each callable is invoked with keyword arguments: `registry=` for environment
hooks, or `mcp=`, `registry=`, and `log=` for MCP tool hooks. It returns the
number of items it registered.
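
A plugin's environment hook might look like the sketch below. `MyFactory` and `_DemoRegistry` are placeholders: the real `registry` object is supplied by RL Bridge, and a real factory would subclass `RLIPEnvironmentFactory` as shown under Custom Environments. Only the `registry=` argument and the returned count come from the contract described above.

```python
class MyFactory:
    """Placeholder; a real plugin would subclass RLIPEnvironmentFactory."""

    env_id = "MyEnv-v0"


def register_environments(registry):
    """Hook referenced from the "rlip.environments" entry-point group."""
    factories = [MyFactory()]
    for factory in factories:
        registry.register(factory)
    return len(factories)          # number of environments registered


class _DemoRegistry:
    """Stand-in used only to exercise the hook without rlip installed."""

    def __init__(self):
        self.factories = []

    def register(self, factory):
        self.factories.append(factory)


demo = _DemoRegistry()
count = register_environments(registry=demo)
print(count, [f.env_id for f in demo.factories])
```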

---

## Language Translation (Python API)

```python
from rlip.language_translation.generator import TranslatorGenerator, build_translator

def my_llm(prompt: str) -> str:
    ...  # wrap any LLM provider

# Option A: two-stage LLM pipeline (describe → synthesise rules)
# `env` is an environment instance, e.g. registry.get("FrozenLake-Custom-v0").create()
translator = build_translator(
    env, llm_fn=my_llm,
    env_context="4×4 grid, state = integer 0–15.",
    n_samples=20,
)

# Option B: hand-written rules with LLM fallback
from rlip.language_translation.generator import GeneratedTranslator

translator = GeneratedTranslator(llm_fn=my_llm, env_id="MyEnv-v0")
translator.rule_code = """
def translate(state, *, legal_moves=None, action_history=None):
    row, col = divmod(int(state), 4)
    return f"row {row}, col {col}"
"""

# Save as a standalone module (no LLM dependency at runtime)
translator.save_code("my_env_translator.py")
```

Generated translators fall back to live LLM calls for states the rule function
cannot handle, cache those answers, and auto-refine the rules once the cache
grows past `refine_threshold`.
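
The fallback behaviour described above can be pictured with the following self-contained sketch. `FallbackTranslator` is a hypothetical class written for illustration, not `GeneratedTranslator` itself. Two things in it are assumptions: a raised exception is treated as "the rules cannot handle this state", and refinement is only flagged, not performed.

```python
class FallbackTranslator:
    def __init__(self, rule_fn, llm_fn, refine_threshold=3):
        self.rule_fn = rule_fn
        self.llm_fn = llm_fn
        self.refine_threshold = refine_threshold
        self.cache = {}

    def translate(self, state):
        try:
            return self.rule_fn(state)            # 1. fast path: rule function
        except Exception:
            pass                                  # rules cannot handle this state
        key = repr(state)
        if key not in self.cache:                 # 2. reuse a cached LLM answer
            self.cache[key] = self.llm_fn(f"Describe state {state!r}")  # 3. live call
        return self.cache[key]

    @property
    def needs_refinement(self):
        """True once the cache has grown past refine_threshold."""
        return len(self.cache) > self.refine_threshold


def grid_rules(state):
    row, col = divmod(int(state), 4)              # raises for non-integer states
    return f"row {row}, col {col}"


llm_calls = []


def fake_llm(prompt):
    """Stand-in for a real LLM provider; records each prompt it receives."""
    llm_calls.append(prompt)
    return f"(LLM) {prompt}"


translator = FallbackTranslator(grid_rules, fake_llm, refine_threshold=1)
print(translator.translate(5))
print(translator.translate("north-east corner"))
print(translator.translate("north-east corner"))  # served from the cache
print(len(llm_calls), translator.needs_refinement)
```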

---

## Proxy Mode (MCP ↔ Remote RL Bridge)

```
┌─────────────┐   stdio/MCP   ┌───────────────────┐   HTTP/JSON-RPC   ┌──────────────┐
│ Claude Code │ ────────────► │  RL Bridge Plugin │ ────────────────► │ RL Bridge    │
│             │               │ (thin proxy)      │                   │ Server       │
└─────────────┘               └───────────────────┘                   └──────────────┘
```

Set `RLIP_SERVER_URL` to enable proxy mode.
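
Conceptually, the plugin chooses its backend from that one variable. The sketch below shows the selection logic only. `select_backend` and its return values are hypothetical and do not reflect the plugin's internals.

```python
import os


def select_backend(environ=None):
    """Return ("proxy", url) when RLIP_SERVER_URL is set, else ("in-process", None)."""
    environ = os.environ if environ is None else environ
    url = environ.get("RLIP_SERVER_URL", "").strip()
    if url:
        return "proxy", url.rstrip("/")
    return "in-process", None


print(select_backend({}))
print(select_backend({"RLIP_SERVER_URL": "http://my-gpu-machine:8765/"}))
```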

---

## Protocol

See [docs/protocol_spec.md](docs/protocol_spec.md) for the full JSON-RPC protocol
specification, including all method signatures, space encodings, error codes,
and the session lifecycle diagram.

---

## Project Layout

```
src/rlip/
├── __init__.py
├── __main__.py              # CLI (rlip server / rlip install-* / …)
├── protocol/
│   ├── constants.py         # Method names, error codes
│   └── messages.py          # Pydantic message models
├── environments/
│   ├── base.py              # RLIPEnvironment / RLIPEnvironmentFactory ABCs
│   ├── builder.py           # EnvironmentBuilder - fluent API for custom envs
│   ├── gymnasium_adapter.py # Gymnasium wrapper
│   ├── registry.py          # EnvironmentRegistry singleton
│   └── utils.py             # NumPy ↔ JSON helpers, space serialisation
├── instruction_matching/
│   ├── base.py              # BaseEncoder ABC
│   ├── tfidf.py             # TFIDFEncoder (default)
│   ├── bm25.py              # BM25Encoder
│   └── sentence_transformer.py  # SentenceEncoder (optional dep)
├── language_translation/
│   ├── base.py              # LanguageTranslator ABC
│   ├── caching.py           # CachingTranslator + translation cache
│   ├── generator.py         # TranslatorGenerator / GeneratedTranslator (LLM)
│   └── sailing.py           # Built-in Sailing translator
├── server/
│   ├── dispatcher.py        # Transport-agnostic JSON-RPC dispatcher
│   ├── rlip_server.py       # FastAPI HTTP server
│   ├── session.py           # Session / instance manager
│   └── exceptions.py        # RLIPError
├── transport/
│   ├── stdio_transport.py   # Stdio (newline-delimited JSON-RPC)
│   └── http_client.py       # Sync + async HTTP clients
└── mcp_plugin/
    └── plugin.py            # FastMCP MCP plugin for Claude Code
examples/
├── cartpole_http.py         # HTTP client episode example
├── custom_env.py            # Registering a custom environment
└── in_process_usage.py      # Using RL Bridge without a server
docs/
└── protocol_spec.md         # Full JSON-RPC protocol specification
```

### Console scripts

| Command | Entry point | Purpose |
|---------|-------------|---------|
| `rlip` | `rlip.__main__:app` | CLI (server, MCP, install-*, catalog, …) |
| `rlip-mcp` | `rlip.mcp_plugin.plugin:main` | MCP stdio plugin for AI tools |
| `rlip-server` | `rlip.__main__:server_app` | Standalone HTTP JSON-RPC server |

### Optional dependency extras

| Extra | Install | Use case |
|-------|---------|----------|
| `examples` / `envs-classic` | `pip install "rlbridge[examples]"` | CartPole and other classic-control envs |
| `envs-box2d` | `pip install "rlbridge[envs-box2d]"` | LunarLander, etc. |
| `envs-atari` | `pip install "rlbridge[envs-atari]"` | Atari games |
| `envs-mujoco` | `pip install "rlbridge[envs-mujoco]"` | MuJoCo envs |
| `envs-all` | `pip install "rlbridge[envs-all]"` | All Gymnasium env groups |
| `torch` | `pip install "rlbridge[torch]"` | DQN / PPO training agents |
| `sentence-transformers` | `pip install "rlbridge[sentence-transformers]"` | Semantic instruction matching |
| `openai-sdk` | `pip install "rlbridge[openai-sdk]"` | Official OpenAI client (optional) |
| `dev` | `pip install "rlbridge[dev]"` | pytest, ruff, mypy, build, twine |

---

## Publishing (maintainers)

Build and upload to PyPI (test first on [TestPyPI](https://test.pypi.org/)):

```bash
# Install build tools
pip install "rlbridge[dev]"
# or: uv sync --extra dev

# Bump version in pyproject.toml (single source of truth; __version__ reads it at runtime)

# Build sdist + wheel
python -m build
# or: uv build

# Upload (requires PyPI account + API token)
twine upload dist/*
# or: uv publish
```

User install after release:

```bash
pip install rlbridge
uv add rlbridge
```

**Note:** The PyPI distribution name is `rlbridge`. The Python import package is `rlip`. Console commands remain `rlip`, `rlip-mcp`, and `rlip-server` after install.

**Manual steps before first release:** create a PyPI account, generate an API token, and tag releases on GitHub.

---

## License

Apache License 2.0 - see [LICENSE](LICENSE).

The framework used in this work is patent pending with the US Patent and Trademark Office (18/955718).

### Cite

Please cite this work using the following BibTeX entry:

```bibtex
@phdthesis{OsborneThesis2024,
  title        = {Improving Real-World Reinforcement Learning by Self Completing Human Instructions on Rule Defined Language},
  author       = {Philip Osborne},
  year         = 2024,
  month        = {August},
  address      = {Manchester, UK},
  note         = {Available at \url{https://research.manchester.ac.uk/en/studentTheses/improving-real-world-reinforcement-learning-by-self-completing-hu}},
  school       = {The University of Manchester},
  type         = {PhD thesis}
}
```