Metadata-Version: 2.4
Name: mcp-rlm-proxy
Version: 0.1.1
Summary: MCP-RLM-Proxy: Intelligent MCP middleware with RLM-style recursive exploration, smart caching, and proxy tools.
Project-URL: Homepage, https://github.com/pratikjadhav2726/mcp-rlm-proxy
Project-URL: Documentation, https://github.com/pratikjadhav2726/mcp-rlm-proxy#readme
Project-URL: Repository, https://github.com/pratikjadhav2726/mcp-rlm-proxy
Project-URL: Issues, https://github.com/pratikjadhav2726/mcp-rlm-proxy/issues
Author: MCP-RLM-Proxy Contributors
License: MIT
License-File: LICENSE
Keywords: ai,llm,mcp,model-context-protocol,proxy
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: mcp>=1.23.1
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# MCP-RLM-Proxy: Intelligent Middleware for MCP Servers

> **Production-ready middleware** implementing Recursive Language Model principles ([arXiv:2512.24601](https://arxiv.org/abs/2512.24601)) for efficient multi-server management, automatic large-response handling, and first-class proxy tools for recursive data exploration. **100% compatible with the MCP specification** - works with any existing MCP server without modification.

## Quick Start for Current MCP Users

**Already using MCP servers?** Add this as middleware in 5 minutes:

```bash
# 1. Install
pip install mcp-rlm-proxy

# 2. Create a config in your working directory (mcp.json)
mcp-rlm-proxy --init-config ./mcp.json

# 3. Edit mcp.json to add your existing servers, e.g.:
cat > mcp.json << EOF
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/your/path"]
    }
  }
}
EOF

# 4. Run the proxy
mcp-rlm-proxy --config ./mcp.json
```

**That's it!** Your servers now have automatic large-response handling and three powerful proxy tools for recursive exploration.

---

## Why Use This as Middleware?

### The Problem with Direct MCP Connections

When AI agents connect directly to MCP servers:
- **Token waste**: 85-95% of returned data is often unnecessary
- **Context pollution**: Irrelevant data dilutes important information
- **No multi-server aggregation**: Must connect to each server separately
- **Performance degradation**: Large responses slow everything down
- **Cost explosion**: Every unnecessary token costs money

### The Solution: Intelligent Middleware

```
+---------------+
|  MCP Client   |  (Claude Desktop, Cursor, Custom Client)
+-------+-------+
        | ONE connection
        v
+---------------+
| MCP-RLM       |  <-- THIS MIDDLEWARE
| Proxy         |  - Connects to N servers
|               |  - Auto-truncates large responses
|               |  - Caches + provides proxy_filter / proxy_search / proxy_explore
|               |  - Tracks token savings
+-------+-------+
        | Manages connections to your servers
    +---+----+--------+--------+
    v        v        v        v
+-----+  +-----+  +-----+  +-----+
| FS  |  | Git |  | API |  | DB  |  <-- Your existing servers
+-----+  +-----+  +-----+  +-----+      (NO changes needed!)
```

### Benefits

- **Zero Friction**: Works with existing MCP servers (no code changes)
- **Huge Token Savings**: 85-95% reduction typical
- **Multi-Server**: Aggregate tools from many servers through one interface
- **Clean Schemas**: No `_meta` injection; tool schemas are passed through unmodified
- **Agent-Friendly**: Three first-class proxy tools with flat, simple parameters; `proxy_filter` runs Python code for flexible programmatic transformations
- **Auto-Truncation**: Large responses automatically truncated + cached for follow-up
- **Multi-Agent Ready**: Per-agent cache isolation supports hundreds of concurrent agents
- **Production Ready**: Connection pooling, error handling, metrics, TTL-based caching, memory-aware eviction

---

## How It Works

### Architecture Overview

1. **Client connects to proxy** (instead of individual servers)
2. **Proxy connects to N servers** (configured in `mcp.json`)
3. **Tools are aggregated** with server prefixes (`filesystem_read_file`)
4. **Tool schemas pass through clean** - no modification, no `_meta` injection
5. **Large responses are auto-truncated** and cached with a `cache_id`
6. **Three proxy tools** let agents drill into cached data without re-executing
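The aggregation in steps 2-4 amounts to namespacing each upstream tool with its server name so one client connection can route to many servers. A minimal sketch of that routing table (illustrative only; the proxy's internals may differ):

```python
def prefix_tools(servers: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map a proxy-visible tool name back to (server, original tool name)."""
    routing = {}
    for server, tools in servers.items():
        for tool in tools:
            routing[f"{server}_{tool}"] = (server, tool)
    return routing

routing = prefix_tools({"filesystem": ["read_file", "list_dir"], "git": ["log"]})
# "filesystem_read_file" routes back to ("filesystem", "read_file")
```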

### The Proxy Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `proxy_filter` | Transform/filter using Python REPL | `cache_id`, `code` (required), `return_format` |
| `proxy_search` | Grep/BM25/fuzzy/context search on cached or fresh result | `cache_id`, `pattern`, `mode`, `max_results` |
| `proxy_explore` | Discover data structure without loading content | `cache_id`, `max_depth` |

All parameters are **flat, top-level, simple types** - no nested objects required. Each tool can work in two modes:

- **Cached mode**: pass `cache_id` from a previous truncated response
- **Fresh mode**: pass `tool` + `arguments` to call and filter in one step

### Typical Agent Workflow

```
Step 1: Agent calls filesystem_read_file(path="large-data.json")
        -> Response is 50,000 chars -> auto-truncated + cached
        -> Agent receives first 8,000 chars + cache_id="a1b2c3d4e5f6"

Step 2: Agent calls proxy_explore(cache_id="a1b2c3d4e5f6")
        -> Returns structure summary: types, field names, sizes, sample
        -> 200 tokens instead of 50,000

Step 3: Agent calls proxy_filter(cache_id="a1b2c3d4e5f6", code="[{k: item[k] for k in ['name', 'email']} for item in data]")
        -> Returns only projected fields using Python REPL
        -> 500 tokens instead of 50,000

Step 4: Agent calls proxy_search(cache_id="a1b2c3d4e5f6", pattern="error", mode="bm25", top_k=3)
        -> Returns top-3 most relevant chunks
        -> 800 tokens instead of 50,000

Total: ~1,500 tokens vs 50,000+ (97% savings!)
```

---

## Token Savings Impact

### Real-World Token Reduction Examples

| Use Case | Without Proxy | With Proxy | Savings | Cost Impact* |
|----------|---------------|------------|---------|--------------|
| **User Profile API** | 2,500 tokens | 150 tokens | **94%** | $0.075 -> $0.0045 |
| **Log File Search** (1MB) | 280,000 tokens | 800 tokens | **99.7%** | Rate limited -> $0.024 |
| **Database Query** (100 rows) | 15,000 tokens | 1,200 tokens | **92%** | $0.45 -> $0.036 |
| **File System Scan** | 8,000 tokens | 400 tokens | **95%** | $0.24 -> $0.012 |

\* Estimated at $0.03 per 1K input tokens (GPT-4 class pricing)

### Compound Savings in Multi-Step Workflows

For a typical AI agent workflow with 10 tool calls:
- **Without proxy**: 10 calls x 10,000 tokens avg = **100,000 tokens** -> $3.00
- **With proxy**: 10 calls x 800 tokens avg = **8,000 tokens** -> $0.24
- **Total savings per workflow**: **$2.76 (92% reduction)**
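The compound arithmetic above is straightforward to check:

```python
PRICE_PER_1K = 0.03  # $/1K input tokens, as in the estimates above

def workflow_cost(calls: int, avg_tokens: int) -> float:
    """Dollar cost of a workflow at the assumed per-token price."""
    return calls * avg_tokens * PRICE_PER_1K / 1000

without_proxy = workflow_cost(10, 10_000)  # 100,000 tokens
with_proxy = workflow_cost(10, 800)        # 8,000 tokens
saved = without_proxy - with_proxy
reduction = 1 - with_proxy / without_proxy
```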

---

## Proxy Tool Reference

### proxy_filter

Transform or filter cached or fresh tool results by executing Python code. The cached data is available as the variable `data` in a sandboxed Python environment.

**Simple field projection:**
```json
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{k: item[k] for k in ['name', 'email']} for item in data]"
}
```

**Complex filtering with conditions:**
```json
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[item for item in data if item.get('status') == 'active' and item.get('score', 0) > 80]"
}
```

**Aggregation:**
```json
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "{'total': len(data), 'avg_score': sum(item.get('score', 0) for item in data) / len(data) if data else 0}"
}
```

**With return format:**
```json
{
  "cache_id": "a1b2c3d4e5f6",
  "code": "[{'name': item['name'], 'email': item['email']} for item in data]",
  "return_format": "json"
}
```

**With fresh call:**
```json
{
  "tool": "filesystem_read_file",
  "arguments": {"path": "data.json"},
  "code": "[item['name'] for item in data]"
}
```
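Conceptually, `proxy_filter` evaluates the `code` expression with the cached result bound to `data`. A simplified sketch of that evaluation, with hypothetical example data (illustrative only; the proxy's actual sandbox is more restrictive than stripping builtins):

```python
def run_filter(data, code: str):
    """Evaluate a filter expression against cached data.

    Illustrative only: a production sandbox needs stronger isolation
    than a trimmed builtins table.
    """
    safe_builtins = {"len": len, "sum": sum, "min": min, "max": max, "sorted": sorted}
    return eval(code, {"__builtins__": safe_builtins}, {"data": data})

rows = [
    {"name": "Ada", "email": "ada@example.com", "score": 91, "status": "active"},
    {"name": "Bob", "email": "bob@example.com", "score": 55, "status": "inactive"},
]
projected = run_filter(rows, "[{k: item[k] for k in ['name', 'email']} for item in data]")
active = run_filter(
    rows,
    "[item for item in data if item.get('status') == 'active' and item.get('score', 0) > 80]",
)
```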

### proxy_search

Search within a cached or fresh result. Modes: `regex`, `bm25`, `fuzzy`, `context`.

```json
{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "ERROR|FATAL",
  "mode": "regex",
  "case_insensitive": true,
  "max_results": 20,
  "context_lines": 2
}
```
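Under the hood, `regex` mode amounts to scanning lines and returning matches with surrounding context. A rough sketch of that behavior (an assumption about the semantics, not the proxy's actual code):

```python
import re

def regex_search(text: str, pattern: str, case_insensitive: bool = False,
                 max_results: int = 20, context_lines: int = 0):
    """Return matching lines, each with +/- context_lines of surrounding text."""
    flags = re.IGNORECASE if case_insensitive else 0
    rx = re.compile(pattern, flags)
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if rx.search(line):
            lo, hi = max(0, i - context_lines), min(len(lines), i + context_lines + 1)
            hits.append({"line": i + 1, "context": lines[lo:hi]})
            if len(hits) >= max_results:
                break
    return hits

log = "boot ok\nERROR: disk full\nretrying\nfatal: giving up\n"
hits = regex_search(log, "ERROR|FATAL", case_insensitive=True, context_lines=1)
```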

BM25 relevance search:

```json
{
  "cache_id": "a1b2c3d4e5f6",
  "pattern": "database connection timeout",
  "mode": "bm25",
  "top_k": 5
}
```
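`bm25` mode ranks cached chunks by lexical relevance to the query. A toy scorer in the spirit of BM25 (illustrative; the proxy's tokenization and parameters may differ):

```python
import math

def bm25_rank(chunks: list[str], query: str, top_k: int = 5,
              k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Score each chunk against the query with BM25 and return the top_k."""
    docs = [c.lower().split() for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query.lower().split():
            tf = doc.count(term)
            df = sum(1 for d in docs if term in d)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    ranked = sorted(zip(scores, chunks), key=lambda p: -p[0])
    return [c for _, c in ranked[:top_k]]

chunks = ["database connection timeout after 30s",
          "user login succeeded",
          "connection pool exhausted, timeout raised"]
top = bm25_rank(chunks, "database connection timeout", top_k=2)
```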

### proxy_explore

Discover the structure of data without loading it all.

```json
{
  "cache_id": "a1b2c3d4e5f6",
  "max_depth": 3
}
```

Returns: types, field names, sizes, and a small sample.
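A structure summary like this can be produced with a small recursive walk. The sketch below shows the idea; the proxy's actual output format is an assumption here:

```python
def explore(value, max_depth: int = 3):
    """Summarize types, field names, and sizes without returning full content."""
    if max_depth == 0:
        return type(value).__name__
    if isinstance(value, dict):
        return {k: explore(v, max_depth - 1) for k, v in value.items()}
    if isinstance(value, list):
        sample = explore(value[0], max_depth - 1) if value else "empty"
        return {"type": "list", "length": len(value), "items": sample}
    return type(value).__name__

data = {"users": [{"name": "Ada", "email": "a@x.io"}] * 500, "count": 500}
shape = explore(data)
# A few hundred bytes describing 500 records, instead of the records themselves
```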

---

## Multi-Agent Support

The proxy is designed to handle hundreds of concurrent agents efficiently through **per-agent cache isolation**.

### How It Works

When `enableAgentIsolation` is enabled (default), each agent gets:
- **Dedicated cache quota**: 20 entries and 100MB memory per agent (configurable)
- **Isolated cache space**: One agent's cache doesn't affect others
- **Smart eviction**: Large, idle, rarely-accessed entries evicted first
- **Automatic agent management**: LRU eviction of agent caches when max agents reached
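The isolation described above can be pictured as a dict of bounded caches with LRU eviction. A simplified sketch (the shipped cache also tracks memory quotas and TTL):

```python
from collections import OrderedDict

class AgentCache:
    """Per-agent store with a fixed entry quota and LRU eviction."""

    def __init__(self, max_entries_per_agent: int = 20):
        self.max_entries = max_entries_per_agent
        self.agents: dict[str, OrderedDict] = {}

    def put(self, agent_id: str, cache_id: str, value) -> None:
        cache = self.agents.setdefault(agent_id, OrderedDict())
        cache[cache_id] = value
        cache.move_to_end(cache_id)
        if len(cache) > self.max_entries:
            cache.popitem(last=False)  # evict this agent's least-recently-used entry

    def get(self, agent_id: str, cache_id: str):
        cache = self.agents.get(agent_id, OrderedDict())
        if cache_id in cache:
            cache.move_to_end(cache_id)  # refresh recency on hit
            return cache[cache_id]
        return None

store = AgentCache(max_entries_per_agent=2)
store.put("agent_1", "a", "first")
store.put("agent_1", "b", "second")
store.put("agent_1", "c", "third")   # evicts "a" for agent_1 only
store.put("agent_2", "a", "other")   # agent_2 is unaffected
```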

### Benefits for Multi-Agent Scenarios

| Scenario | Without Isolation | With Isolation |
|----------|------------------|----------------|
| **100 agents, shared cache (50 entries)** | ~0.5 entries/agent, 10-20% hit rate | N/A |
| **100 agents, per-agent isolation (20 entries)** | N/A | 20 entries/agent, 70-80% hit rate |
| **Cache thrashing** | High (agents evict each other's entries) | None (isolated caches) |
| **Memory usage** | Unbounded risk | Predictable (~2GB for 100 agents) |
| **Performance** | Degrades with more agents | Consistent per agent |

### Configuration for Multi-Agent

```json
{
  "proxySettings": {
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000,
    "cacheTTLSeconds": 600
  }
}
```

**Settings:**
- `enableAgentIsolation`: Enable per-agent cache isolation (recommended for 10+ agents)
- `maxEntriesPerAgent`: Maximum cache entries per agent (default: 20)
- `maxMemoryPerAgent`: Maximum memory per agent in bytes (default: 100MB)
- `maxTotalAgents`: Maximum concurrent agent caches (default: 1000)

### Cache ID Format

With agent isolation enabled, cache IDs are prefixed with the agent identifier:
- Format: `{agent_id}:{cache_id}`
- Example: `agent_1:abc123def456`
- The proxy automatically handles agent ID extraction and prefixing
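Splitting on the first colon recovers the two parts; everything after it belongs to the cache key (illustrative, assuming the format above):

```python
def split_cache_id(prefixed: str) -> tuple[str, str]:
    """Split '{agent_id}:{cache_id}' on the first colon only."""
    agent_id, _, cache_id = prefixed.partition(":")
    return agent_id, cache_id

agent, cid = split_cache_id("agent_1:abc123def456")
```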

### Backward Compatibility

If `enableAgentIsolation` is `false`, the proxy uses a shared cache (backward compatible with single-agent deployments).

---

## Configuration

### mcp.json

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"]
    },
    "git": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "/repo"]
    }
  },
  "proxySettings": {
    "maxResponseSize": 8000,
    "cacheMaxEntries": 50,
    "cacheTTLSeconds": 300,
    "enableAutoTruncation": true,
    "enableAgentIsolation": true,
    "maxEntriesPerAgent": 20,
    "maxMemoryPerAgent": 104857600,
    "maxTotalAgents": 1000
  }
}
```

### Proxy Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `maxResponseSize` | 8000 | Character threshold for auto-truncation |
| `cacheMaxEntries` | 50 | Maximum cached responses (per agent if isolation enabled) |
| `cacheTTLSeconds` | 300 | Cache entry time-to-live (seconds) |
| `enableAutoTruncation` | true | Enable/disable auto-truncation + caching |
| `enableAgentIsolation` | true | Enable per-agent cache isolation (recommended for multi-agent) |
| `maxEntriesPerAgent` | 20 | Maximum cache entries per agent (when isolation enabled) |
| `maxMemoryPerAgent` | 104857600 | Maximum memory per agent in bytes (100MB default) |
| `maxTotalAgents` | 1000 | Maximum concurrent agent caches |
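`maxResponseSize` drives the truncate-and-cache behavior. A sketch of the idea (hash-derived cache IDs are an assumption; the real proxy may generate them differently):

```python
import hashlib

MAX_RESPONSE_SIZE = 8000  # characters, mirroring the default above

def maybe_truncate(text: str, cache: dict) -> dict:
    """Return the text as-is, or a truncated preview plus a cache_id."""
    if len(text) <= MAX_RESPONSE_SIZE:
        return {"content": text}
    cache_id = hashlib.sha256(text.encode()).hexdigest()[:12]
    cache[cache_id] = text  # full response stays available for proxy tools
    return {"content": text[:MAX_RESPONSE_SIZE], "truncated": True, "cache_id": cache_id}

cache: dict = {}
small = maybe_truncate("ok", cache)
big = maybe_truncate("x" * 50_000, cache)
```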

---

## Installation

```bash
pip install mcp-rlm-proxy

# For development:
# git clone https://github.com/pratikjadhav2726/mcp-rlm-proxy.git && cd mcp-rlm-proxy && uv sync
```

### Running the Proxy

```bash
mcp-rlm-proxy --config ./mcp.json
```

### Using with Claude Desktop

Edit your Claude Desktop config:

```json
{
  "mcpServers": {
    "proxy": {
      "command": "mcp-rlm-proxy",
      "args": ["--config", "/absolute/path/to/mcp.json"]
    }
  }
}
```

### Using Programmatically

```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="mcp-rlm-proxy",
    args=["--config", "/absolute/path/to/mcp.json"]
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()

        # List tools (prefixed with server names + 3 proxy tools)
        tools = await session.list_tools()

        # Call a tool - if response is large, it's auto-truncated with cache_id
        result = await session.call_tool("filesystem_read_file", {
            "path": "large-data.json"
        })

        # Drill into the cached data with Python code
        filtered = await session.call_tool("proxy_filter", {
            "cache_id": "a1b2c3d4e5f6",
            "code": "[{'name': u['name'], 'email': u['email']} for u in data['users']]"
        })
```

---

## Legacy _meta Support

For backward compatibility, the `_meta` parameter is still accepted in tool arguments but is no longer advertised in schemas. If you pass `_meta.projection` or `_meta.grep`, the proxy will apply them. However, the recommended approach is to use the proxy tools instead:

| Old way (_meta) | New way (proxy tools) |
|-----------------|----------------------|
| Hidden in nested `_meta.projection` | `proxy_filter(code="[item['name'] for item in data]")` |
| Hidden in nested `_meta.grep` | `proxy_search(pattern="ERROR")` |
| Not discoverable by agents | First-class tools visible in `list_tools()` |

---

## Search Modes

| Mode | Use When | Token Savings |
|------|----------|---------------|
| `structure` (proxy_explore) | Don't know data format | 99.9%+ |
| `bm25` | Know what, not where | 99%+ |
| `fuzzy` | Handle typos/variations | 98%+ |
| `context` | Need full paragraphs | 95%+ |
| `regex` | Know exact pattern | 95%+ |

---

## Performance Monitoring

Automatic tracking of token savings and performance:

```
INFO: Token savings: 50,000 -> 500 tokens (99.0% reduction)

=== Proxy Performance Summary ===
  Total calls: 127
  Projection calls: 45
  Grep calls: 23
  Auto-truncated: 15
  Original tokens: 2,450,000
  Filtered tokens: 125,000
  Tokens saved: 2,325,000
  Savings: 94.9%
  Active connections: 3
```
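The summary's savings figure is simply saved tokens over original tokens:

```python
original, filtered = 2_450_000, 125_000
saved = original - filtered
savings_pct = 100 * saved / original
# 2,325,000 tokens saved, roughly 94.9%
```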

### Cache Statistics (Multi-Agent)

With agent isolation enabled, you can monitor per-agent cache usage:

```python
# Get aggregate cache statistics
stats = await proxy_server.cache.stats()
# Returns:
# {
#   "total_agents": 42,
#   "total_entries": 840,
#   "total_cached_bytes": 52428800,
#   "max_agents": 1000,
#   "max_entries_per_agent": 20,
#   "max_memory_per_agent": 104857600,
#   "agents": [
#     {
#       "agent_id": "agent_1",
#       "entries": 15,
#       "memory_bytes": 3145728,
#       "last_accessed_at": 1234567890.123
#     },
#     ...
#   ]
# }

# Get statistics for a specific agent
agent_stats = await proxy_server.cache.stats(agent_id="agent_1")
```

---

## Comparison with RLM Paper Concepts

| RLM Paper Concept | MCP-RLM-Proxy Implementation |
|-------------------|------------------------------|
| **External Environment** | Tool outputs treated as inspectable data stores |
| **Recursive Decomposition** | proxy_explore -> proxy_filter -> proxy_search workflow |
| **Programmatic Exploration** | proxy_search with multiple modes |
| **Snippet Processing** | Auto-truncation + cached follow-up |
| **Cost Efficiency** | 85-95% token reduction vs. full context loading |
| **Long Context Handling** | Processes multi-MB tool outputs without context limits |

---

## Documentation

- **[Architecture](docs/ARCHITECTURE.md)** - System design and data flow
- **[Configuration](docs/CONFIGURATION.md)** - Configuration options and validation
- **[Performance](docs/PERFORMANCE.md)** - Performance benchmarks and optimization

---

## Related Concepts

- **Recursive Language Models Paper**: [arXiv:2512.24601](https://arxiv.org/abs/2512.24601)
- **Model Context Protocol**: [MCP Specification](https://github.com/modelcontextprotocol)

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.

## License

MIT License - see [LICENSE](LICENSE)

---

**Built for the AI agent community**
