Metadata-Version: 2.4
Name: code-mode
Version: 0.0.1
Summary: Code execution mode for UTCP - enables executing Python code chains with tool access.
Author: UTCP Contributors
License-Expression: MPL-2.0
Project-URL: Homepage, https://utcp.io
Project-URL: Source, https://github.com/universal-tool-calling-protocol/python-utcp
Project-URL: Issues, https://github.com/universal-tool-calling-protocol/python-utcp/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0
Requires-Dist: utcp>=1.0
Requires-Dist: typing-extensions>=4.0
Requires-Dist: RestrictedPython>=6.0
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: twine; extra == "dev"

<div align="center">

<h1 align="center">🐍 Python Code-Mode: First Python library for tool calls via code execution</h1>
<p align="center">
    <a href="https://github.com/universal-tool-calling-protocol">
        <img src="https://img.shields.io/github/followers/universal-tool-calling-protocol?label=Follow%20Org&logo=github" /></a>
    <a href="https://pypi.org/project/code-mode/" title="PyPI Version">
        <img src="https://img.shields.io/pypi/v/code-mode"/></a>
    <a href="https://github.com/universal-tool-calling-protocol/code-mode/blob/main/LICENSE" title="License">
        <img src="https://img.shields.io/github/license/universal-tool-calling-protocol/code-mode" /></a>
    <a href="https://pypi.org/project/code-mode/" title="PyPI Downloads">
        <img src="https://img.shields.io/pypi/dm/code-mode" /></a>
</p>
</div>

> Transform your AI agents from clunky tool callers into efficient code executors — in just 3 lines of Python.

## Why This Changes Everything

LLMs excel at writing code but struggle with tool calls. Instead of exposing hundreds of tools directly, give them ONE tool that executes Python code with access to your entire toolkit.

**Research from [Apple](https://machinelearning.apple.com/research/codeact), [Cloudflare](https://blog.cloudflare.com/code-mode/) and [Anthropic](https://www.anthropic.com/engineering/code-execution-with-mcp) reports:**
- **60% faster execution** than traditional tool calling
- **68% fewer tokens** consumed  
- **88% fewer API round trips**
- **98.7% reduction in context overhead** for complex workflows

## Benchmarks

Independent [Python benchmark study](https://github.com/imran31415/codemode_python_benchmark) validates the performance claims with **$9,536/year cost savings** at 1,000 scenarios/day:

| Scenario Complexity | Traditional | Code Mode | **Improvement** |
|---------------------|-------------|-----------|----------------|
| **Simple (2-3 tools)** | 3 iterations | 1 execution | **67% faster** |
| **Medium (4-7 tools)** | 8 iterations | 1 execution | **75% faster** |
| **Complex (8+ tools)** | 16 iterations | 1 execution | **88% faster** |

### **Why Code Mode Dominates:**

- **Batching Advantage** – Single code block replaces multiple API calls
- **Cognitive Efficiency** – LLMs excel at code generation vs. tool orchestration
- **Computational Efficiency** – No context re-processing between operations

**Real-world results:** Independent benchmarks demonstrate significant cost savings, with **$9,536/year savings** possible at enterprise scale (1,000 scenarios/day).

## Get Started in 3 Lines

```python
from utcp_code_mode import CodeModeUtcpClient

client = await CodeModeUtcpClient.create()                      # 1. Initialize
await client.register_manual({'name': 'github', ...})           # 2. Add tools  
result = await client.call_tool_chain('# Python code here')     # 3. Execute code
```

That's it. Your AI agent can now execute complex workflows in a single request instead of dozens.

## What You Get

### **Progressive Tool Discovery**
```python
# Agent discovers tools dynamically, loads only what it needs
tools = await client.search_tools('github pull request')
# Instead of 500 tool definitions → 3 relevant tools
```
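
To illustrate the idea behind progressive discovery, here is a minimal sketch that ranks tools by keyword overlap with a query. `ToolInfo` and `rank_tools` are hypothetical names used only for this illustration; they are not part of the library, and the strategy actually used by `search_tools` may differ.

```python
from dataclasses import dataclass


@dataclass
class ToolInfo:
    """Hypothetical tool record; not a library type."""
    name: str
    description: str


def rank_tools(query: str, tools: list[ToolInfo], limit: int = 3) -> list[ToolInfo]:
    """Return up to `limit` tools whose name or description shares words with the query."""
    query_words = set(query.lower().split())

    def score(tool: ToolInfo) -> int:
        # Split dotted and snake_case names into separate words before comparing.
        text = f"{tool.name} {tool.description}".lower().replace(".", " ").replace("_", " ")
        return len(query_words & set(text.split()))

    scored = [(score(tool), tool) for tool in tools]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in matches[:limit]]


catalog = [
    ToolInfo("github.get_pull_request", "Fetch a pull request by number"),
    ToolInfo("github.list_issues", "List issues in a repository"),
    ToolInfo("slack.post_message", "Post a message to a channel"),
]
relevant = rank_tools("github pull request", catalog)
print([tool.name for tool in relevant])
```

Only the tools that share at least one word with the query are returned, so the agent's context holds a handful of definitions rather than the whole catalog.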

### **Natural Code Execution**  
```python
result = await client.call_tool_chain('''
# Chain multiple operations in one request
pr = await github.get_pull_request(owner='microsoft', repo='vscode', pull_number=1234)
comments = await github.get_pull_request_comments(owner='microsoft', repo='vscode', pull_number=1234)
reviews = await github.get_pull_request_reviews(owner='microsoft', repo='vscode', pull_number=1234)

# Process data efficiently in-sandbox
summary = {
    "title": pr["title"],
    "comment_count": len(comments),
    "approvals": len([r for r in reviews if r["state"] == "APPROVED"])
}

print(f'PR "{pr["title"]}" analysis complete')
return summary
''')

print('Analysis Result:', result["result"])
# Console output is captured in result["logs"]: ['PR "Fix memory leak in hooks" analysis complete']
```

### **Auto-Generated Python TypedDict Interfaces**
```python
from typing import TypedDict

class GetPullRequestInput(TypedDict):
    """Repository owner"""
    owner: str
    """Repository name""" 
    repo: str
    """Pull request number"""
    pull_number: int
```
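
Because the generated interfaces are ordinary `TypedDict` classes, they can also be inspected at runtime. The sketch below shows one way to check a payload against such a class before calling a tool. `check_input` is a hypothetical helper, not a library function, and it only handles plain types such as `str` and `int`.

```python
from typing import TypedDict, get_type_hints


class GetPullRequestInput(TypedDict):
    owner: str
    repo: str
    pull_number: int


def check_input(payload: dict, schema: type) -> list[str]:
    """Hypothetical helper: list missing keys and wrongly typed values."""
    problems = []
    for key, expected in get_type_hints(schema).items():
        if key not in payload:
            problems.append(f"missing: {key}")
        elif not isinstance(payload[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems


good: GetPullRequestInput = {"owner": "microsoft", "repo": "vscode", "pull_number": 1234}
bad = {"owner": "microsoft", "pull_number": "1234"}

print(check_input(good, GetPullRequestInput))
print(check_input(bad, GetPullRequestInput))
```

A static type checker catches the same mistakes at edit time; a runtime check like this is mainly useful when the payload is produced by a model.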

## Enterprise-Ready

- **Secure Subprocess Isolation** – True process isolation prevents unauthorized access
- **Timeout Protection** – Configurable execution limits prevent runaway code  
- **Complete Observability** – Full console output capture and error handling
- **No Ambient Access** – Tools are reachable only through registered UTCP/MCP servers
- **Runtime Introspection** – Dynamic interface discovery for adaptive workflows

## Universal Protocol Support

Works with **any tool ecosystem:**

| Protocol | Description | Usage |
|----------|-------------|-------|
| **MCP** | Model Context Protocol servers | `call_template_type: 'mcp'` |
| **HTTP** | REST APIs with auto-discovery | `call_template_type: 'http'` |  
| **File** | Local JSON/YAML configurations | `call_template_type: 'file'` |
| **CLI** | Command-line tool execution | `call_template_type: 'cli'` |

## Installation

```bash
pip install code-mode
```

## Even Easier: Ready-to-Use MCP Server

**Want Code Mode without any setup?** Use our plug-and-play MCP server with Claude Desktop or any MCP client:

```json
{
  "mcpServers": {
    "code-mode": {
      "command": "uvx",
      "args": ["utcp-code-mode-mcp"],
      "env": {
        "UTCP_CONFIG_FILE": "/path/to/your/.utcp_config.json"
      }
    }
  }
}
```

**That's it!** No installation, no Python knowledge required. The Code Mode MCP Server automatically:
- Downloads and runs the latest version via `uvx`
- Loads your tool configurations from JSON
- Provides code execution capabilities to Claude Desktop
- Gives you `call_tool_chain` as an MCP tool for Python execution

**Perfect for non-developers** who want Code Mode power in Claude Desktop!

## Direct Python Usage

### 1. **MCP Server Integration**
Connect to any Model Context Protocol server:

```python
import os

from utcp_code_mode import CodeModeUtcpClient

client = await CodeModeUtcpClient.create()

# Connect to GitHub MCP server
await client.register_manual({
    'name': 'github',
    'call_template_type': 'mcp',
    'config': {
        'mcpServers': {
            'github': {
                'command': 'docker',
                'args': ['run', '-i', '--rm', '-e', 'GITHUB_PERSONAL_ACCESS_TOKEN', 'mcp/github'],
                'env': {'GITHUB_PERSONAL_ACCESS_TOKEN': os.environ.get('GITHUB_TOKEN')}
            }
        }
    }
})
```

### 2. **Execute Multi-Step Workflows**
Replace multiple tool calls with a single code execution:

```python
result = await client.call_tool_chain('''
# Traditional: 4 separate API round trips → Code Mode: 1 execution
pr = await github.get_pull_request(owner='microsoft', repo='vscode', pull_number=1234)
comments = await github.get_pull_request_comments(owner='microsoft', repo='vscode', pull_number=1234)
reviews = await github.get_pull_request_reviews(owner='microsoft', repo='vscode', pull_number=1234)
files = await github.get_pull_request_files(owner='microsoft', repo='vscode', pull_number=1234)

# Process data in-sandbox (no token overhead)
summary = {
    "title": pr["title"],
    "state": pr["state"],
    "author": pr["user"]["login"],
    "stats": {
        "comments": len(comments),
        "reviews": len(reviews), 
        "files_changed": len(files),
        "approvals": len([r for r in reviews if r["state"] == "APPROVED"])
    },
    "top_discussion": [
        {
            "author": c["user"]["login"],
            "preview": c["body"][:100] + "..."
        }
        for c in comments[:3]
    ]
}

print(f'PR "{pr["title"]}" analysis complete')
return summary
''')

print('Analysis Result:', result["result"])
# Console output is captured in result["logs"]: ['PR "Fix memory leak in hooks" analysis complete']
```

---

## Advanced Features

### **Multi-Protocol Tool Chains**
Mix and match different tool ecosystems in a single execution:

```python
# Register multiple tool sources
await client.register_manual({'name': 'github', 'call_template_type': 'mcp', ...})
await client.register_manual({'name': 'slack', 'call_template_type': 'http', ...})
await client.register_manual({'name': 'db', 'call_template_type': 'file', 'file_path': './db-tools.json'})  # Loads UTCP manual from JSON

result = await client.call_tool_chain('''
# Fetch PR data from GitHub (MCP)
pr = await github.get_pull_request(owner='company', repo='api', pull_number=42)

# Query deployment status from database (File)
deployment = await db.get_deployment_status(pr_id=pr["id"])

# Send notification to Slack (HTTP)
await slack.post_message(
    channel='#releases',
    text=f'PR #42 "{pr["title"]}" deployed to {deployment["environment"]}'
)

return {"pr": pr["title"], "environment": deployment["environment"]}
''')
```

### **Runtime Interface Introspection**
Executed code can discover the available interfaces at runtime and adapt to them:

```python
result = await client.call_tool_chain('''
# Discover available tools at runtime
print('Available interfaces:', __interfaces)

# Get specific tool interface for validation
pr_interface = __get_tool_interface('github.get_pull_request')
print('PR tool expects:', pr_interface)

# Use interface info for dynamic workflows
has_slack_tools = 'namespace slack' in __interfaces
if has_slack_tools:
    await slack.post_message(channel='#dev', text='Analysis complete')

return {"tools_available": has_slack_tools}
''')
```

### **Context-Efficient Data Processing**
Process large datasets without bloating the model's context:

```python
result = await client.call_tool_chain('''
# datetime is on the sandbox's allowed-module list
from datetime import datetime, timezone

# Fetch large dataset
all_issues = await github.list_repository_issues(owner='facebook', repo='react')
print(f'Fetched {len(all_issues)} total issues')

# Process efficiently in-sandbox
critical_bugs = [
    {
        "number": issue["number"],
        "title": issue["title"],
        "author": issue["user"]["login"],
        # Use an aware "now" so it can be subtracted from the aware created_at timestamp
        "days_old": (datetime.now(timezone.utc) - datetime.fromisoformat(issue["created_at"].replace('Z', '+00:00'))).days
    }
    for issue in all_issues
    if any(l["name"] == "bug" for l in issue.get("labels", []))
    and any(l["name"] == "high priority" for l in issue.get("labels", []))
]
critical_bugs.sort(key=lambda x: x["days_old"], reverse=True)

# Only return processed summary (not 10,000 raw issues)
return {
    "total_issues": len(all_issues),
    "critical_bugs": critical_bugs[:10],  # Top 10 oldest critical bugs
    # Guard against an empty list before reading the oldest entry
    "summary": f'Found {len(critical_bugs)} critical bugs' + (f', oldest is {critical_bugs[0]["days_old"]} days old' if critical_bugs else '')
}
''')
```

### **Error Handling & Observability**
Built-in error handling with complete execution transparency:

```python
result = await client.call_tool_chain('''
try:
    print('Starting multi-step workflow...')
    
    data = await external_api.fetch_data(id='user-123')
    print('Data fetched successfully')
    
    processed = await data_processor.transform(data)
    print(f'Processing completed with {len(processed.get("warnings", []))} warnings')
    
    return processed
except Exception as error:
    print(f'Workflow failed: {error}')
    raise  # Re-raise so the outer error handling sees the original traceback
''', timeout=30)  # 30-second timeout

# Complete observability
print('Result:', result["result"])
print('Execution logs:', result["logs"])
# ['Starting multi-step workflow...', 'Data fetched successfully', 'Processing completed with 2 warnings']
```
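
The `{"result": ..., "logs": [...]}` shape can be reproduced in a few lines with the standard library. This is a simplified in-process sketch using `contextlib.redirect_stdout`; the library itself is described as running code in a subprocess, so its capture mechanism is likely different. `run_and_capture` and `workflow` are hypothetical names.

```python
import contextlib
import io


def run_and_capture(func, *args):
    """Call func, returning its result plus every line it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args)
    return {"result": result, "logs": buffer.getvalue().splitlines()}


def workflow(user_id):
    # Stand-in for a multi-step tool chain that reports progress via print().
    print("Starting multi-step workflow...")
    print("Data fetched successfully")
    return {"id": user_id, "warnings": []}


outcome = run_and_capture(workflow, "user-123")
print(outcome["logs"])
```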

### **Custom Timeouts**
Configure execution limits for different workload types:

```python
# Quick operations (5 seconds)
quick_result = await client.call_tool_chain('return await ping.check()', timeout=5)

# Heavy data processing (2 minutes) 
heavy_result = await client.call_tool_chain('''
big_data = await database.export_full_dataset()
return await analytics.process_dataset(big_data)
''', timeout=120)
```

---

## AI Agent Integration

Plug-and-play with any AI framework. The built-in prompt template handles all the complexity:

```python
from utcp_code_mode import CodeModeUtcpClient
from openai import OpenAI

system_prompt = f"""
You are an AI assistant with access to tools via UTCP CodeMode.
{CodeModeUtcpClient.AGENT_PROMPT_TEMPLATE}
Additional instructions...
"""

# Works with any AI library (named `llm` to avoid clashing with the Code Mode `client`)
llm = OpenAI()
response = llm.chat.completions.create(
    model='gpt-4',
    messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': 'Analyze the latest PR in microsoft/vscode'}
    ]
)
```

**The template provides comprehensive guidance on:**
- Tool discovery workflow (`search_tools` → `__interfaces` → `call_tool_chain`)
- Hierarchical access patterns (`manual.tool()` syntax)  
- Interface introspection (`__get_tool_interface()`)
- Error handling and best practices
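
The `manual.tool()` access pattern can be pictured as a thin proxy object per registered manual. The following is a minimal sketch of that idea, assuming a dispatch coroutine that receives the fully qualified tool name. `ToolNamespace` and `fake_dispatch` are hypothetical and do not reflect the library's internals.

```python
import asyncio


class ToolNamespace:
    """Hypothetical proxy that turns `manual.tool(...)` into a dispatch call."""

    def __init__(self, manual_name, dispatch):
        self._manual_name = manual_name
        self._dispatch = dispatch

    def __getattr__(self, tool_name):
        # Called only for attributes that are not set in __init__, i.e. tool names.
        async def call(**kwargs):
            return await self._dispatch(f"{self._manual_name}.{tool_name}", kwargs)
        return call


async def fake_dispatch(full_name, arguments):
    # Stand-in for a real tool call; echoes what would be invoked.
    return {"tool": full_name, "arguments": arguments}


async def main():
    github = ToolNamespace("github", fake_dispatch)
    return await github.get_pull_request(owner="microsoft", repo="vscode", pull_number=1234)


response = asyncio.run(main())
print(response)
```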

---

## API Reference

### **Core Methods**

#### `call_tool_chain(code: str, timeout: int = 30)`
Execute Python code with full tool access and observability.
- **Returns**: `{"result": Any, "logs": List[str]}` with execution result and captured console output
- **Default timeout**: 30 seconds

#### `get_all_tools_python_interfaces()`
Generate complete Python TypedDict interfaces for IDE integration.
- **Returns**: String containing all interface definitions with proper typing

#### `search_tools(query: str)` *(from UtcpClient)*
Discover tools using natural language queries.
- **Returns**: List of relevant tools with descriptions and interfaces

### **Static Methods**

#### `CodeModeUtcpClient.create(root_dir=None, config=None)`
Create a new client instance with optional configuration.

#### `CodeModeUtcpClient.AGENT_PROMPT_TEMPLATE`
Production-ready prompt template for AI agents.

---

## Security & Performance

### **Secure by Design**
- **Subprocess isolation** – True process isolation with separate memory space
- **No filesystem access** – Tools only through registered servers  
- **Timeout protection** – Configurable execution limits with force termination
- **Restricted imports** – Only safe modules allowed (json, math, asyncio, datetime, etc.)
- **Safe builtins** – Dangerous functions like `exec`, `eval`, and `open` are blocked

### **Performance Optimized**
- **Minimal memory footprint** – Subprocess cleanup after execution
- **Efficient tool caching** – Python interfaces cached automatically
- **Streaming console output** – Real-time log capture without buffering
- **Identifier sanitization** – Handles invalid Python identifiers gracefully
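
As an illustration of identifier sanitization, the sketch below maps arbitrary tool names to valid Python identifiers using only the standard library. `sanitize_identifier` is a hypothetical function, and the exact rules the library applies may differ.

```python
import keyword
import re


def sanitize_identifier(name: str) -> str:
    """Map an arbitrary tool name to a valid, non-keyword Python identifier."""
    # With re.ASCII, \W matches anything outside [A-Za-z0-9_].
    cleaned = re.sub(r"\W", "_", name, flags=re.ASCII)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    if keyword.iskeyword(cleaned):
        cleaned += "_"
    return cleaned


for raw in ["get-pull-request", "2fa.verify", "class", "list_issues"]:
    print(raw, "->", sanitize_identifier(raw))
```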

---

## Security Model

The CodeModeUtcpClient implements a **cooperative sandbox** designed for LLM-generated code execution:

### **Security Features**

- **Subprocess Isolation**: Code executes in separate processes for true timeout and memory isolation
- **Restricted Imports**: Only safe modules (json, math, asyncio, datetime, time, re, etc.) can be imported
- **Safe Builtins**: Only a limited set of built-in functions is available; dangerous ones like `exec`, `eval`, and `open` are blocked
- **No System Access**: Modules like `os`, `sys`, `subprocess` are not available in execution context
- **Real Timeouts**: Processes can be forcibly terminated if they exceed time limits
- **Error Isolation**: Exceptions and errors are properly contained and reported
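
Subprocess isolation with a hard timeout can be sketched with the standard library alone. The example below runs a code string in a fresh interpreter and terminates it when the limit is exceeded. `run_isolated` is a hypothetical helper that shows only the process and timeout aspect; it does not apply the import or builtin restrictions listed above, and it is not the library's implementation.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> dict:
    """Run `code` in a fresh interpreter; kill it if it exceeds `timeout` seconds."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"ok": False, "logs": [], "error": f"timed out after {timeout}s"}
    return {
        "ok": completed.returncode == 0,
        "logs": completed.stdout.splitlines(),
        "error": completed.stderr.strip() or None,
    }


fast = run_isolated("print('hello from the child process')")
slow = run_isolated("while True: pass", timeout=1)
print(fast)
print(slow)
```

Running the code in a separate process is what makes the timeout enforceable: an infinite loop in the child cannot block the parent.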

### **Allowed Modules**

- `json` - JSON parsing and serialization
- `math` - Mathematical functions
- `asyncio` - Async/await support (required for tool calls)
- `datetime` - Date and time utilities
- `time` - Time functions
- `re` - Regular expressions
- `typing` - Type hints
- `collections`, `itertools`, `functools`, `operator`, `uuid` - Standard utilities
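
An import allowlist of this kind can be approximated by supplying a custom `__import__` in the builtins passed to `exec`. The sketch below is an assumption-laden illustration: `guarded_import`, `run_restricted`, and the reduced module set are hypothetical, and the package's own mechanism may differ (its dependency list includes RestrictedPython, which this sketch does not use). In-process restrictions like this are not a security boundary on their own.

```python
import builtins

ALLOWED_MODULES = {"json", "math", "datetime", "re"}  # illustrative subset


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Delegate to the real import only for allowlisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of '{name}' is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_restricted(code: str) -> dict:
    """Execute code with a reduced set of builtins and the guarded import hook."""
    safe_builtins = {"__import__": guarded_import, "len": len, "print": print, "range": range}
    scope = {"__builtins__": safe_builtins}
    exec(code, scope)
    return scope


allowed = run_restricted("import math\nanswer = math.sqrt(16)")
print(allowed["answer"])

try:
    run_restricted("import os")
except ImportError as exc:
    print("blocked:", exc)
```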

### **Blocked Features**

- **System modules**: `os`, `sys`, `subprocess`, `shutil`
- **File I/O**: `open`, file operations outside tools
- **Network**: `socket`, `urllib`, `requests`
- **Code execution**: `exec`, `eval`, `compile`
- **Introspection abuse**: Direct `__builtins__` manipulation

### **Use Case**

This security model is designed for:
- **Cooperative LLM-generated code** (not adversarial)
- **Tool-based workflows** where tools provide controlled external access  
- **Agent task execution** with defined interfaces
- **Educational and development environments**
- **Internal automation** with trusted code sources

**Not suitable for**: Production multi-tenant environments running untrusted user code.

---

## Development Experience

### **IDE Integration**
Generate Python type definitions for full IntelliSense support:

```python
# Generate tool interfaces  
interfaces = await client.get_all_tools_python_interfaces()
with open('generated_tools.py', 'w') as f:
    f.write(interfaces)

# Import in your code for type hints
from generated_tools import GithubNamespace
```

### **Debug & Monitor**
Built-in observability for production deployments:

```python
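import logging

logger = logging.getLogger("code_mode")

# user_code is a string of Python source, e.g. produced by your agent
result = await client.call_tool_chain(user_code)

# Ship logs to your monitoring system (the standard logging module stands in here)
for log in result["logs"]:
    if log.startswith('[ERROR]'):
        logger.error(log)
    elif log.startswith('[WARN]'):
        logger.warning(log)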
```

---

## Why Choose Code Mode UTCP?

| Traditional Tool Calling | **Code Mode UTCP** | **Improvement** |
|--------------------------|-------------------|----------------|
| 15+ API round trips | **1 code execution** | **15x fewer requests** |
| 50,000+ context tokens | **2,000 tokens** | **96% token reduction** |
| 16 iterations (complex) | **1 iteration** | **88% faster** |
| Higher token costs | **68% token reduction** | **$9,536/year savings** |
| Manual error handling | **Automatic capture & logs** | **Zero-config observability** |
| Tool-by-tool discovery | **Dynamic semantic search** | **Progressive disclosure** |
| Vendor/protocol lock-in | **Universal compatibility** | **MCP, HTTP, File, CLI** |

### **Benchmark Methodology**
The [comprehensive Python study](https://github.com/imran31415/codemode_python_benchmark) tested **16 realistic scenarios** across:
- **Financial workflows** (invoicing, expense tracking)  
- **DevOps operations** (deployments, monitoring)
- **Data processing** (analysis, reporting)
- **Business automation** (CRM, notifications)

**Models tested:** Claude Haiku, Gemini Flash  
**Pricing basis:** $0.25/1M input, $1.25/1M output tokens  
**Scale:** 1,000 scenarios/day = $9,536/year savings with Code Mode
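
To put the annual figure in per-scenario terms, the arithmetic below breaks it down using only the numbers quoted above. The 365-day year is an assumption; the study may use a different basis.

```python
ANNUAL_SAVINGS_USD = 9_536   # figure quoted from the benchmark study
SCENARIOS_PER_DAY = 1_000    # scale quoted from the benchmark study
DAYS_PER_YEAR = 365          # assumption: savings accrue every day of the year

daily_savings = ANNUAL_SAVINGS_USD / DAYS_PER_YEAR
per_scenario_savings = daily_savings / SCENARIOS_PER_DAY

print(f"Daily savings: ${daily_savings:.2f}")
print(f"Savings per scenario: ${per_scenario_savings:.4f}")
```

Under these assumptions the quoted total works out to roughly $26.13 per day, or about 2.6 cents per scenario.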

## Learn More

- **[Cloudflare Blog](https://blog.cloudflare.com/code-mode/)** – Original Code Mode blog post
- **[Anthropic Study](https://www.anthropic.com/engineering/code-execution-with-mcp)** – MCP code execution benefits
- **[Python Benchmark Study](https://github.com/imran31415/codemode_python_benchmark)** – Comprehensive performance analysis
- **[UTCP Python SDK](https://github.com/universal-tool-calling-protocol/python-utcp)** – Official Python implementation
- **[Report Issues](https://github.com/universal-tool-calling-protocol/python-utcp/issues)** – Bug reports and feature requests

## License

**MPL-2.0** – Open source with commercial-friendly terms.
