Metadata-Version: 2.4
Name: progtc
Version: 0.1.17
Summary: Programmatic tool calling for your agent.
Author-email: Callum Downie <70471360+calmdown13@users.noreply.github.com>
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.10
Requires-Dist: httpx-sse>=0.4.3
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic>=2.12.5
Provides-Extra: server
Requires-Dist: fastapi>=0.123.10; extra == 'server'
Requires-Dist: rich>=14.2.0; extra == 'server'
Requires-Dist: typer>=0.20.0; extra == 'server'
Requires-Dist: uvicorn[standard]>=0.38.0; extra == 'server'
Provides-Extra: server-sentry
Requires-Dist: sentry-sdk>=2.52.0; extra == 'server-sentry'
Description-Content-Type: text/markdown

<div align="center">
<pre>
╔══════════════════════════════════════════════════════════╗
║   ██████╗ ██████╗  ██████╗  ██████╗ ████████╗ ██████╗    ║
║   ██╔══██╗██╔══██╗██╔═══██╗██╔════╝ ╚══██╔══╝██╔════╝    ║
║   ██████╔╝██████╔╝██║   ██║██║  ███╗   ██║   ██║         ║
║   ██╔═══╝ ██╔══██╗██║   ██║██║   ██║   ██║   ██║         ║
║   ██║     ██║  ██║╚██████╔╝╚██████╔╝   ██║   ╚██████╗    ║
║   ╚═╝     ╚═╝  ╚═╝ ╚═════╝  ╚═════╝    ╚═╝    ╚═════╝    ║
║                                           by capsa.ai    ║
╚══════════════════════════════════════════════════════════╝
</pre>

Programmatic tool calling for your agent.

[![CI](https://github.com/capsa-ai/progtc/actions/workflows/ci.yml/badge.svg)](https://github.com/capsa-ai/progtc/actions/workflows/ci.yml)
![PyPI - Version](https://img.shields.io/pypi/v/progtc?color=blue)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/progtc)

</div>

---

## What is Programmatic Tool Calling?

Programmatic Tool Calling is a strategy for orchestrating an agent's tools through code rather than through individual API round-trips. Instead of your agent requesting tools one at a time, with each result being returned to its context, your agent writes code that calls multiple tools, processes their outputs, and controls what information actually enters its context window.
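
For example, rather than returning every raw row to the model, the agent might emit code like this (a hypothetical sketch; `search_database` stands in for one of your tools):

```python
from tools import search_database

# The full result set stays inside the sandbox; only the one-line
# summary printed below ever reaches the model's context window.
rows = await search_database("active users")
print(f"{len(rows)} matching rows; first match: {rows[0]['name']}")
```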

Programmatic Tool Calling was popularised by the likes of smolagents and Claude. `progtc` is a framework-agnostic implementation.

The challenge `progtc` solves is that, for security, your agent's code must run in a sandboxed environment, while your tools typically run locally. You therefore need a mechanism to relay tool call requests and results to and from the sandbox.

## Installation

```bash
pip install progtc # client only
pip install "progtc[server]" # with server
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv add progtc # client only
uv add "progtc[server]" # with server
```

## Quick Start

### 1. Start the Server (inside your sandbox)

```bash
progtc serve --host 0.0.0.0 --port 8000 --api-key your-secret-key
```

### 2. Execute Code from Your Client

```python
from progtc import AsyncProgtcClient

client = AsyncProgtcClient(
    base_url="https://your-sandbox-url:8000",
    api_key="your-secret-key",
)

# Define your tools as async functions
async def get_weather(city: str, country: str) -> str:
    # Your actual implementation
    return f"Weather in {city}, {country}: Sunny, 22°C"

async def search_database(query: str) -> list[dict]:
    # Your actual implementation
    return [{"id": 1, "name": "Result"}]

# Execute LLM-generated code that uses your tools
code = """
from tools import get_weather

weather = await get_weather("London", "UK")
print(f"The weather is: {weather}")
"""

result = await client.execute_code(
    code=code,
    tools={
        "get_weather": get_weather,
        "search_database": search_database,
    },
)

print(result.stdout)  # "The weather is: Weather in London, UK: Sunny, 22°C"
print(result.stderr)  # ""
```

## How It Works

```mermaid
sequenceDiagram
    box rgba(100, 100, 255, 0.2) Your App
        participant Client as Progtc Client
    end
    box rgba(100, 200, 100, 0.2) Code Sandbox
        participant Server as Progtc Server
        participant Process as Sub-Process
    end

    Client->>Server: POST /execute-code
    Server->>Process: code

    Note over Process: execute code

    Process->>Server: tool call
    Server->>Client: SSE: tool call

    activate Process
    Note over Process: paused

    Note over Client: execute tool locally

    Client->>Server: POST /tool-results
    deactivate Process
    Server->>Process: tool result

    Note over Process: continue execution...

    Process->>Server: stdout, stderr
    Server->>Client: SSE: stdout, stderr
```

1. **Your client** sends code + a list of available tool names to the progtc server
2. **The server** executes the code in an isolated process, injecting a `tools` module
3. **When code calls a tool**, the server streams the call back to your client via SSE
4. **Your client** executes the tool locally and sends the result back
5. **The server** resumes code execution with the result
6. **Stdout/stderr** are captured and streamed back when execution completes

> **Tip:** If SSE streaming doesn't suit your setup, see [Step-Based Execution](#step-based-execution) below for a request/response alternative.

## Code Guidelines

To use tools, your code should import them from the `tools` module:

```python
from tools import my_tool
```

Tools are treated as async functions, so they must be awaited:

```python
from tools import my_tool
await my_tool()
```

You will receive stdout and stderr, so print the variables you want to see:

```python
from tools import tool_a, tool_b
a = await tool_a()
b = await tool_b(a)
print(b)
```

You can perform multiple tool calls concurrently using `asyncio.gather`:

```python
import asyncio
from tools import get_weather, search_database

# Call tools like regular async functions
weather, results = await asyncio.gather(
    get_weather("Tokyo", "Japan"),
    search_database("hotels"),
)

print(f"Weather: {weather}")
print(f"Results: {results}")
```

> **Note:** The code runs in a top-level async context, so you can use `await` directly without defining an async function.

## Server CLI Options

```bash
progtc serve [OPTIONS]
```

| Option                     | Default                 | Description                                 |
| -------------------------- | ----------------------- | ------------------------------------------- |
| `--host`                   | `127.0.0.1`             | Host to bind to                             |
| `--port`                   | `8000`                  | Port to bind to                             |
| `--api-key`                | (env: `PROGTC_API_KEY`) | API key for authentication                  |
| `--tool-call-timeout`      | `10.0`                  | Timeout for individual tool calls (seconds) |
| `--code-execution-timeout` | `30.0`                  | Total timeout for code execution (seconds)  |
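
For example, to bind to all interfaces and allow longer-running code, with the API key supplied via the environment:

```bash
export PROGTC_API_KEY=your-secret-key
progtc serve --host 0.0.0.0 --port 8000 --code-execution-timeout 120.0 --tool-call-timeout 60.0
```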

## Error Handling

The client returns a discriminated union—either success or one of several error types:

```python
from progtc.types import MessageType

result = await client.execute_code(code, tools)

match result.type:
    case MessageType.SUCCESS:
        print(f"Stdout: {result.stdout}")
    case MessageType.SYNTAX_ERROR:
        print(f"Syntax error: {result.stderr}")
    case MessageType.RUNTIME_ERROR:
        print(f"Runtime error: {result.stderr}")
    case MessageType.TIMEOUT_ERROR:
        print(f"Timeout: {result.stderr}")
```

## Step-Based Execution

For environments where SSE streaming isn't practical (e.g. serverless functions), progtc offers a step-based API. Instead of the server streaming tool calls back over a long-lived connection, each call to `execute_code_step` runs the code until it hits a tool call, then returns the tool calls and an opaque state object. You execute the tools, then call `execute_code_step` again with the results and state to resume.

```mermaid
sequenceDiagram
    box rgba(100, 100, 255, 0.2) Your App
        participant Client as Progtc Client
    end
    box rgba(100, 200, 100, 0.2) Code Sandbox
        participant Server as Progtc Server
        participant Process as Sub-Process
    end

    Client->>Server: POST /execute-code-step {code, tool_names}
    activate Process
    Server->>Process: execute code
    Process->>Server: tool call
    Server-->>Client: {tool_calls, state}
    deactivate Process

    Note over Client: execute tools locally

    Client->>Server: POST /execute-code-step {code, tool_names, state, tool_results}
    activate Process
    Server->>Process: replay with cached results
    Process->>Server: tool call
    Server-->>Client: {tool_calls, state}
    deactivate Process


    Note over Client: execute tools locally

    Client->>Server: POST /execute-code-step {code, tool_names, state, tool_results}
    activate Process
    Server->>Process: replay with cached results
    Process->>Server: stdout, stderr
    Server-->>Client: {stdout, stderr}
    deactivate Process
```

1. **Your client** sends code + tool names to the server
2. **The server** spawns a sub-process, replays the code using any cached results from state, and runs until the next tool call
3. **The server** returns the tool calls and an opaque state object, then tears down the sub-process
4. **Your client** executes the tools locally and calls `execute_code_step` again with the results and updated state
5. **Steps 2–4 repeat** until the code completes, at which point stdout/stderr are returned

### Example

```python
import asyncio

from progtc import AsyncProgtcClient
from progtc.types import ExecuteCodeStepToolCalls

client = AsyncProgtcClient(
    base_url="https://your-sandbox-url:8000",
    api_key="your-secret-key",
)

# Define your tools as async functions
async def get_weather(city: str, country: str) -> str:
    # Your actual implementation
    return f"Weather in {city}, {country}: Sunny, 22°C"

tools = {
    "get_weather": get_weather,
}

# Execute LLM-generated code that uses your tools
code = """
from tools import get_weather
weather = await get_weather("London", "UK")
print(weather)
"""

# Step through execution
state = None
tool_results = None

while True:
    result = await client.execute_code_step(
        code=code,
        tool_names=list(tools.keys()),
        state=state,
        tool_results=tool_results,
    )
    if not isinstance(result, ExecuteCodeStepToolCalls):
        break

    # Execute the requested tools concurrently
    state = result.state
    tool_results = await asyncio.gather(*(
        tools[tc.tool_name](*tc.args, **tc.kwargs)
        for tc in result.tool_calls
    ))

print(result.stdout)  # "Weather in London, UK: Sunny, 22°C"
```

Concurrent tool calls (e.g. via `asyncio.gather` in the executed code) are batched into a single step, so you can execute them in parallel before resuming.
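
Assuming the client and tools from the Quick Start, a minimal sketch of how a `gather` in the executed code surfaces as a single step:

```python
from progtc.types import ExecuteCodeStepToolCalls

code = """
import asyncio
from tools import get_weather, search_database

weather, hotels = await asyncio.gather(
    get_weather("Tokyo", "Japan"),
    search_database("hotels"),
)
print(weather, hotels)
"""

result = await client.execute_code_step(
    code=code,
    tool_names=["get_weather", "search_database"],
)

# Both tool calls arrive together in the first step
if isinstance(result, ExecuteCodeStepToolCalls):
    print([tc.tool_name for tc in result.tool_calls])
    # e.g. ["get_weather", "search_database"]
```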

### Interrupting and Resuming Execution

The state is an opaque compressed string, so you can persist it and resume execution later — even in a different process. This is useful for human-in-the-loop approvals, long-running tool calls, or durable execution across serverless invocations.

```python
from progtc.types import ExecuteCodeStepToolCalls

# `code` is your generated code string; `db` and `task_id` stand in for your own persistence layer
# First invocation — run until a tool call, then save and exit
result = await client.execute_code_step(code, tool_names=["send_email"])

if isinstance(result, ExecuteCodeStepToolCalls):
    # Persist everything needed to resume later
    db.save(
        task_id=task_id,
        code=code,
        tool_calls=result.tool_calls,
        state=result.state,
    )
    return  # done for now

# ---

# Later (maybe minutes/hours later, maybe a different process)
task = db.load(task_id)

# Resume execution with the tool results
result = await client.execute_code_step(
    task.code,
    tool_names=["send_email"],
    state=task.state,
    tool_results=["sent"],
)
```

### Trade-offs

The step-based API replays the entire code from scratch on each step, using cached results to skip past previous tool calls. This has some implications:

- **Replay overhead** — Compute-heavy code is re-executed on every step. If your code does significant work between tool calls, the streaming API may be a better fit.
- **State size** — The state object accumulates all previous tool call results. For code with many tool calls or large return values, this can grow significantly.
- **Determinism required** — The replay assumes the code follows the same execution path each time. On each step, the server verifies that replayed tool calls match the originals (by comparing a hash of the tool name and arguments). If the code is non-deterministic (e.g. random values, timestamps) and produces different tool calls on replay, a `determinism_error` is returned. See the sketch after this list.
- **No background concurrency** — With the streaming API, multiple tool calls can be in-flight at the same time (e.g. via `asyncio.create_task`). With the step-based API, execution stops completely at each tool call boundary. Tool calls that happen in quick succession (e.g. inside an `asyncio.gather`) are batched into a single step, but any concurrency pattern that relies on tool calls running in the background while other code continues will be significantly slower.
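
For example, a minimal sketch of executed code that would trip the determinism check (`log_event` is a hypothetical tool): the timestamp argument differs between the original run and the replay, so the argument hash no longer matches.

```python
import time
from tools import log_event  # hypothetical tool

# Non-deterministic: a fresh timestamp on every replay changes the argument hash
await log_event(f"started at {time.time()}")

# Deterministic alternative: pass only stable values to tool calls
await log_event("started")
```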

## Example: Pydantic AI + E2B

See [`examples/e2b-example/`](examples/e2b-example/) for a complete example using progtc with a [pydantic-ai](https://ai.pydantic.dev) agent and an [E2B](https://e2b.dev) sandbox.

---

<p align="center">
  <b>Building AI agents?</b> We're hiring: <a href="https://capsa.ai/careers">capsa.ai/careers</a>
</p>
