Metadata-Version: 2.4
Name: codex-api-proxy
Version: 0.1.1
Summary: Local OpenAI-compatible HTTP proxy backed by Codex CLI
Author: codex-api-proxy contributors
License-Expression: MIT
Keywords: codex,openai,proxy,api,local
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: fastapi<1,>=0.115
Requires-Dist: pydantic<3,>=2.7
Requires-Dist: uvicorn[standard]<1,>=0.30
Provides-Extra: dev
Requires-Dist: httpx<1,>=0.27; extra == "dev"
Requires-Dist: pytest<9,>=8; extra == "dev"
Requires-Dist: pytest-asyncio<1,>=0.23; extra == "dev"

# codex-api-proxy

Local OpenAI-compatible HTTP proxy backed by local Codex credentials.

This project exposes a minimal `/v1/chat/completions` API for local automation. By default, requests are executed through `codex exec --json --skip-git-repo-check --ignore-user-config --ignore-rules --sandbox read-only --ephemeral`, using the local Codex installation and its existing authentication.
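
As a rough illustration of the default invocation, the sketch below assembles that argument list in Python. The flags are the ones listed above. The helper name `default_exec_argv` and the placement of `-c` overrides at the end are assumptions made for this example, not the proxy's actual code.

```python
from typing import Sequence


def default_exec_argv(codex_bin: str = "codex",
                      config_overrides: Sequence[str] = ()) -> list[str]:
    """Build the default `codex exec` argument list described in this README."""
    argv = [
        codex_bin, "exec",
        "--json",
        "--skip-git-repo-check",
        "--ignore-user-config",
        "--ignore-rules",
        "--sandbox", "read-only",
        "--ephemeral",
    ]
    # Assumption: each override is appended as its own `-c key=value` pair.
    for override in config_overrides:
        argv += ["-c", override]
    return argv


print(" ".join(default_exec_argv()))
```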

## Safety

The proxy defaults to `127.0.0.1` and should not be exposed publicly. Any client with access can spend your local Codex quota and can ask Codex to inspect files that are available to the selected Codex sandbox and workspace.

Set `CODEX_PROXY_API_KEY`, or pass `--api-key`, to require `Authorization: Bearer <key>` on API requests.

If you start with `--host 0.0.0.0` or another non-loopback bind address without `--api-key`, `codex-api-proxy` prints a warning. Use a bearer token before exposing the service to anything other than a trusted local machine.

With the default `exec` engine, Codex subprocesses are launched with `--ignore-user-config` and `--ignore-rules`. This prevents proxy requests from loading user Codex config, MCP servers, plugins, skills, and rule files.

Codex subprocesses also use `--sandbox read-only` and `--ephemeral` by default. This keeps calls closer to one-shot model calls where the caller owns conversation context.
Use `--agent` only for trusted clients when you want Codex to use agent tools and create or modify files under the selected workspace.

The experimental `app-server` engine uses Codex's long-lived app-server protocol to reduce process startup latency and stream assistant deltas. Each API request starts a fresh Codex thread and archives it after completion, so callers must continue sending full chat history in `messages`.

The app-server process uses an isolated `CODEX_HOME` at `~/.codex-api-proxy/codex-home` by default. `codex-api-proxy` symlinks only the current Codex `auth.json` into that isolated home, so the app-server worker can reuse the existing login while not seeing the current user's `config.toml`, MCP config, or plugins. The app-server process is also started with `--disable apps`, `--disable plugins`, `--disable skill_mcp_dependency_install`, and `-c mcp_servers={}`. To keep skills out of the model-visible prompt, `codex-api-proxy` generates a `skills.config=[{name=...,enabled=false}]` override for known system skills and locally discovered skill names.

Each request uses `approvalPolicy: never`, `sandbox: read-only`, empty `dynamicTools`, empty `environments`, and `ephemeral: true` by default. With `--agent`, app-server requests use `sandbox: workspace-write` and omit the empty tool/environment overrides so Codex can use its normal agent tools.
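
The isolated `CODEX_HOME` idea described for the `app-server` engine can be sketched as follows. This is an illustration under assumptions: `prepare_isolated_home` is a hypothetical helper, and the demo runs against temporary directories rather than a real Codex home.

```python
import tempfile
from pathlib import Path


def prepare_isolated_home(real_home: Path, isolated_home: Path) -> Path:
    """Create an isolated home that exposes only a symlink to auth.json."""
    isolated_home.mkdir(parents=True, exist_ok=True)
    link = isolated_home / "auth.json"
    # Replace any stale link so it always points at the current login file.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(real_home / "auth.json")
    return link


with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "real-codex-home"
    real.mkdir()
    (real / "auth.json").write_text("{}")
    (real / "config.toml").write_text("# user config that must stay invisible")
    link = prepare_isolated_home(real, Path(tmp) / "isolated-home")
    # Only auth.json is visible from the isolated home; config.toml is not.
    print(sorted(entry.name for entry in link.parent.iterdir()))
```

Because the link points at the live `auth.json`, a refreshed login in the real home is picked up without copying credentials.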

## Install

```bash
pip3 install codex-api-proxy
```

For local development from this checkout:

```bash
python3 -m pip install -e '.[dev]'
```

Make targets are available for local build and release tasks:

```bash
make build-tools
make test
make build
make release-check
make publish VERSION=0.1.1
```

`make publish VERSION=...` first syncs that version into `pyproject.toml` and `src/codex_api_proxy/__init__.py`, then runs tests, builds the package, validates the generated artifacts, and uploads them to PyPI.

## Run

Start in the background:

```bash
codex-api-proxy start
```

By default, the service listens on `127.0.0.1:8765`.
The default Codex working directory is an empty workspace at `~/.codex-api-proxy/workspace`.

Bind to all interfaces:

```bash
codex-api-proxy start --host 0.0.0.0
```

Check status:

```bash
codex-api-proxy status
```

Show saved runtime settings:

```bash
codex-api-proxy status --verbose
```

Restart with the last successful `start` settings:

```bash
codex-api-proxy restart
```

Restart and override one setting:

```bash
codex-api-proxy restart --proxy=http://127.0.0.1:8118
```

Start with faster defaults:

```bash
codex-api-proxy start --fast
```

Start with experimental long-lived app-server workers:

```bash
codex-api-proxy start --engine app-server --workers 2
```

Start with an outbound proxy, faster defaults, and multiple app-server workers:

```bash
codex-api-proxy start --proxy=http://127.0.0.1:8118 --fast --engine app-server --workers 4
```

Stop:

```bash
codex-api-proxy stop
```

Run in the foreground for debugging:

```bash
codex-api-proxy start --foreground
```

## Configuration

CLI options:

- `--host`: bind host, default `127.0.0.1`
- `--port`: bind port, default `8765`
- `--api-key`: require bearer auth
- `--codex-bin`: Codex executable, default `codex`
- `--proxy`: proxy URL passed to Codex as `http_proxy` and `https_proxy`
- `--model`: model passed to Codex
- `--engine`: execution engine, `exec` or `app-server`, default `exec`
- `--workers`: number of long-lived `app-server` workers, default `1`
- `--max-queue-size`: maximum queued `app-server` requests before returning `429`, default `64`
- `--queue-timeout-seconds`: maximum time to wait for an `app-server` worker, default `30`
- `--app-server-codex-home`: isolated `CODEX_HOME` used by `app-server` workers, default `~/.codex-api-proxy/codex-home`
- `--codex-config`: Codex config override passed as `-c key=value`, repeatable
- `--ephemeral`: run `codex exec` with `--ephemeral`, enabled by default
- `--agent` / `--no-agent`: enable or disable Codex agent tools and workspace writes, disabled by default
- `--fast`: use fast defaults (`--codex-config model_reasoning_effort="low"`)
- `--default-cwd`: default Codex working directory, default `~/.codex-api-proxy/workspace`
- `--allowed-root`: allowed cwd root, repeatable, default `--default-cwd`
- `--timeout-seconds`: per-request timeout, default `300`
- `--max-concurrency`: maximum concurrent Codex executions, default `1`
- `--log-level`: Uvicorn log level, one of `debug`, `info`, `warning`, or `error`, default `info`
- `--pid-file`: daemon pid file, default `~/.codex-api-proxy/codex-api-proxy.pid`
- `--log-file`: daemon log file for `start`, default `~/.codex-api-proxy/codex-api-proxy.log`
- `--state-file`: daemon state file, default `~/.codex-api-proxy/codex-api-proxy.state.json`

`start` prints the state file path and the effective startup parameters. The state file is written with `0600` permissions and is used by `restart` to reuse the previous start settings. If `--api-key` is used, the key is redacted in terminal output but stored in the state file so `restart` can reuse it.
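
A minimal sketch of that pattern, writing settings with `0600` permissions and redacting the key only for display, might look like the following on a POSIX system. The JSON key names (`api_key`, `host`, `port`) and both helper names are assumptions for illustration; the real state file layout may differ.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_state_file(path: Path, settings: dict) -> None:
    """Write settings as JSON, readable and writable by the owner only."""
    # Create the file with 0600 from the start so the key is never exposed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
    # Tighten the mode again in case the file already existed with looser bits.
    os.chmod(path, 0o600)


def redact_for_display(settings: dict) -> dict:
    """Return a copy that is safe to print; the stored settings are unchanged."""
    shown = dict(settings)
    if shown.get("api_key"):
        shown["api_key"] = "***"
    return shown


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "codex-api-proxy.state.json"
    settings = {"host": "127.0.0.1", "port": 8765, "api_key": "local-secret"}
    write_state_file(state_path, settings)
    print(redact_for_display(settings))
    print(oct(stat.S_IMODE(state_path.stat().st_mode)))
```

Since the key is stored in plain text so that `restart` can reuse it, the file permissions are the only protection; treat the state file like any other credential file.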

Environment variables are also supported when running the FastAPI app directly:

- `CODEX_PROXY_HOST`: bind host, default `127.0.0.1`
- `CODEX_PROXY_PORT`: bind port, default `8765`
- `CODEX_PROXY_API_KEY`: optional bearer token
- `CODEX_PROXY_CODEX_BIN`: Codex executable, default `codex`
- `CODEX_PROXY_PROXY`: proxy URL passed to Codex
- `CODEX_PROXY_MODEL`: model passed to Codex
- `CODEX_PROXY_ENGINE`: execution engine, `exec` or `app-server`, default `exec`
- `CODEX_PROXY_WORKERS`: number of long-lived `app-server` workers, default `1`
- `CODEX_PROXY_MAX_QUEUE_SIZE`: maximum queued `app-server` requests, default `64`
- `CODEX_PROXY_QUEUE_TIMEOUT_SECONDS`: maximum time to wait for an `app-server` worker, default `30`
- `CODEX_PROXY_APP_SERVER_CODEX_HOME`: isolated `CODEX_HOME` used by `app-server` workers
- `CODEX_PROXY_CODEX_CONFIGS`: `;;`-separated Codex config overrides passed as repeated `-c`
- `CODEX_PROXY_EPHEMERAL`: set to `1`, `true`, or `yes` to run `codex exec` with `--ephemeral`; defaults to `true`
- `CODEX_PROXY_AGENT`: set to `1`, `true`, or `yes` to enable Codex agent tools and workspace writes; defaults to `false`
- `CODEX_PROXY_DEFAULT_CWD`: default Codex working directory, default current directory
- `CODEX_PROXY_ALLOWED_ROOTS`: colon-separated allowed cwd roots, default `CODEX_PROXY_DEFAULT_CWD`
- `CODEX_PROXY_TIMEOUT_SECONDS`: per-request timeout, default `300`
- `CODEX_PROXY_MAX_CONCURRENCY`: maximum concurrent Codex executions, default `1`
- `CODEX_PROXY_LOG_LEVEL`: Uvicorn log level, default `info`
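
The value formats in the list above (truthy strings, `;;`-separated overrides, colon-separated roots) can be parsed along these lines. This is a sketch, not the proxy's parser: the helper names are hypothetical, and case-insensitive matching and whitespace trimming are assumptions.

```python
from typing import Mapping

TRUTHY = {"1", "true", "yes"}


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret `1`, `true`, or `yes` as True; fall back to the default if unset."""
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: comparison ignores case and surrounding whitespace.
    return raw.strip().lower() in TRUTHY


def env_list(env: Mapping[str, str], name: str, separator: str) -> list[str]:
    """Split a separator-delimited variable, dropping empty entries."""
    raw = env.get(name, "")
    return [part.strip() for part in raw.split(separator) if part.strip()]


example_env = {
    "CODEX_PROXY_AGENT": "yes",
    "CODEX_PROXY_CODEX_CONFIGS": 'model_reasoning_effort="low";;mcp_servers={}',
    "CODEX_PROXY_ALLOWED_ROOTS": "/srv/projects:/tmp/scratch",
}
print(env_flag(example_env, "CODEX_PROXY_AGENT", default=False))
print(env_flag(example_env, "CODEX_PROXY_EPHEMERAL", default=True))
print(env_list(example_env, "CODEX_PROXY_CODEX_CONFIGS", ";;"))
print(env_list(example_env, "CODEX_PROXY_ALLOWED_ROOTS", ":"))
```

The `;;` separator is useful because a single Codex config override may itself contain `;`, `,`, or `:` characters.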

## API

Health:

```bash
curl -sS http://127.0.0.1:8765/health
```

Models:

```bash
curl -sS http://127.0.0.1:8765/v1/models
```

Readiness:

```bash
curl -sS http://127.0.0.1:8765/ready
```

Local counters:

```bash
curl -sS http://127.0.0.1:8765/metrics
```

Chat completion:

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
```

Streaming chat completion:

```bash
curl -N http://127.0.0.1:8765/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
```

Streaming responses use OpenAI-compatible SSE events:

- `data: {"object":"chat.completion.chunk",...}` for assistant chunks
- `data: [DONE]` when the response is complete

With the default `exec` engine, streaming happens at the HTTP protocol layer only. The underlying Codex CLI currently provides the assistant answer through `codex exec --json`; if Codex emits only the final assistant text for a request, the streamed content chunk arrives after Codex completes.

With `--engine app-server`, the proxy maps Codex `item/agentMessage/delta` notifications to OpenAI-compatible SSE content chunks. This is experimental because Codex's app-server protocol is itself experimental.
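
On the client side, the SSE stream can be reassembled by reading `data:` lines until `[DONE]`. The sketch below assumes the standard OpenAI chunk shape, where text arrives in `choices[].delta.content`; the sample lines are hand-written for illustration and are not captured proxy output.

```python
import json
from typing import Iterable


def collect_sse_content(lines: Iterable[str]) -> str:
    """Concatenate assistant text from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"po"}}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"ng"}}]}',
    "",
    "data: [DONE]",
]
print(collect_sse_content(sample))
```

With the `exec` engine the same parser still works; it may simply see a single content chunk near the end of the stream.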

## Compatibility

`codex-api-proxy` is OpenAI-compatible for the local chat-completions shape, not a complete OpenAI API implementation.

Supported:

- `GET /v1/models`
- `POST /v1/chat/completions`
- `model`
- `messages`
- `stream`
- `metadata.cwd` for request-scoped working directory selection inside `--allowed-root`
- OpenAI-compatible non-streaming response envelope
- OpenAI-compatible SSE chunk envelope for streaming responses
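
To illustrate `metadata.cwd`, the sketch below builds a request body that selects a working directory and shows one way a containment check against allowed roots could work. Both helper names are hypothetical, and the resolve-then-compare check is an assumption about a reasonable approach, not the proxy's actual validation.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def is_allowed_cwd(cwd: str, allowed_roots: Iterable[str]) -> bool:
    """Return True when cwd resolves to a path inside one of the allowed roots."""
    # Resolve symlinks and ".." first so a path cannot escape a root lexically.
    candidate = Path(cwd).expanduser().resolve()
    return any(
        candidate.is_relative_to(Path(root).expanduser().resolve())
        for root in allowed_roots
    )


def chat_body(prompt: str, cwd: str) -> dict:
    """Build a chat-completions body with a request-scoped working directory."""
    return {
        "model": "codex-local",
        "messages": [{"role": "user", "content": prompt}],
        "metadata": {"cwd": cwd},
    }


with tempfile.TemporaryDirectory() as root:
    inside = str(Path(root) / "project")
    outside = str(Path(root) / ".." / "elsewhere")
    print(is_allowed_cwd(inside, [root]))
    print(is_allowed_cwd(outside, [root]))
    print(chat_body("List the files here.", inside)["metadata"])
```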

Accepted but currently ignored:

- `temperature`
- `top_p`
- `max_tokens`
- `presence_penalty`
- `frequency_penalty`

Not supported:

- `tools` and `tool_choice`
- `response_format`
- `n` greater than one
- `stop`
- embeddings, responses, assistants, files, batches, audio, images, and other OpenAI endpoints
- accurate token `usage`; the response currently returns zero token counts because Codex CLI does not expose stable token accounting through this path

The app-server engine starts a fresh Codex thread for each API request and archives it after completion. Callers must include the full chat history in `messages`; `codex-api-proxy` does not preserve conversation state between API requests.

OpenAI Python SDK smoke test:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8765/v1", api_key="local-secret")

response = client.chat.completions.create(
    model="codex-local",
    messages=[{"role": "user", "content": "Reply with exactly: pong"}],
)
print(response.choices[0].message.content)
```

When no `--api-key` is configured, most OpenAI SDKs still require a placeholder `api_key`; any non-empty value is fine.

## Operations

Use `/health` for a lightweight process check and `/ready` for a readiness check that includes the selected engine and Codex executable availability. Use `/metrics` for local JSON counters:

- `requests_total`
- `requests_ok`
- `requests_error`
- `errors_by_status`
- `engine`
- `uptime_seconds`
- `app_server_pool_started`
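
As a small example of consuming these counters, the sketch below derives an error rate from a `/metrics` payload. The counter names come from the list above; the value types in the sample (integers, a status-to-count mapping, a boolean) are assumptions, and the sample numbers are made up.

```python
def error_rate(metrics: dict) -> float:
    """Fraction of requests that ended in an error, or 0.0 with no traffic."""
    total = metrics.get("requests_total", 0)
    if not total:
        return 0.0
    return metrics.get("requests_error", 0) / total


# Hand-written sample payload; real values and types may differ.
sample_metrics = {
    "requests_total": 40,
    "requests_ok": 38,
    "requests_error": 2,
    "errors_by_status": {"429": 1, "504": 1},
    "engine": "app-server",
    "uptime_seconds": 3600,
    "app_server_pool_started": True,
}
print(f"error rate: {error_rate(sample_metrics):.1%}")
```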

Daemon logs are written to `~/.codex-api-proxy/codex-api-proxy.log` by default. `codex-api-proxy` does not rotate logs itself; use your OS log rotation mechanism if you run it long-term.

Latency logs:

Each chat completion writes a single-line JSON log with logger `codex_api_proxy.latency` and event `chat_completion_latency`. Streaming responses also write `chat_completion_first_sse` when the first SSE chunk is yielded.

For background daemon runs, inspect:

```bash
rg 'codex_api_proxy.latency|chat_completion_latency|chat_completion_first_sse' ~/.codex-api-proxy/codex-api-proxy.log
```

Important fields:

- `request_id`: correlates latency lines for the same request
- `stream`: whether the request used `stream: true`
- `engine`: `exec` or `app-server`
- `phases_ms.cwd_resolve`: cwd validation time
- `phases_ms.prompt_build`: OpenAI messages to Codex prompt conversion time
- `phases_ms.queue_wait`: time waiting for local admission before engine execution
- `phases_ms.codex_exec`: time spent inside `codex exec`
- `phases_ms.app_server_exec`: time spent inside the app-server worker turn
- `phases_ms.codex_command_build`: Codex command construction time
- `phases_ms.codex_process_spawn`: local subprocess spawn time
- `phases_ms.codex_stdin_write`: prompt write and stdin close time
- `phases_ms.codex_first_stdout_event`: elapsed time from Codex IO start until the first non-empty stdout JSONL line
- `phases_ms.codex_first_assistant_event`: elapsed time from Codex IO start until the first assistant message event
- `phases_ms.codex_stdout_read`: total time spent reading Codex stdout until EOF
- `phases_ms.codex_process_wait`: time waiting for the Codex process after stdout EOF
- `phases_ms.codex_communicate`: total Codex subprocess IO time
- `phases_ms.codex_output_parse`: Codex JSONL final-message parse time
- `phases_ms.response_build`: response object/SSE setup time
- `phases_ms.total`: total server-side request time before response is ready
- `time_to_first_sse_ms`: stream request time until the first SSE chunk is yielded
- `time_to_first_content_sse_ms`: app-server stream request time until the first content chunk is yielded
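
The fields above lend themselves to quick scripted analysis. The sketch below extracts `chat_completion_latency` records from log lines and reports the slowest phase per request. It assumes the event name is stored under an `event` key, that `phases_ms` is a nested object, and that a line may carry a text prefix before the JSON; the sample lines are invented for illustration.

```python
import json
from typing import Iterable


def latency_records(lines: Iterable[str]) -> list[dict]:
    """Return parsed JSON records for chat_completion_latency events."""
    records = []
    for line in lines:
        start = line.find("{")  # tolerate a logger prefix before the JSON
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("event") == "chat_completion_latency":
            records.append(record)
    return records


def slowest_phase(record: dict) -> tuple[str, float]:
    """Return the (name, milliseconds) of the largest phase, excluding total."""
    phases = {k: v for k, v in record.get("phases_ms", {}).items() if k != "total"}
    if not phases:
        return ("", 0.0)
    name = max(phases, key=phases.get)
    return name, phases[name]


sample_log = [
    'INFO codex_api_proxy.latency {"event": "chat_completion_latency", "request_id": "req-1", '
    '"engine": "exec", "stream": false, '
    '"phases_ms": {"queue_wait": 2.0, "codex_exec": 4100.5, "response_build": 0.3, "total": 4103.1}}',
    '{"event": "chat_completion_first_sse", "request_id": "req-2", "time_to_first_sse_ms": 12.4}',
    "startup line without JSON",
]
for record in latency_records(sample_log):
    print(record["request_id"], slowest_phase(record))
```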

Chat completion with bearer auth, for when `--api-key` or `CODEX_PROXY_API_KEY` is set:

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H 'Authorization: Bearer local-secret' \
  -H 'Content-Type: application/json' \
  -d '{"model":"codex-local","messages":[{"role":"user","content":"Reply with exactly: pong"}]}'
```
