Metadata-Version: 2.4
Name: codex-agent-framework
Version: 0.1.18
Summary: A lightweight event-driven Codex agent runtime.
Author: Baptiste
License-Expression: MIT
Keywords: agent,ai,codex,openai,tools
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: audioop-lts; python_version >= "3.13"
Requires-Dist: beautifulsoup4
Requires-Dist: codex-backend-sdk
Requires-Dist: fastapi
Requires-Dist: filetype
Requires-Dist: modict
Requires-Dist: mss
Requires-Dist: numpy
Requires-Dist: odfpy
Requires-Dist: openai
Requires-Dist: openpyxl
Requires-Dist: pathspec
Requires-Dist: pillow
Requires-Dist: playwright
Requires-Dist: pydub
Requires-Dist: pypdf
Requires-Dist: pynteract
Requires-Dist: pywinctl
Requires-Dist: python-docx
Requires-Dist: PyYAML
Requires-Dist: regex
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: textual
Requires-Dist: tiktoken
Requires-Dist: trafilatura
Requires-Dist: uvicorn
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# codex-agent

`codex-agent-framework` is a comprehensive Python runtime for building local, interactive, tool-using AI agents.

It can run as a desktop assistant, as a headless CLI, as a FastAPI/SSE service, or as an embedded Python library. The core abstraction is an `Agent` with persistent sessions, local tools, context providers, slash commands, stateful plugins, document extraction, image/desktop/browser helpers, and event streaming.

> Status: early alpha. APIs are evolving quickly and this project intentionally favors clean architecture over long-term compatibility shims.

## What it is good for

`codex-agent` is designed to spin up multimodal agents that need to work on the local machine, not just answer text prompts. Typical use cases include:

- a local coding/research assistant with persistent project memory;
- a terminal or desktop assistant that can inspect files, run tests, edit code, and keep context;
- a small local agent server used by a TUI, tray app, scripts, or another application;
- experiments with tool registries, runtime plugins, event-driven UIs, and autonomous wakeups.

It is meant to be powerful, which also makes it potentially risky: shell, Python, write/edit, browser, and desktop tools run with the current user's privileges.

## Highlights

- Reusable `Agent` abstraction with persistent JSON sessions.
- FastAPI REST/SSE server in `codex_agent.server`, backed by a process-isolated agent runtime.
- Textual terminal UI in `codex_agent.tui`, plus an optional Linux tray controller.
- Headless CLI commands for prompts, status, tools, sessions, config, and interrupts.
- Built-in local tools for strict file reads, broad document extraction, file writes/edits, Bash, Python, image observation, and opening local resources.
- Stateful built-in plugins for durable semantic memory, planning/todos, scheduled wakeups, browser automation, and desktop automation.
- Runtime extension folders for user tools, providers, and slash commands.
- Event bus for streaming, UI integration, tool-call lifecycle, audio hooks, session changes, and automation.
- Document extraction for folders, text files, URLs, PDFs, DOCX, XLSX, ODT, HTML, and more.
- Optional image generation, web search, voice, LaTeX, browser, desktop, service, and tray integration points.

## Requirements

- Python 3.10 or newer.
- An OpenAI/ChatGPT subscription that gives access to the Codex backend API. Core LLM features use `codex-backend-sdk` with OAuth and do not require a developer API key.
- An `OPENAI_API_KEY` for optional OpenAI-dependent capabilities such as voice models and semantic memory embeddings.
- Optional Linux desktop features need extra system packages:
  - GTK 3 and Ayatana AppIndicator/AppIndicator bindings for the tray;
  - `wmctrl`, `xdotool`, and `xclip` for X11 desktop automation;
  - Playwright-managed Chromium for rendered browser workflows and fallback extraction.

See [`dependencies.txt`](dependencies.txt) for the full list of optional non-Python dependencies.

## Installation

From PyPI:

```bash
python -m pip install codex-agent-framework
```

From a local checkout:

```bash
python -m pip install -e .
```

For development:

```bash
python -m pip install -e '.[dev]'
```

Install optional Linux system dependencies for browser, desktop, tray, service, terminal, and audio features:

```bash
codex-agent install-system-deps -- -y
```

Bootstrap a local desktop setup by installing system dependencies and then installing/starting the user services:

```bash
codex-agent bootstrap -- -y
```

Useful bootstrap variants:

```bash
codex-agent bootstrap --no-system-deps
codex-agent bootstrap --no-start-service -- -y
codex-agent install-system-deps -- --dry-run --no-tray --no-audio
```

## Quick start

Start an embedded local server and open the TUI:

```bash
codex-agent
```

Start the long-lived local server in the background:

```bash
codex-agent start server
```

By default, `start server` detaches and writes its log to `~/.agent_runtime/logs/server.log`. Use foreground mode when a supervisor such as systemd/launchd should own the process:

```bash
codex-agent start server --foreground
```

Connect a TUI to an already running server:

```bash
codex-agent open tui
```

Start only the tray controller in the background:

```bash
codex-agent start tray
```

The tray log is written to `~/.agent_runtime/logs/tray.log`. Foreground mode is also available:

```bash
codex-agent start tray --foreground
```

Install startup user services for the server and tray controller:

```bash
codex-agent install-service
```

Stop or restart managed services:

```bash
codex-agent stop server
codex-agent restart server
codex-agent stop tray
```

## Headless CLI examples

The headless CLI can talk to the running server, or run an isolated one-shot runtime.

Inspect the running server:

```bash
codex-agent status
codex-agent status --json
codex-agent tools
codex-agent config get model
codex-agent sessions list
```

Update runtime config:

```bash
codex-agent config set input_token_limit 128000
codex-agent config set web_search_enabled=false
codex-agent config set model=gpt-5.5
```

Submit a prompt to the running server and stream the answer:

```bash
codex-agent run "Inspect this repository and suggest the next cleanup target."
```

Read a larger prompt from stdin:

```bash
{
  echo "Review this documentation diff for accuracy.";
  git diff -- README.md;
} | codex-agent run --stdin
```

Emit machine-readable events for another process:

```bash
codex-agent run --format ndjson "Run a quick repository health check."
```
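
A consuming process can treat the `ndjson` stream as one JSON object per line. The sketch below parses such a stream from any iterable of text lines. The `type` and `text` field names and the sample events are hypothetical placeholders, since the event schema is not described here.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        yield json.loads(line)


# Hypothetical sample; real events would come from the CLI's stdout.
sample = [
    '{"type": "delta", "text": "Checking"}\n',
    "\n",
    '{"type": "done"}\n',
]
events = list(iter_ndjson(sample))
print([event["type"] for event in events])
```

In practice, `lines` could be the `stdout` of a `subprocess.Popen(["codex-agent", "run", "--format", "ndjson", prompt], stdout=subprocess.PIPE, text=True)` call, which yields lines as they arrive.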

Run without a pre-existing server by starting a temporary process runtime:

```bash
codex-agent run --runtime process "Summarize the current project layout."
```

Submit a turn and return immediately:

```bash
codex-agent run --wait-timeout 0 "Continue the long-running audit in the background."
```

Interrupt the current turn:

```bash
codex-agent interrupt "user changed priority"
```

## Python usage

### Simple embedded agent

```python
from codex_agent import Agent

agent = Agent(
    session="new",
    username="Baptiste",
    voice_enabled=False,
)

answer = agent("Summarize this repository in three practical bullet points.")
print(answer)
```

Resume the latest session interactively:

```python
from codex_agent import Agent

agent = Agent(session="latest", voice_enabled=False)
agent.interact()
```

### Add local tools

```python
from pathlib import Path
from codex_agent import Agent, tool

@tool
def list_changed_files() -> list[str]:
    """Return modified or untracked files in the current git repository."""
    import subprocess

    output = subprocess.check_output(
        ["git", "status", "--short"],
        text=True,
    )
    return [line[3:] for line in output.splitlines() if line.strip()]

@tool
def read_project_note(name: str) -> str:
    """Read a project note from ./notes by filename."""
    path = Path("notes", name).resolve()
    notes_dir = Path("notes").resolve()
    if notes_dir not in path.parents:
        raise ValueError("note must stay inside ./notes")
    return path.read_text(encoding="utf-8")

agent = Agent(session="latest", voice_enabled=False)
agent.add_tool(list_changed_files)
agent.add_tool(read_project_note)

agent("Look at the changed files and tell me what needs review first.")
```

### Add live, non-persistent context with a provider

Providers are for context that should be available to the model but should not become part of the saved conversation history. A provider is called again before each model API call, so it is a good fit for values that can change over time: the current clock, open browser state, desktop screenshots, active todos, project status, feature flags, and similar runtime context.

A minimal clock provider:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

from codex_agent import Agent, provider

@provider
def current_time():
    now = datetime.now(ZoneInfo("Europe/Paris"))
    return f"Current local time: {now:%Y-%m-%d %H:%M:%S %Z}"

agent = Agent(session="latest", voice_enabled=False)
agent.add_provider(current_time)
agent("Given the current time, should I start a long-running task now?")
```

A project provider can combine stable guidance with freshly computed state:

```python
import subprocess
from codex_agent import Agent, provider

@provider
def repository_context():
    status = subprocess.check_output(
        ["git", "status", "--short"],
        text=True,
    ).strip()
    return f"""
The user is working on codex-agent-framework.
Prefer precise local inspection before editing files.
Do not keep legacy compatibility shims unless explicitly requested.
Current git status:
{status or "clean"}
""".strip()

agent = Agent(session="latest", voice_enabled=False)
agent.add_provider(repository_context)
agent("Plan a safe refactor of the CLI package.")
```

### Add a slash command

```python
from codex_agent import Agent, command, get_agent

@command
def repo():
    """Show the active session and current repository hint."""
    agent = get_agent()
    return f"session={agent.current_session_id}; repo=codex-agent"

agent = Agent(session="latest", voice_enabled=False)
agent.add_command(repo)
print(agent("/repo"))
```

## Runtime directory and sessions

By default, local runtime state is stored in:

```text
~/.agent_runtime
```

Common files and folders:

```text
sessions/          persisted conversation histories as JSON
workfolder/        generated or uploaded working files
tools/             user runtime tools
providers/         user runtime context providers
commands/          user runtime slash commands
images/            generated or persisted image outputs
browser/           persistent browser profiles and screenshots
logs/              detached server/tray logs
memory.json        durable semantic memory entries
planner.json       persistent named todos
wakeups.json       scheduled autonomous wakeups
agent_config.json  persisted runtime configuration
tui.json           currently registered TUI client process
```

Override the runtime location with:

```bash
AGENT_RUNTIME_DIR=/tmp/my-agent-runtime codex-agent
```
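
The same override can be applied from Python before launching the CLI, which is handy for throwaway experiments. A minimal sketch, assuming only that the `AGENT_RUNTIME_DIR` variable shown above is honored; the helper name `isolated_runtime_env` is illustrative and not part of the package:

```python
import os
import tempfile


def isolated_runtime_env(prefix: str = "agent-runtime-") -> dict[str, str]:
    """Return a copy of the environment pointing at a fresh runtime directory."""
    runtime_dir = tempfile.mkdtemp(prefix=prefix)  # created empty on disk
    env = dict(os.environ)
    env["AGENT_RUNTIME_DIR"] = runtime_dir
    return env


env = isolated_runtime_env()
print(env["AGENT_RUNTIME_DIR"])
# Pass `env=env` to subprocess.run(...) when invoking `codex-agent`.
```

The directory is not removed automatically, so delete it once the experiment is finished.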

Session behavior:

- `Agent(session="new")` starts a fresh session.
- `Agent(session="latest")` resumes the newest saved session.
- `Agent(session="<session_id>")` loads a specific saved session.
- `Agent(session="/path/to/session.json")` loads a session file directly.

Session IDs are timestamp-based and lexicographically sortable.
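
Lexicographic sortability means that plain string sorting yields chronological order, which keeps "latest" cheap to resolve. The sketch below illustrates the property with a hypothetical ID format (`%Y%m%d-%H%M%S-%f`); the package's actual format is not specified here and may differ.

```python
from datetime import datetime, timedelta, timezone


def make_session_id(moment: datetime) -> str:
    """Format a timestamp so that string order matches time order."""
    # Fixed-width, most-significant-first fields keep sorting chronological.
    return moment.strftime("%Y%m%d-%H%M%S-%f")


start = datetime(2025, 1, 31, 23, 59, 59, tzinfo=timezone.utc)
moments = [start + timedelta(seconds=offset) for offset in (120, 0, 1)]
ids = [make_session_id(moment) for moment in moments]

latest = max(ids)  # plain string comparison picks the newest session
print(sorted(ids))
print(latest)
```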

## Built-in slash commands

Inside the interactive agent, commands start with `/`.

Common commands:

```text
/help                         list available commands
/sessions                     list saved sessions
/new_session                  create a new session
/load_session latest          load latest session
/load_session <session_id>    load a specific session
/delete_session <session_id>  delete a session
/next_session                 move to the next/newer session
/previous_session             move to the previous/older session
/compact                      compact completed history turns
/config                       show agent/model config
/config model=gpt-test verbosity=low
/model                        show current model
/model gpt-test               update model
/reasoning high               update reasoning effort
/verbosity low                update verbosity
/memory_config                show memory plugin config
/memory_config auto_archive=true max_tokens=4000
/memory_auto_archive on       toggle completed-turn memory archiving
```

Runtime command modules can add more commands from `~/.agent_runtime/commands/*.py`.

## Built-in tools and plugins

The default agent exposes local tools and plugin tools to the model. Names may be prefixed by the plugin namespace depending on registration.

| Area | Examples | Purpose |
| --- | --- | --- |
| Files | `read`, `view`, `write`, `edit` | Strict text reads, broad extraction, complete writes, exact-string edits. |
| Shell | `bash`, `python` | Shell commands and persistent Python execution. |
| Vision/system | `observe`, `show`, `open_tui`, `close_tui` | Image observation, opening files/URLs, TUI control. |
| Memory plugin | `memory_add`, `memory_edit`, `memory_delete`, `memory_search`, `/memory_config` | Durable semantic memory, retrieval context, and memory-owned configuration. |
| Planner plugin | `planner_create`, `planner_add`, `planner_check`, `planner_clear` | Persistent named todos surfaced as context. |
| Scheduler plugin | `scheduler_schedule`, `scheduler_cancel`, `scheduler_list`, `scheduler_restart_and_wakeup` | Future turns, periodic wakeups, post-restart continuation. |
| Browser plugin | `browser_open`, `browser_goto`, `browser_click`, `browser_fill`, `browser_press`, ... | Persistent Playwright/Chromium automation with active-tab context. |
| Desktop plugin | `desktop_start_session`, `desktop_run_commands`, `desktop_stop_session` | Screenshot-backed Linux desktop automation. |

Use dangerous tools carefully. Bash, Python, write, edit, browser, and desktop actions run locally as the current user.

## Local server, TUI, and tray

The server exposes the agent through a FastAPI bridge. It lives in `codex_agent.server` and can run the agent in a separate worker process so the HTTP/SSE server remains responsive while the model is busy.

Key endpoints:

```text
GET   /health
GET   /status
GET   /config
PATCH /config
GET   /session
GET   /sessions
GET   /messages
GET   /tools
GET   /wakeups
GET   /events
GET   /events/replay
POST  /turns
POST  /interrupt
POST  /tui/open
POST  /tui/close
POST  /restart
```

The TUI is a visual client. It connects over SSE, replays the latest turn when opened mid-session, tracks event cursors, and reconnects after server restarts or transient stream loss. The server accepts one TUI client at a time, while allowing the same client process to replace a stale SSE subscription during reconnect.
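
The cursor tracking described above can be pictured with a small parser for the standard `text/event-stream` wire format. This is a sketch of a generic SSE client loop, not the TUI's actual implementation; the event names, payloads, and numeric IDs in the sample stream are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SseEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Parse text/event-stream lines into events (a blank line dispatches)."""
    event, data, event_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:  # events without any data field are not dispatched
                yield SseEvent(event, "\n".join(data), event_id)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            event_id = value  # the last seen ID persists across events


# Hypothetical stream; event names and payloads are placeholders.
stream = [
    ": keep-alive\n",
    "id: 41\n", "event: delta\n", "data: Hel\n", "\n",
    "id: 42\n", "event: delta\n", "data: lo\n", "\n",
]
cursor = None
for item in parse_sse(stream):
    cursor = item.id  # remember the last event seen for a later reconnect
print(cursor)
```

A reconnecting client would present the saved cursor to the server so that missed events can be replayed. How this server expects to receive the cursor is not specified here.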

The tray can start or stop the user service, open or close the TUI, and keep the local agent available independently of the terminal UI.

## Runtime extensions

The agent loads decorated Python functions from the runtime directory:

```text
~/.agent_runtime/tools/*.py
~/.agent_runtime/providers/*.py
~/.agent_runtime/commands/*.py
```

Example runtime tool file, `~/.agent_runtime/tools/git_helpers.py`:

```python
from codex_agent import tool

@tool
def current_branch() -> str:
    """Return the current git branch."""
    import subprocess
    return subprocess.check_output(
        ["git", "branch", "--show-current"],
        text=True,
    ).strip()
```

Example runtime provider, `~/.agent_runtime/providers/clock.py`:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

from codex_agent import provider

@provider
def current_time():
    now = datetime.now(ZoneInfo("Europe/Paris"))
    return f"Current local time: {now:%Y-%m-%d %H:%M:%S %Z}"
```

Decorated functions are registered automatically when the agent starts.
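
One common way to implement this kind of discovery is to import each file and let the decorator record the function as a side effect. The sketch below shows that general pattern with a toy `tool` decorator and a `REGISTRY` dict. Both are illustrative stand-ins, and the package's real loader may work differently.

```python
import importlib.util
import tempfile
from pathlib import Path

REGISTRY: dict[str, object] = {}


def tool(func):
    """Toy stand-in for a registering decorator (not codex_agent.tool)."""
    REGISTRY[func.__name__] = func
    return func


def load_folder(folder: Path) -> None:
    """Import every *.py file in `folder`; decorators register on import."""
    for path in sorted(folder.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        module.tool = tool  # inject the toy decorator for this demo only
        spec.loader.exec_module(module)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "greet.py").write_text(
        "@tool\ndef greet(name: str) -> str:\n    return f'hello {name}'\n",
        encoding="utf-8",
    )
    load_folder(Path(tmp))

print(sorted(REGISTRY))
print(REGISTRY["greet"]("world"))
```

Because registration happens at import time, a file with a syntax error or a failing top-level statement would stop its functions from being registered in a loader like this one.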

## Events

`Agent` exposes an event bus for UI and automation integrations.

```python
from codex_agent import Agent, MessageAddedEvent, ToolCallStartEvent

agent = Agent(session="new", voice_enabled=False)

@agent.on(MessageAddedEvent)
def log_message(event):
    print("message", event.message.type)

@agent.on(ToolCallStartEvent)
def log_tool(event):
    print("tool", event.name)
```

Useful exported events include:

- `MessageAddedEvent`
- `MessageSubmittedEvent`
- `AgentTurnStartEvent` / `AgentTurnEndEvent` / `AgentTurnErrorEvent`
- `AssistantTurnStartEvent` / `AssistantTurnEndEvent`
- `ResponseStartEvent` / `ResponseContentDeltaEvent` / `ResponseDoneEvent`
- `ToolCallStartEvent` / `ToolCallDoneEvent`
- `SessionLoadedEvent` / `SessionDeletedEvent`
- `AgentInterruptedEvent`
- `AudioPlaybackEvent`
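
For readers unfamiliar with the pattern, the `@agent.on(...)` style of subscription can be modeled as a dispatcher keyed by event type. The following is a toy model only: `EventBus`, `ToolCallStart`, and `emit` are hypothetical names for illustration and say nothing about how `Agent` implements its bus internally.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCallStart:  # hypothetical event type for this demo
    name: str


class EventBus:
    """Toy type-keyed event bus; not the package's implementation."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def on(self, event_type: type):
        """Return a decorator that subscribes a handler to `event_type`."""
        def register(handler: Callable) -> Callable:
            self._handlers[event_type].append(handler)
            return handler
        return register

    def emit(self, event: object) -> None:
        """Call every handler subscribed to the exact type of `event`."""
        for handler in self._handlers[type(event)]:
            handler(event)


bus = EventBus()
seen: list[str] = []


@bus.on(ToolCallStart)
def record(event: ToolCallStart) -> None:
    seen.append(event.name)


bus.emit(ToolCallStart(name="bash"))
print(seen)
```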

## Configuration

`Agent` accepts configuration through keyword arguments:

```python
from codex_agent import Agent

agent = Agent(
    session="latest",
    model="gpt-5.4",
    reasoning_effort="medium",
    verbosity="medium",
    input_token_limit=128000,
    auto_compact=True,
    web_search_enabled=False,
    image_generation_enabled=False,
    voice_enabled=False,
    builtin_tools=None,      # None = all built-ins, [] = none, or explicit names
    builtin_providers=None,
    builtin_plugins=None,
)
```

Agent-level configuration can also be changed through slash commands, the HTTP API, or the CLI:

```bash
codex-agent config get
codex-agent config get input_token_limit
codex-agent config set input_token_limit 128000
codex-agent config set voice_enabled=false
codex-agent config set builtin_plugins='["memory", "planner", "scheduler"]'
```

Agent-level configuration is persisted to `agent_config.json` in the runtime directory when saved through agent helpers, slash commands, the HTTP API, or `codex-agent config set`. Plugin-specific technical settings stay with their plugin state instead of being mixed into `AgentConfig`. For example, the memory plugin persists `auto_archive`, embedding precision/dimensions, ranking decay, and retrieval token budget in `memory.json`, and exposes its own slash commands:

```text
/memory_config
/memory_config auto_archive=true max_tokens=4000 dimensions=128
/memory_auto_archive off
```

Use `--no-save` with the CLI for transient agent-level updates:

```bash
codex-agent config set --no-save verbosity low
```

## Project layout

```text
codex_agent/                      Python package
codex_agent/agent.py              Central Agent object
codex_agent/mainloop.py           Agent turn loop and tool execution flow
codex_agent/agent_runtime.py      In-process and process-backed runtime adapters
codex_agent/server/               FastAPI REST/SSE bridge package
codex_agent/tui/                  Textual TUI client and lifecycle helpers
codex_agent/cli/                  Root CLI and headless command implementation
codex_agent/builtin_tools/        Built-in file/shell/vision/system/server tools
codex_agent/builtin_plugins/      Stateful memory/planner/scheduler/browser/desktop plugins
codex_agent/get_text/             Document extraction helpers
codex_agent/prompts/              Packaged system prompts
codex_agent/service.py            systemd/launchd user service helpers
codex_agent/tray.py               GTK tray controller
tests/                            Test suite
scripts/                          Source scripts mirrored into package data
pyproject.toml                    Package metadata and build config
MANIFEST.in                       Source distribution includes
```

## Testing

Run the full suite:

```bash
python -m pytest
```

Run lightweight static checks used during local cleanup:

```bash
python -m pyflakes codex_agent tests
```

The tests isolate `AGENT_RUNTIME_DIR` automatically, so they should not create or resume sessions from your real `~/.agent_runtime`.

The published `0.1.18` release was validated with the following test result:

```text
449 passed
```

## Packaging

Build source and wheel distributions with:

```bash
python -m pip install build twine
rm -rf build dist *.egg-info
python -m build
python -m twine check dist/*
```

The distribution includes prompt text files, `codex_agent/get_text/default_gitignore`, and the packaged Linux system dependency installer through package data and `MANIFEST.in`.

## Recent changes

- `0.1.18`: make `start server/tray` detach by default, move durable memory fully into the memory plugin, keep wakeups as a core runtime primitive, expose RAG memory timestamps/sources, and move browser/desktop backend controllers into core modules with plugins as facades.
- `0.1.17`: reorganize the TUI and FastAPI server into dedicated `codex_agent.tui` and `codex_agent.server` packages, remove obsolete message payload fields, and clean unused imports.
- `0.1.16`: tolerate missing OpenAI API keys at startup by disabling OpenAI-dependent voice and memory features, and skip memory archiving when the memory plugin is unavailable.
- `0.1.15`: add the `AgentRuntime` interface for server/CLI/process adapters, split the CLI into a package with headless runtime commands, add `codex-agent config set` plus `PATCH /config`, and make built-in tools/providers/plugins configurable.
- `0.1.14`: add HookManager infrastructure, planner/scheduler robustness fixes, documented system dependencies, `codex-agent install-system-deps`, `codex-agent bootstrap`, and improved TUI SSE reconnect/replay handling.
- `0.1.13`: reorganize built-in tools into a `codex_agent.builtin_tools` package, keeping the public import surface compatible while separating file, shell, vision, system, and server-tool modules.
- `0.1.12`: add a persistent Playwright/Chromium browser controller with tab navigation, DOM/action snapshots, screenshots, form/click/key tools, and `browser_goto(url)` for active-tab navigation.
- `0.1.11`: split strict line-numbered UTF-8 `read` from unnumbered extracted `view`, preserve blank lines in read snippets, and show persistent+temporary message counts in the TUI status bar.
- `0.1.10`: persist only backend compaction summaries, drop bulky compacted conversation payloads, and refresh context status after compaction.
- `0.1.9`: maintenance packaging release after validating the local execution environment and deploy workflow.
- `0.1.8`: scope TUI replay/SSE catch-up to the active session and make bash/python subprocesses inherit the project Python environment, including service-launched agents.
- `0.1.7`: add durable RAG memory, scheduled wakeups, process-isolated server runtime, tray/service controls, robust SSE replay/reconnect, richer TUI status, and improved token estimates.
- `0.1.6`: add the FastAPI REST/SSE bridge, HTTP/SSE client, async-style agent mainloop, and decoupled TUI operation.

See [`CHANGELOG.md`](CHANGELOG.md) for the full history.

## Safety notes

This project is designed to let an AI assistant act on the local machine. That is powerful and potentially risky.

Recommended practices:

- Use a dedicated runtime directory for experiments.
- Review tool calls before enabling autonomous workflows.
- Avoid running the agent with elevated privileges.
- Keep secrets out of prompts, logs, and committed runtime files.
- Prefer temporary workfolders in tests and demos.
- Treat browser and desktop automation as real user actions.

## License

MIT. See [LICENSE](LICENSE).
