Metadata-Version: 2.4
Name: codex-agent-framework
Version: 0.1.19
Summary: A lightweight event-driven Codex agent runtime.
Author: Baptiste
License-Expression: MIT
Keywords: agent,ai,codex,openai,tools
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: audioop-lts; python_version >= "3.13"
Requires-Dist: beautifulsoup4
Requires-Dist: codex-backend-sdk
Requires-Dist: fastapi
Requires-Dist: filetype
Requires-Dist: modict
Requires-Dist: mss
Requires-Dist: numpy
Requires-Dist: odfpy
Requires-Dist: openai
Requires-Dist: openpyxl
Requires-Dist: pathspec
Requires-Dist: pillow
Requires-Dist: playwright
Requires-Dist: pydub
Requires-Dist: pypdf
Requires-Dist: pynteract
Requires-Dist: pywinctl
Requires-Dist: python-docx
Requires-Dist: PyYAML
Requires-Dist: regex
Requires-Dist: requests
Requires-Dist: rich
Requires-Dist: textual
Requires-Dist: tiktoken
Requires-Dist: trafilatura
Requires-Dist: uvicorn
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# codex-agent

`codex-agent-framework` is a comprehensive Python runtime for building local, interactive, tool-using AI agents.

It can run as a desktop assistant, as a headless CLI, as a FastAPI/SSE service, or as an embedded Python library. The core abstraction is an `Agent` with persistent sessions, local tools, context providers, slash commands, stateful plugins, document extraction, image/desktop/browser helpers, and event streaming.

> Status: early alpha. APIs are evolving quickly and this project intentionally favors clean architecture over long-term compatibility shims.

## What it is good for

`codex-agent` is designed to spin up multimodal agents that need to work on the local machine, not just answer text prompts. Typical use cases include:

- a local coding/research assistant with persistent project memory;
- a terminal or desktop assistant that can inspect files, run tests, edit code, and keep context;
- a small local agent server used by a TUI, tray app, scripts, or another application;
- experiments with tool registries, runtime plugins, event-driven UIs, and autonomous wakeups.

It is meant to be powerful, which also makes it potentially risky: shell, Python, write/edit, browser, and desktop tools run with the current user's privileges.

## Highlights

- Reusable `Agent` abstraction with persistent JSON sessions.
- FastAPI REST/SSE server in `codex_agent.server`, backed by a process-isolated agent runtime.
- Textual terminal UI in `codex_agent.tui`, plus optional Linux tray controller.
- Headless CLI commands for prompts, status, tools, sessions, config, and interrupts.
- Built-in local tools for strict file reads, broad document extraction, file writes/edits, Bash, Python, image observation, and opening local resources.
- Stateful built-in plugins for durable semantic memory, planning/todos, scheduled wakeups, browser automation, and desktop automation.
- Runtime extension folders for user tools, providers, and slash commands.
- Event bus for streaming, UI integration, tool-call lifecycle, audio hooks, session changes, and automation.
- Document extraction for folders, text files, URLs, PDFs, DOCX, XLSX, ODT, HTML, and more.
- Optional image generation, web search, voice, LaTeX, browser, desktop, service, and tray integration points.

## Requirements

- Python 3.10 or newer.
- An OpenAI/ChatGPT subscription that gives access to the Codex backend API. Core LLM features authenticate through `codex-backend-sdk` via OAuth and do not require a developer API key.
- An `OPENAI_API_KEY` for optional OpenAI-dependent capabilities such as voice models and semantic memory embeddings.
- Optional Linux desktop features need extra system packages:
  - GTK 3 and Ayatana AppIndicator/AppIndicator bindings for the tray;
  - `wmctrl`, `xdotool`, and `xclip` for X11 desktop automation;
  - Playwright-managed Chromium for rendered browser workflows and fallback extraction.

See [`dependencies.txt`](dependencies.txt) for the full list of optional non-Python dependencies.

## Installation

From PyPI:

```bash
python -m pip install codex-agent-framework
```

From a local checkout:

```bash
python -m pip install -e .
```

For development:

```bash
python -m pip install -e '.[dev]'
```

Install optional Linux system dependencies for browser, desktop, tray, service, terminal, and audio features:

```bash
codex-agent install-system-deps -- -y
```

Bootstrap a local desktop setup by installing system dependencies and then installing/starting the user services:

```bash
codex-agent bootstrap -- -y
```

Useful bootstrap variants:

```bash
codex-agent bootstrap --no-system-deps
codex-agent bootstrap --no-start-service -- -y
codex-agent install-system-deps -- --dry-run --no-tray --no-audio
```

## Quick start

Start an embedded local server and open the terminal TUI:

```bash
codex-agent
```

Start the long-lived local server in the background:

```bash
codex-agent start server
```

By default, `start server` detaches and writes its log to `~/.agent_runtime/logs/server.log`. Use foreground mode when a supervisor such as systemd or launchd should own the process:

```bash
codex-agent start server --foreground
```

Connect a TUI to an already running server:

```bash
codex-agent open tui
```

Start only the tray controller in the background:

```bash
codex-agent start tray
```

The tray log is written to `~/.agent_runtime/logs/tray.log`. Foreground mode is also available:

```bash
codex-agent start tray --foreground
```

Install startup user services for the server and tray controller:

```bash
codex-agent install-service
```

Stop or restart managed services:

```bash
codex-agent stop server
codex-agent restart server
codex-agent stop tray
```

## Headless CLI examples

The headless CLI can talk to the running server, or run an isolated one-shot runtime.

Inspect the running server:

```bash
codex-agent status
codex-agent status --json
codex-agent tools
codex-agent config get model
codex-agent sessions list
```

Update runtime config:

```bash
codex-agent config set input_token_limit 128000
codex-agent config set web_search_enabled=false
codex-agent config set model=gpt-5.5
```

Submit a prompt to the running server and stream the answer:

```bash
codex-agent run "Inspect this repository and suggest the next cleanup target."
```

Read a larger prompt from stdin:

```bash
{
  echo "Review this documentation diff for accuracy.";
  git diff -- README.md;
} | codex-agent run --stdin
```

Emit machine-readable events for another process:

```bash
codex-agent run --format ndjson "Run a quick repository health check."
```
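
A consumer of this stream can be sketched as follows. It assumes only that each output line is one JSON object; the `type` and `text` keys and the sample payloads are hypothetical placeholders, not the documented event schema.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per NDJSON line, skipping blanks and non-object lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if isinstance(obj, dict):
            yield obj


# Hypothetical sample lines; real events may use different keys.
sample = [
    '{"type": "response_delta", "text": "Hel"}',
    "",
    "not json",
    '{"type": "response_done"}',
]
events = list(iter_ndjson(sample))
print([event["type"] for event in events])
```

In a real pipeline, `iter_ndjson` would be given `sys.stdin` or the stdout of a subprocess instead of the sample list.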

Run without a pre-existing server by starting a temporary process runtime:

```bash
codex-agent run --runtime process "Summarize the current project layout."
```

Submit a turn and return immediately:

```bash
codex-agent run --wait-timeout 0 "Continue the long-running audit in the background."
```

Interrupt the current turn:

```bash
codex-agent interrupt "user changed priority"
```

## Python usage

### Simple embedded agent

```python
from codex_agent import Agent

agent = Agent(
    session="new",
    username="Baptiste",
    voice_enabled=False,
)

answer = agent("Summarize this repository in three practical bullet points.")
print(answer)
```

Resume the latest session interactively:

```python
from codex_agent import Agent

agent = Agent(session="latest", voice_enabled=False)
agent.interact()
```

### Add local tools

```python
from pathlib import Path
from codex_agent import Agent, tool

@tool
def list_changed_files() -> list[str]:
    """Return modified or untracked files in the current git repository."""
    import subprocess

    output = subprocess.check_output(
        ["git", "status", "--short"],
        text=True,
    )
    return [line[3:] for line in output.splitlines() if line.strip()]

@tool
def read_project_note(name: str) -> str:
    """Read a project note from ./notes by filename."""
    path = Path("notes", name).resolve()
    notes_dir = Path("notes").resolve()
    if notes_dir not in path.parents:
        raise ValueError("note must stay inside ./notes")
    return path.read_text(encoding="utf-8")

agent = Agent(session="latest", voice_enabled=False)
agent.add_tool(list_changed_files)
agent.add_tool(read_project_note)

agent("Look at the changed files and tell me what needs review first.")
```

### Add live, non-persistent context with a provider

Providers are for context that should be available to the model but should not become part of the saved conversation history. A provider is called again before each model API call, so it is a good fit for values that can change over time: the current clock, open browser state, desktop screenshots, active todos, project status, feature flags, and similar runtime context.

A minimal clock provider:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

from codex_agent import Agent, provider

@provider
def current_time():
    now = datetime.now(ZoneInfo("Europe/Paris"))
    return f"Current local time: {now:%Y-%m-%d %H:%M:%S %Z}"

agent = Agent(session="latest", voice_enabled=False)
agent.add_provider(current_time)
agent("Given the current time, should I start a long-running task now?")
```

A project provider can combine stable guidance with freshly computed state:

```python
import subprocess
from codex_agent import Agent, provider

@provider
def repository_context():
    status = subprocess.check_output(
        ["git", "status", "--short"],
        text=True,
    ).strip()
    return f"""
The user is working on codex-agent-framework.
Prefer precise local inspection before editing files.
Do not keep legacy compatibility shims unless explicitly requested.
Current git status:
{status or "clean"}
""".strip()

agent = Agent(session="latest", voice_enabled=False)
agent.add_provider(repository_context)
agent("Plan a safe refactor of the CLI package.")
```

### Add a slash command

```python
from codex_agent import Agent, command, get_agent

@command
def repo():
    """Show the active session and current repository hint."""
    agent = get_agent()
    return f"session={agent.current_session_id}; repo=codex-agent"

agent = Agent(session="latest", voice_enabled=False)
agent.add_command(repo)
print(agent("/repo"))
```

## Runtime directory and sessions

By default, local runtime state is stored in:

```text
~/.agent_runtime
```

Common files and folders:

```text
sessions/          persisted conversation histories as JSON
workfolder/        generated or uploaded working files
tools/             user runtime tools
providers/         user runtime context providers
commands/          user runtime slash commands
images/            generated or persisted image outputs
browser/           persistent browser profiles and screenshots
logs/              detached server/tray logs
memory.json        durable semantic memory entries
planner.json       persistent named todos
wakeups.json       scheduled autonomous wakeups
agent_config.json  persisted runtime configuration
tui.json           currently registered TUI client process
```

Override the runtime location with:

```bash
AGENT_RUNTIME_DIR=/tmp/my-agent-runtime codex-agent
```

Session behavior:

- `Agent(session="new")` starts a fresh session.
- `Agent(session="latest")` resumes the newest saved session.
- `Agent(session="<session_id>")` loads a specific saved session.
- `Agent(session="/path/to/session.json")` loads a session file directly.

Session IDs are timestamp-based and lexicographically sortable.
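
Because IDs sort lexicographically in timestamp order, the newest session can be found with a plain string sort. A minimal sketch, assuming session files are named `<session_id>.json` inside `sessions/`; the file naming, the sample IDs, and the `newest_session_id` helper are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def newest_session_id(sessions_dir: Path) -> Optional[str]:
    """Return the lexicographically greatest session ID, or None if empty."""
    ids = sorted(path.stem for path in sessions_dir.glob("*.json"))
    return ids[-1] if ids else None


# Demo with made-up IDs in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sessions = Path(tmp)
    for sid in ("20240101-120000", "20240315-093000", "20231231-235959"):
        (sessions / f"{sid}.json").write_text("{}", encoding="utf-8")
    latest = newest_session_id(sessions)

print(latest)
```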

## Built-in slash commands

Inside the interactive agent, commands start with `/`.

Common commands:

```text
/help                         list available commands
/sessions                     list saved sessions
/new_session                  create a new session
/load_session latest          load latest session
/load_session <session_id>    load a specific session
/delete_session <session_id>  delete a session
/next_session                 move to the next/newer session
/previous_session             move to the previous/older session
/compact                      compact completed history turns
/config                       show agent/model config
/config model=gpt-test verbosity=low
/model                        show current model
/model gpt-test               update model
/reasoning high               update reasoning effort
/verbosity low                update verbosity
/memory_config                show memory plugin config
/memory_config auto_archive=true max_tokens=4000
/memory_auto_archive on       toggle completed-turn memory archiving
```

Runtime command modules can add more commands from `~/.agent_runtime/commands/*.py`.

## Built-in tools and plugins

The default agent exposes local tools and plugin tools to the model. Names may be prefixed by the plugin namespace depending on registration.

| Area | Examples | Purpose |
| --- | --- | --- |
| Files | `read`, `view`, `write`, `edit` | Strict text reads, broad extraction, complete writes, exact-string edits. |
| Shell | `bash`, `python` | Shell commands and persistent Python execution. |
| Vision/system | `observe`, `show`, `open_tui`, `close_tui` | Image observation, opening files/URLs, TUI control. |
| Memory plugin | `memory_add`, `memory_edit`, `memory_delete`, `memory_search`, `/memory_config` | Durable semantic memory, retrieval context, and memory-owned configuration. |
| Planner plugin | `planner_create`, `planner_add`, `planner_check`, `planner_clear` | Persistent named todos surfaced as context. |
| Scheduler plugin | `scheduler_schedule`, `scheduler_cancel`, `scheduler_list`, `scheduler_restart_and_wakeup` | Future turns, periodic wakeups, post-restart continuation. |
| Browser plugin | `browser_open`, `browser_goto`, `browser_click`, `browser_fill`, `browser_press`, ... | Persistent Playwright/Chromium automation with active-tab context. |
| Desktop plugin | `desktop_start_session`, `desktop_run_commands`, `desktop_stop_session` | Screenshot-backed Linux desktop automation. |

Use dangerous tools carefully. Bash, Python, write, edit, browser, and desktop actions run locally as the current user.

## Local server, TUI, and tray

The server exposes the agent through a FastAPI bridge. It lives in `codex_agent.server` and can run the agent in a separate worker process so the HTTP/SSE server remains responsive while the model is busy.

Key endpoints:

```text
GET   /health
GET   /status
GET   /config
PATCH /config
GET   /session
GET   /sessions
GET   /messages
GET   /tools
GET   /wakeups
GET   /events
GET   /events/replay
POST  /turns
POST  /interrupt
POST  /tui/open
POST  /tui/close
POST  /restart
```
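
The read-only routes can be called from any HTTP client. A minimal standard-library sketch follows; the base URL is a placeholder (the server's actual host and port are not stated here), and the assumption that responses are JSON is not confirmed by this document.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # placeholder; use your server's address


def build_get(path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for a server route such as /health or /status."""
    return urllib.request.Request(base_url.rstrip("/") + path, method="GET")


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_get(path, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_get("/health")
print(request.get_method(), request.full_url)
```

Calling `fetch_json("/status")` requires a running server; building the request does not.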

The TUI is a visual client. It connects over SSE, replays the latest turn when opened mid-session, tracks event cursors, and reconnects after server restarts or transient stream loss. The server accepts one TUI client at a time, while allowing the same client process to replace a stale SSE subscription during reconnect.
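
Cursor tracking can be illustrated with a small parser for the standard `text/event-stream` framing. This is a sketch of the wire format, not the TUI's actual client: the event names and payloads in the sample are hypothetical, and treating the SSE `id` field as the replay cursor is an assumption.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"id", "event", "data"} dicts from text/event-stream lines."""
    event_id, event_name, data = None, "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line ends the frame; dispatch only if data was seen.
            if data:
                yield {"id": event_id, "event": event_name, "data": "\n".join(data)}
            event_name, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value  # the last seen ID persists across frames
        elif field == "event":
            event_name = value
        elif field == "data":
            data.append(value)


stream = [
    "id: 41", "event: turn_start", 'data: {"turn": 1}', "",
    ": keep-alive", "",
    "id: 42", "event: delta", "data: Hello", "data: world", "",
]
frames = list(parse_sse(stream))
last_cursor = frames[-1]["id"]
print(last_cursor)
```

A reconnecting client would remember `last_cursor` and ask the server to resume from it.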

The tray can start or stop the user service, open or close the TUI, and keep the local agent available independently of the terminal UI.

## Runtime extensions

The agent loads decorated Python functions from the runtime directory:

```text
~/.agent_runtime/tools/*.py
~/.agent_runtime/providers/*.py
~/.agent_runtime/commands/*.py
```

Example runtime tool file, `~/.agent_runtime/tools/git_helpers.py`:

```python
from codex_agent import tool

@tool
def current_branch() -> str:
    """Return the current git branch."""
    import subprocess
    return subprocess.check_output(
        ["git", "branch", "--show-current"],
        text=True,
    ).strip()
```

Example runtime provider, `~/.agent_runtime/providers/clock.py`:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

from codex_agent import provider

@provider
def current_time():
    now = datetime.now(ZoneInfo("Europe/Paris"))
    return f"Current local time: {now:%Y-%m-%d %H:%M:%S %Z}"
```

Decorated functions are registered automatically when the agent starts.
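
The general pattern behind this kind of auto-registration can be sketched with the standard library. This illustrates decorator-based discovery and is not the framework's loader: `REGISTRY`, `register`, and `load_extensions` are hypothetical names, and real extension files import `tool` from `codex_agent` rather than receiving an injected decorator.

```python
import runpy
import tempfile
from pathlib import Path
from typing import Callable, Dict

REGISTRY: Dict[str, Callable] = {}


def register(func: Callable) -> Callable:
    """Record the function under its name and return it unchanged."""
    REGISTRY[func.__name__] = func
    return func


def load_extensions(folder: Path) -> None:
    """Execute every *.py file in folder so its decorators run."""
    for path in sorted(folder.glob("*.py")):
        # Inject the decorator so the demo file needs no import.
        runpy.run_path(str(path), init_globals={"register": register})


# Demo: write one extension file, then load the folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "greet.py").write_text(
        "@register\ndef greet(name):\n    return 'hello ' + name\n",
        encoding="utf-8",
    )
    load_extensions(folder)

print(sorted(REGISTRY))
print(REGISTRY["greet"]("world"))
```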

## Events

`Agent` exposes an event bus for UI and automation integrations.

```python
from codex_agent import Agent, MessageAddedEvent, ToolCallStartEvent

agent = Agent(session="new", voice_enabled=False)

@agent.on(MessageAddedEvent)
def log_message(event):
    print("message", event.message.type)

@agent.on(ToolCallStartEvent)
def log_tool(event):
    print("tool", event.name)
```

Useful exported events include:

- `MessageAddedEvent`
- `MessageSubmittedEvent`
- `AgentTurnStartEvent` / `AgentTurnEndEvent` / `AgentTurnErrorEvent`
- `AssistantTurnStartEvent` / `AssistantTurnEndEvent`
- `ResponseStartEvent` / `ResponseContentDeltaEvent` / `ResponseDoneEvent`
- `ToolCallStartEvent` / `ToolCallDoneEvent`
- `SessionLoadedEvent` / `SessionDeletedEvent`
- `AgentInterruptedEvent`
- `AudioPlaybackEvent`

## Configuration

`Agent` accepts configuration through keyword arguments:

```python
from codex_agent import Agent

agent = Agent(
    session="latest",
    model="gpt-5.4",
    reasoning_effort="medium",
    verbosity="medium",
    input_token_limit=128000,
    auto_compact=True,
    web_search_enabled=False,
    image_generation_enabled=False,
    voice_enabled=False,
    builtin_tools=None,      # None = all built-ins, [] = none, or explicit names
    builtin_providers=None,
    builtin_plugins=None,
)
```

Agent-level configuration can also be changed through slash commands, the HTTP API, or the CLI:

```bash
codex-agent config get
codex-agent config get input_token_limit
codex-agent config set input_token_limit 128000
codex-agent config set voice_enabled=false
codex-agent config set builtin_plugins='["memory", "planner", "scheduler"]'
```
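
The values above include a boolean, an integer, and a JSON list. One plausible way to interpret such arguments is to try JSON first and fall back to the raw string. The sketch below illustrates that idea only; it is not the CLI's actual parser, and `parse_assignment` is a hypothetical helper.

```python
import json
from typing import Any, Tuple


def parse_assignment(text: str) -> Tuple[str, Any]:
    """Split 'key=value' and decode the value as JSON when possible."""
    key, sep, raw = text.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"expected key=value, got {text!r}")
    raw = raw.strip()
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw  # plain strings such as model names stay as-is
    return key.strip(), value


print(parse_assignment("voice_enabled=false"))
print(parse_assignment("input_token_limit=128000"))
print(parse_assignment('builtin_plugins=["memory", "planner", "scheduler"]'))
print(parse_assignment("model=gpt-5.4"))
```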

Agent-level configuration is persisted to `agent_config.json` in the runtime directory when saved through agent helpers, slash commands, the HTTP API, or `codex-agent config set`. Plugin-specific technical settings stay with their plugin state instead of being mixed into `AgentConfig`. For example, the memory plugin persists `auto_archive`, embedding precision/dimensions, ranking decay, and retrieval token budget in `memory.json`, and exposes its own slash commands:

```text
/memory_config
/memory_config auto_archive=true max_tokens=4000 dimensions=128
/memory_auto_archive off
```

Use `--no-save` with the CLI for transient agent-level updates:

```bash
codex-agent config set --no-save verbosity low
```

## Project layout

```text
codex_agent/                      Python package
codex_agent/agent.py              Central Agent object
codex_agent/mainloop.py           Agent turn loop and tool execution flow
codex_agent/agent_runtime.py      In-process and process-backed runtime adapters
codex_agent/server/               FastAPI REST/SSE bridge package
codex_agent/tui/                  Textual TUI client and lifecycle helpers
codex_agent/cli/                  Root CLI and headless command implementation
codex_agent/builtin_tools/        Built-in file/shell/vision/system/server tools
codex_agent/builtin_plugins/      Stateful memory/planner/scheduler/browser/desktop plugins
codex_agent/get_text/             Document extraction helpers
codex_agent/prompts/              Packaged system prompts
codex_agent/service.py            systemd/launchd user service helpers
codex_agent/tray.py               GTK tray controller
tests/                            Test suite
scripts/                          Source scripts mirrored into package data
pyproject.toml                    Package metadata and build config
MANIFEST.in                       Source distribution includes
```

## Testing

Run the full suite:

```bash
python -m pytest
```

Run lightweight static checks used during local cleanup:

```bash
python -m pyflakes codex_agent tests
```

The tests isolate `AGENT_RUNTIME_DIR` automatically, so they should not create or resume sessions from your real `~/.agent_runtime`.
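
The same isolation is easy to reproduce in ad hoc scripts. A minimal sketch using only the standard library; `isolated_runtime_dir` is a hypothetical helper, not part of the package or its test suite.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def isolated_runtime_dir() -> Iterator[Path]:
    """Point AGENT_RUNTIME_DIR at a temp folder, then restore the old value."""
    previous = os.environ.get("AGENT_RUNTIME_DIR")
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["AGENT_RUNTIME_DIR"] = tmp
        try:
            yield Path(tmp)
        finally:
            if previous is None:
                os.environ.pop("AGENT_RUNTIME_DIR", None)
            else:
                os.environ["AGENT_RUNTIME_DIR"] = previous


before = os.environ.get("AGENT_RUNTIME_DIR")
with isolated_runtime_dir() as runtime:
    # Code that reads the variable inside this block sees the temp folder.
    inside = os.environ["AGENT_RUNTIME_DIR"]
after = os.environ.get("AGENT_RUNTIME_DIR")
print(inside == str(runtime), before == after)
```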

The published `0.1.19` release was validated with the following test result:

```text
450 passed
```

## Packaging

Build source and wheel distributions with:

```bash
python -m pip install build twine
rm -rf build dist *.egg-info
python -m build
python -m twine check dist/*
```

The distribution includes prompt text files, `codex_agent/get_text/default_gitignore`, and the packaged Linux system dependency installer through package data and `MANIFEST.in`.

## Recent changes

- `0.1.19`: prune compacted raw session history after compaction, keep only the latest compaction anchor plus the new compaction and any unfinished remainder, and drop older redundant compaction summaries from active session files.
- `0.1.18`: make `start server/tray` detach by default, move durable memory fully into the memory plugin, keep wakeups as a core runtime primitive, expose RAG memory timestamps/sources, and move browser/desktop backend controllers into core modules with plugins as facades.
- `0.1.17`: reorganize the TUI and FastAPI server into dedicated `codex_agent.tui` and `codex_agent.server` packages, remove obsolete message payload fields, and clean unused imports.
- `0.1.16`: tolerate missing OpenAI API keys at startup by disabling OpenAI-dependent voice and memory features, and skip memory archiving when the memory plugin is unavailable.
- `0.1.15`: add the `AgentRuntime` interface for server/CLI/process adapters, split the CLI into a package with headless runtime commands, add `codex-agent config set` plus `PATCH /config`, and make built-in tools/providers/plugins configurable.
- `0.1.14`: add HookManager infrastructure, planner/scheduler robustness fixes, documented system dependencies, `codex-agent install-system-deps`, `codex-agent bootstrap`, and improved TUI SSE reconnect/replay handling.
- `0.1.13`: reorganize built-in tools into a `codex_agent.builtin_tools` package, keeping the public import surface compatible while separating file, shell, vision, system, and server-tool modules.
- `0.1.12`: add a persistent Playwright/Chromium browser controller with tab navigation, DOM/action snapshots, screenshots, form/click/key tools, and `browser_goto(url)` for active-tab navigation.
- `0.1.11`: split strict line-numbered UTF-8 `read` from unnumbered extracted `view`, preserve blank lines in read snippets, and show persistent+temporary message counts in the TUI status bar.
- `0.1.10`: persist only backend compaction summaries, drop bulky compacted conversation payloads, and refresh context status after compaction.
- `0.1.9`: maintenance packaging release after validating the local execution environment and deploy workflow.
- `0.1.8`: scope TUI replay/SSE catch-up to the active session and make bash/python subprocesses inherit the project Python environment, including service-launched agents.
- `0.1.7`: add durable RAG memory, scheduled wakeups, process-isolated server runtime, tray/service controls, robust SSE replay/reconnect, richer TUI status, and improved token estimates.
- `0.1.6`: add the FastAPI REST/SSE bridge, HTTP/SSE client, async-style agent mainloop, and decoupled TUI operation.

See [`CHANGELOG.md`](CHANGELOG.md) for the full history.

## Safety notes

This project is designed to let an AI assistant act on the local machine. That is powerful and potentially risky.

Recommended practices:

- Use a dedicated runtime directory for experiments.
- Review tool calls before enabling autonomous workflows.
- Avoid running the agent with elevated privileges.
- Keep secrets out of prompts, logs, and committed runtime files.
- Prefer temporary workfolders in tests and demos.
- Treat browser and desktop automation as real user actions.

## License

MIT. See [LICENSE](LICENSE).
