Metadata-Version: 2.4
Name: ai-nd-co-agent-tools
Version: 0.5.1
Summary: Codex/Claude-backed text transformation and Kokoro TTS command-line tools.
Project-URL: Homepage, https://github.com/ai-nd-co/agent-tools
Project-URL: Repository, https://github.com/ai-nd-co/agent-tools
Project-URL: Issues, https://github.com/ai-nd-co/agent-tools/issues
Author: ai-nd-co
License: Apache-2.0
License-File: LICENSE
Keywords: cli,codex,kokoro,speech,tts
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: httpx>=0.28.1
Requires-Dist: kokoro>=0.9.4
Requires-Dist: numpy>=1.26.0
Provides-Extra: dev
Requires-Dist: mypy>=1.11.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: ruff>=0.6.9; extra == 'dev'
Provides-Extra: ui
Requires-Dist: pyside6>=6.8.0; extra == 'ui'
Description-Content-Type: text/markdown

# agent-tools

Python CLI tools for:

- transforming raw text into TTS-ready narration and synthesizing it in one command
- transforming piped text through either the private Codex backend used by a local Codex install or an experimental Claude Code CLI wrapper
- synthesizing the result to WAV with Kokoro-82M
- auto-TTS of completed Codex and Claude Code replies through installed desktop hooks

This repo is intentionally wired to local Codex and Claude Code installs.

Release policy:

- semantic-release owns version bumps, changelog updates, and `py-v*` tags
- do not manually edit `project.version` in `pyproject.toml` during normal work
- do not create release tags by hand unless the release workflow explicitly calls for it

## Status

This is an **experimental public package** with a **private Codex dependency** and an
**experimental Claude Code CLI transform path**.

The `transform` command mirrors the current request shape used by the local Codex source tree and depends on ChatGPT-backed auth in `~/.codex/auth.json`.

The Claude Code path uses the official `claude` CLI in headless `-p` mode with a constrained
single-turn wrapper. It does not rely on any private Claude Code API.

The Codex path does **not** use:

- `codex exec`
- `codex app-server`
- the public OpenAI API key flow

That means:

- you must already be logged into local Codex
- backend compatibility can break if Codex internals or backend contracts change
- this package is best suited for users who already use local Codex

## Requirements

- Python 3.11+
- local Codex already logged in via ChatGPT
- local Claude Code CLI installed if you want Claude-backed transforms or Claude auto-TTS integration
- `espeak-ng` installed for best Kokoro English fallback behavior

## Install

```bash
cd repos/agent-tools
uv venv
uv pip install -e ".[dev]"
```

Public package install:

```bash
pip install ai-nd-co-agent-tools
```

UI-enabled install:

```bash
pip install "ai-nd-co-agent-tools[ui]"
```

Install a CUDA-enabled PyTorch stack for this CLI environment:

```bash
agent-tools install-cuda
```

Pass an explicit track if you do not want auto-detection:

```bash
agent-tools install-cuda --cuda-track cu130
```

## Usage

### Single-command path: `ttsify`

```bash
echo "Turn this note into natural spoken narration." | agent-tools ttsify --output-file out.wav
```

`ttsify` uses a built-in rewrite prompt stored in the package and then pipes the transformed text
into Kokoro TTS.

Default `ttsify` settings:

- model: `gpt-5.4-mini`
- voice: `af_heart`

Configurable via env vars:

```bash
AGENT_TOOLS_CODEX_MODEL=gpt-5.4-mini
AGENT_TOOLS_CODEX_REASONING_EFFORT=medium
AGENT_TOOLS_KOKORO_VOICE=af_heart
AGENT_TOOLS_KOKORO_LANGUAGE=a
AGENT_TOOLS_KOKORO_SPEED=1.0
AGENT_TOOLS_KOKORO_DEVICE=auto
AGENT_TOOLS_TRANSFORM_PROVIDER=codex
AGENT_TOOLS_CLAUDE_CODE_MODEL=haiku
AGENT_TOOLS_CLAUDE_CODE_EFFORT=low
AGENT_TOOLS_CLAUDE_CODE_BARE=false
```
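For example, a hedged config sketch that routes transforms through Claude Code (the variable names come from the list above; the values shown are illustrative only):

```bash
# Illustrative values; variable names are from the list above.
export AGENT_TOOLS_TRANSFORM_PROVIDER=claude-code
export AGENT_TOOLS_CLAUDE_CODE_MODEL=sonnet
export AGENT_TOOLS_KOKORO_SPEED=1.2
```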

Claude Code transform models are intentionally limited to:

- `haiku`
- `sonnet`

`opus` is rejected by the CLI and runtime wrapper.

CLI flags override env vars.

Queue for playback on Windows:

```bash
echo "Turn this note into natural spoken narration." | agent-tools ttsify --output-mode play --source agent-a
```

### Desktop integrations

Install both supported desktop integrations:

```bash
agent-tools install-integrations
```

Install only one provider if needed:

```bash
agent-tools install-codex-integration
agent-tools install-claude-integration
```

- On native Windows Codex, this installs a `notify` command in `~/.codex/config.toml`.
- On non-Windows, this keeps the Stop-hook integration path.
- Claude Code integration installs an AgentTools `Stop` hook into `~/.claude/settings.json`
  and writes the hook script to `~/.claude/agent-tools/stop_tts.sh`.
- The compatibility alias `agent-tools install-codex-stop-hook` remains available.
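The installed `Stop` hook entry looks roughly like the following sketch. This is an assumption based on the general Claude Code `settings.json` hooks layout, not the literal output of the installer; the exact shape it writes may differ:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude/agent-tools/stop_tts.sh" }
        ]
      }
    ]
  }
}
```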

Windows debug logs:

- `~/.codex/notify_tts.log`
- `~/.codex/notify_tts_agent_tools.log`

On Windows, Codex passes the notify payload as the final JSON argv argument to the installed
Python command. No PowerShell or bash wrapper is used.

When the notify payload arrives, the integration enqueues the generated audio, starts the
background controller if needed, and returns immediately.

Claude Code hook logs:

- `~/.claude/agent-tools/stop_tts.log`
- `~/.claude/agent-tools/stop_tts_agent_tools.log`

You can also manage AgentTools auto-TTS integration from the desktop controller UI and tray menu.
AgentTools only needs **one** backend to be available:

- Codex, with local ChatGPT login working
- or Claude Code installed on PATH

If neither backend is available, the UI shows an info-only message telling you to install or sign
in to any one of them first.

### Transform text

```bash
echo "Rewrite this into short spoken narration." | agent-tools transform \
  --system-prompt-file prompt_examples/rewrite_for_tts.md
```

Optional controls:

```bash
echo "Input text" | agent-tools transform \
  --system-prompt-file prompt_examples/rewrite_for_tts.md \
  --provider codex \
  --model gpt-5 \
  --reasoning-effort medium \
  --fast
```

Experimental Claude Code-backed transform:

```bash
echo "Input text" | agent-tools transform \
  --system-prompt-file prompt_examples/rewrite_for_tts.md \
  --provider claude-code \
  --claude-model haiku \
  --claude-effort low
```

What the Claude wrapper does:

- runs `claude -p` in a temporary minimal working directory
- forces `--max-turns 1`, `--tools ""`, `--no-session-persistence`, and `--output-format json`
- uses `--system-prompt` with the same rewrite prompt file you pass to `transform`
- does **not** require or use any unofficial Claude Code API
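Taken together, the wrapper's invocation reduces to roughly the following sketch. This is not the literal command line the wrapper builds; the real wrapper also creates the temporary working directory and parses the JSON output:

```bash
# Sketch combining the flags listed above; not the literal wrapper command.
echo "Input text" | claude -p \
  --max-turns 1 \
  --tools "" \
  --no-session-persistence \
  --output-format json \
  --system-prompt "$(cat prompt_examples/rewrite_for_tts.md)"
```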

Important limitation:

- `--claude-bare` is supported, but it is **off by default**: the local `claude` help indicates
  that bare mode only reads `ANTHROPIC_API_KEY` or `apiKeyHelper` auth. If you rely on normal
  Claude login state, keep `--claude-bare` off.

### Text to speech

```bash
echo "Hello world." | agent-tools tts --output-file hello.wav
```

Queue already-prepared speech on Windows:

```bash
echo "Hello world." | agent-tools tts --output-mode play --source agent-a
```

### Desktop controller UI

```bash
agent-tools ui
```

If the controller is already running, this focuses the existing window instead of starting a
second process.

The controller behavior is:

- when neither Codex nor Claude Code is available, the normal playback UI is hidden and the window
  shows an info-only message telling you to install or sign in to any one backend first
- when either backend is available, the normal playback UI stays usable
- a single switch soft-disables or re-enables AgentTools auto-TTS processing when the relevant
  AgentTools hook/integration is installed
- a dropdown chooses the default transform engine used for `ttsify` and desktop auto-TTS:
  Codex or Claude Code
- if the saved/default provider is unavailable but another backend is available, AgentTools falls
  back automatically unless you explicitly force a provider on the CLI

### End-to-end pipeline

```bash
cat input.txt | agent-tools transform \
  --system-prompt-file prompt_examples/rewrite_for_tts.md \
  | agent-tools tts --voice af_heart --output-file out.wav
```

## Notes

- `ttsify` is the recommended end-user path; `transform` and `tts` remain available as building blocks.
- `transform` reads stdin by default and writes plain text to stdout.
- `tts` reads stdin by default and writes WAV bytes to stdout unless `--output-file` is set.
- `tts` and `ttsify` support `--output-mode play` on Windows.
- in play mode, audio is queued into a single background controller process.
- `agent-tools ui` launches or focuses the popup/tray controller window.
- controller shortcuts: `Space` pause/resume, `Esc` stop, `Ctrl+R` replay, `Ctrl+N` next.
- `tts` and `ttsify` default to `--device auto`.
- auto device selection uses a real CUDA probe, not just `torch.cuda.is_available()`.
- `agent-tools install-cuda` reinstalls the full PyTorch stack (`torch`, `torchvision`, `torchaudio`) into the current Python environment and validates the full Kokoro import chain in a fresh subprocess by default.
- `transform` refreshes ChatGPT tokens when the Codex backend returns `401`.
- the experimental Claude transform path uses the official `claude` CLI only; it is intentionally
  limited to one turn with tools disabled.
- Native Windows Codex uses `notify`; `hooks.json` lifecycle hooks are not used there.
- semantic-release now owns future Python package version bumps and `py-v*` tags.
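The idea behind the "real CUDA probe" note above can be illustrated with a minimal sketch. This is an assumption about the approach, not the package's actual probe code: actually allocating a tensor on the GPU catches broken stacks that `torch.cuda.is_available()` can miss.

```bash
# Try to actually touch the GPU; fall back to CPU on any failure.
if python3 -c "import torch; torch.zeros(1, device='cuda')" 2>/dev/null; then
  device=cuda
else
  device=cpu
fi
echo "selected device: $device"
```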

## CPU performance

Measured on this machine on **April 15, 2026** with **forced CPU**:

| Scenario | Wall time | Audio time | Real-time factor |
|---|---:|---:|---:|
| first-ever cold init after dependency/model setup | ~43.1s | n/a | n/a |
| cached init | ~2.9s | n/a | n/a |
| warm short | 0.309s | 4.80s | 0.064 |
| warm medium | 1.199s | 15.53s | 0.077 |
| warm long | 2.514s | 26.70s | 0.094 |

Interpretation:

- warm CPU generation on this machine is about **10x-15x faster than realtime**
- the main cost is **cold startup/model load**, not steady-state synthesis
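The real-time factor column is simply wall time divided by audio time; the warm long row, for instance:

```bash
# Real-time factor for the warm long scenario: 2.514s wall / 26.70s audio.
python3 -c "print(round(2.514 / 26.70, 3))"
# prints 0.094
```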

To reproduce locally:

```bash
python scripts/benchmark_tts_cpu.py
```

## Troubleshooting

- Missing `~/.codex/auth.json`: run `codex login`
- Expired auth: rerun `codex login` if refresh fails permanently
- Missing `espeak-ng`: install it for better English fallback behavior
- Slow first run: expected; Kokoro downloads voices/models and initializes the pipeline
- After changing Python versions for the interpreter that runs `agent-tools`, rerun `agent-tools install-cuda` in that same interpreter to repair the PyTorch stack for Kokoro
