Metadata-Version: 2.4
Name: capusqa
Version: 2.1.1
Summary: CapusQA: persona-driven LLM agent testing for macOS and web apps, served as a local MCP daemon
Project-URL: Homepage, https://github.com/DanielBirk04/capusqa
Author: Daniel Birk
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: llm-agents,macos,mcp,personas,playwright,testing,ui-testing
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: MacOS X
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: certifi>=2024.8.30
Requires-Dist: jinja2>=3.1
Requires-Dist: mcp>=1.10
Requires-Dist: pillow>=10.0
Requires-Dist: pyobjc-framework-applicationservices>=10.3; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-cocoa>=10.3; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-quartz>=10.3; sys_platform == 'darwin'
Requires-Dist: pyobjc-framework-vision>=10.3; sys_platform == 'darwin'
Requires-Dist: pyyaml>=6.0
Provides-Extra: browser
Requires-Dist: playwright>=1.45; extra == 'browser'
Provides-Extra: load
Requires-Dist: httpx>=0.27; extra == 'load'
Requires-Dist: psutil>=5.9; extra == 'load'
Provides-Extra: runner
Requires-Dist: anthropic>=0.40; extra == 'runner'
Requires-Dist: openai>=1.109; extra == 'runner'
Provides-Extra: vision
Requires-Dist: einops; extra == 'vision'
Requires-Dist: huggingface-hub>=0.23; extra == 'vision'
Requires-Dist: timm; extra == 'vision'
Requires-Dist: torch>=2.3; extra == 'vision'
Requires-Dist: transformers<4.50,>=4.45; extra == 'vision'
Requires-Dist: ultralytics>=8.2; extra == 'vision'
Provides-Extra: whole-system
Requires-Dist: httpx>=0.27; extra == 'whole-system'
Requires-Dist: psycopg[binary]>=3.1; extra == 'whole-system'
Requires-Dist: sqlalchemy>=2.0; extra == 'whole-system'
Description-Content-Type: text/markdown

# CapusQA

AI usability testing for real app workflows.

[![PyPI](https://img.shields.io/pypi/v/capusqa.svg)](https://pypi.org/project/capusqa/)
[![Python](https://img.shields.io/pypi/pyversions/capusqa.svg)](https://pypi.org/project/capusqa/)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](LICENSE)

[Package](https://pypi.org/project/capusqa/) | [MCP setup](client/mcp/CONNECT.md) | [Agent driver](client/mcp/DRIVER.md) | [Codex guide](client/codex/AGENTS.md) | [Examples](examples) | [Security](#security-and-privacy)

CapusQA lets Claude, Codex, Cursor, and other MCP-capable agents test local web
apps and native macOS apps like realistic users: run persona sessions, click
through workflows, file reproducible findings, and produce evidence bundles your
coding agent can use to fix and verify issues.

CapusQA runs locally on `127.0.0.1`. It stores artifacts, masks secrets, drives
browsers or macOS windows, and does not make hidden LLM calls.

[Start in 2 minutes](#quickstart) | [Recipes](#tutorials-and-recipes) | [Run the invoice demo](#try-the-demo) | [See the evidence bundle](#evidence-you-can-hand-to-a-coding-agent) | [Connect an agent](#connect-an-agent)

## Why CapusQA

Traditional UI tests prove that selectors still work. CapusQA looks for the
product failures scripted tests miss: dead controls, confusing flows, broken
business rules, inconsistent copy, accessibility friction, and crashes.

Use CapusQA when you want an agent to explore the app like a user, collect
evidence like a tester, and return findings a developer can reproduce.

Best for:

- Local web apps, prototypes, dashboards, and product workflows.
- MCP-driven testing with Claude, Codex, Cursor, or another coding agent.
- Evidence-heavy usability, workflow, and business-rule checks.
- Fast feedback before demos, releases, design reviews, and agent-assisted fix
  loops.

Not a replacement for:

- Unit tests, API tests, or deterministic browser regression suites.
- Production monitoring.
- Unsupervised testing against live production accounts.

## Guiding Principles

CapusQA is designed around a few constraints that make agent-driven UI testing
useful, reproducible, and safe to hand to a coding agent:

- **Local-first**: The daemon binds to `127.0.0.1` by default and stores run data
  on your machine.
- **Agent-native**: Any MCP-capable coding agent can drive the same daemon,
  dashboard, traces, and reports.
- **Evidence-first**: Findings pair expected vs. observed behavior with
  screenshots, traces, oracle signals, and stable IDs.
- **Replayable**: Traces are first-class artifacts so fixes can be checked
  against the workflow that found the issue.
- **No hidden reasoning**: The daemon observes and acts. Your agent, or the
  optional runner, does the reasoning.

## Quickstart

Install CapusQA with browser support:

```bash
uv tool install --python 3.12 'capusqa[browser]'
capusqa setup
```

Or the one-liner, which installs `uv` if needed and runs setup:

```bash
curl -fsSL https://raw.githubusercontent.com/DanielBirk04/capusqa/main/scripts/install.sh | sh
```

If you do not have `uv` yet:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool update-shell
```

### Windows

CapusQA runs the web/URL testing path on Windows. Native macOS-app testing is
macOS-only, and its dependencies are skipped automatically on other platforms. In
**PowerShell**:

```powershell
powershell -ExecutionPolicy Bypass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv tool install --python 3.12 'capusqa[browser]'
capusqa setup
```

Or the one-liner, which installs `uv` if needed and runs setup:

```powershell
powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/DanielBirk04/capusqa/main/scripts/install.ps1 | iex"
```

Open a new terminal if `capusqa` is not found after installation.

`capusqa setup` prepares browser support, can wire supported MCP clients, and
normally starts the local daemon. To start it later:

```bash
capusqa serve --open
```

Dashboard:

```text
http://127.0.0.1:7777/
```

MCP endpoint:

```text
http://127.0.0.1:7777/mcp
```
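
To check from a script whether anything is listening on the default port before
pointing an agent at it, a minimal standard-library sketch might look like the
following. It assumes the host and port shown above; `daemon_is_listening` and
`endpoints` are hypothetical helpers, not part of CapusQA.

```python
import socket

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 7777  # port taken from the dashboard and MCP URLs above


def daemon_is_listening(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def endpoints(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> dict[str, str]:
    """Build the dashboard and MCP URLs for a given host and port."""
    base = f"http://{host}:{port}"
    return {"dashboard": f"{base}/", "mcp": f"{base}/mcp"}
```

A successful TCP connection only shows that some process owns the port, not
that it is CapusQA; `capusqa doctor` remains the documented way to check the
local setup.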

Useful commands:

```bash
capusqa doctor                 # Check local setup.
capusqa capacity               # Estimate local browser capacity.
capusqa issues                 # List stored findings.
capusqa report RUN_ID          # Write report.html, report.md, feedback.json.
capusqa agents --run-id RUN_ID # Play queued sessions; needs Codex, Claude Code, OPENAI_API_KEY, or ANTHROPIC_API_KEY.
```

## Tutorials and Recipes

Pick the path that matches what you are trying to do:

| Goal | Start here |
| --- | --- |
| Run CapusQA for the first time | [Quickstart](#quickstart) |
| Prove the browser pipeline works | [Try the invoice demo](#try-the-demo) |
| Connect Claude, Codex, Cursor, Cline, Windsurf, VS Code, or Zed | [client/mcp/CONNECT.md](client/mcp/CONNECT.md) |
| Teach any MCP agent how to drive CapusQA | [client/mcp/DRIVER.md](client/mcp/DRIVER.md) |
| Use CapusQA from Codex | [client/codex/AGENTS.md](client/codex/AGENTS.md) |
| Test a local web app | Start CapusQA, then point a run at `http://127.0.0.1:<port>` or a `file://` URL |
| Test a native macOS app | Read [Targets](#targets), install the `vision` extra, and run `capusqa doctor --request` |
| Hand findings to a coding agent | Generate [the evidence bundle](#evidence-you-can-hand-to-a-coding-agent) |

Common first prompts:

```text
Use CapusQA to test my local app at http://127.0.0.1:3000. Act as realistic users,
report reproducible findings, and generate the CapusQA report artifacts.
```

```text
Use CapusQA to run the invoice demo in examples/invoice_web with the scenario pack
at examples/invoice_web/spec.yaml. Report every planted bug with evidence.
```

## Try the Demo

The bundled invoice app is a fast end-to-end proof: CapusQA should find four
planted product bugs and generate report artifacts for the run.

Clone the repository to use the demo files:

```bash
git clone https://github.com/DanielBirk04/capusqa.git
cd capusqa
```

Demo files:

- app: [examples/invoice_web/index.html](examples/invoice_web/index.html)
- scenario pack: [examples/invoice_web/spec.yaml](examples/invoice_web/spec.yaml)

Planted bugs:

- `Export PDF` does nothing.
- The promised 10 percent discount is never applied.
- Sending an invoice confirms with the wrong message.
- Invalid amounts are silently ignored.
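
The discount bug is a business-rule check, so it can help to spell out the
arithmetic a tester would expect. The sketch below assumes the rule as worded in
the example finding later in this README (a subtotal above 100 EUR gets 10
percent off). `expected_total` is a hypothetical helper, and the demo app's
actual pricing logic may include details not covered here.

```python
from decimal import Decimal, ROUND_HALF_UP

DISCOUNT_THRESHOLD = Decimal("100")  # EUR, from the example finding's wording
DISCOUNT_RATE = Decimal("0.10")      # the promised 10 percent


def expected_total(subtotal: Decimal) -> Decimal:
    """Total a user should see if the discount rule were applied."""
    if subtotal > DISCOUNT_THRESHOLD:
        subtotal = subtotal * (1 - DISCOUNT_RATE)
    # Round to cents so the value can be compared with what the page displays.
    return subtotal.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under this reading, a 120.00 EUR subtotal should yield 108.00 EUR, so a page
that still shows 120.00 matches the planted bug.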

Print a copy-pasteable `file://` URL for the dashboard:

```bash
python3 -c 'from pathlib import Path; print(Path("examples/invoice_web/index.html").resolve().as_uri())'
```

Or ask a connected agent:

```text
Use CapusQA to test examples/invoice_web/index.html with the scenario pack in
examples/invoice_web/spec.yaml. Report the findings and generate the CapusQA
report artifacts.
```

The following command is available only from a source checkout:

```bash
capusqa dev test-run --out /tmp/capusqa-invoice-web
```

A useful run should produce findings for dead controls, rule violations,
inconsistent confirmation copy, and missing validation.

## Evidence You Can Hand to a Coding Agent

Every run can produce a fix-ready evidence bundle: screenshots, traces,
findings, expected vs. observed behavior, and machine-readable `feedback.json`
for follow-up automation.

Default storage:

```text
~/.capusqa/
  capusqa.db
  artifacts/<run-id>/
    report.html
    report.md
    feedback.json
    screenshots
    traces
```

Core artifacts:

| Artifact | Use it for |
| --- | --- |
| `report.html` | Review screenshots, sessions, findings, and evidence in a browser. |
| `report.md` | Share a compact developer report. |
| `feedback.json` | Feed stable finding IDs, repro steps, expected vs. observed behavior, evidence, and status to a coding agent. |
| Traces | Replay action histories and verify fixes. |

Example finding shape:

```json
{
  "id": "CAP-001",
  "kind": "rule-violation",
  "title": "Volume discount is not applied above 100 EUR",
  "expected": "Subtotal above 100 EUR applies a 10 percent discount",
  "observed": "Subtotal and total remain identical after adding qualifying items",
  "evidence": ["screenshots", "repro_trace"]
}
```
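
A finding in this shape is straightforward to turn into text for a fix prompt.
The sketch below is only an illustration: `summarize_finding` is a hypothetical
helper, it relies solely on the fields shown in the example above, and the
overall layout of a real `feedback.json` file is not assumed.

```python
import json

# The example finding from this README; real files may carry more fields.
EXAMPLE_FINDING = json.loads("""
{
  "id": "CAP-001",
  "kind": "rule-violation",
  "title": "Volume discount is not applied above 100 EUR",
  "expected": "Subtotal above 100 EUR applies a 10 percent discount",
  "observed": "Subtotal and total remain identical after adding qualifying items",
  "evidence": ["screenshots", "repro_trace"]
}
""")


def summarize_finding(finding: dict) -> str:
    """Render one finding as a short plain-text block for a fix prompt."""
    evidence = ", ".join(finding.get("evidence", [])) or "none listed"
    return "\n".join([
        f"[{finding['id']}] {finding['title']} ({finding['kind']})",
        f"  expected: {finding['expected']}",
        f"  observed: {finding['observed']}",
        f"  evidence: {evidence}",
    ])
```

Calling `print(summarize_finding(EXAMPLE_FINDING))` gives a four-line summary
that leads with the stable ID, which keeps follow-up messages traceable to the
original finding.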

Set `CAPUSQA_DATA_DIR` or pass `--data-dir` to store data somewhere else.

## Connect an Agent

CapusQA is built for MCP clients. Point your agent at:

```text
http://127.0.0.1:7777/mcp
```

Agent-specific guides:

- [client/mcp/CONNECT.md](client/mcp/CONNECT.md) - connect MCP clients to CapusQA.
- [client/mcp/DRIVER.md](client/mcp/DRIVER.md) - portable tester playbook for
  any MCP client.
- [client/codex/AGENTS.md](client/codex/AGENTS.md) - Codex driver guide.

Claude Code and Codex users can run `capusqa setup` to register the same local MCP
server. Claude Code also gets the optional `/capusqa` command menu; the main loop
there is `/capusqa:test`, `/capusqa:runs`, and `/capusqa:issues`.

## Targets

| Target | Use it for | Setup |
| --- | --- | --- |
| Web URL or `file://` | Local web apps, demos, parallel runs, CI-style checks | `capusqa[browser]`; no Screen Recording or Accessibility permissions |
| Native macOS app | Desktop workflows, AppKit/Cocoa targets, real-window testing | Advanced path; requires Screen Recording and Accessibility permissions |

Browser targets run in isolated Chromium contexts. Native targets use window
screenshots, OCR/vision perception, and synthesized mouse and keyboard input.

For native macOS targets:

```bash
uv tool install --force --python 3.12 'capusqa[browser,vision]'
capusqa models download
capusqa doctor --request
export CAPUSQA_MACOS_EXPERIMENTAL=1
capusqa serve --open
```

Keep the machine free during native runs, because they send synthesized mouse and
keyboard input to real windows. Browser runs do not contend with your mouse.

## How It Works

```text
persona goals or scenario specs
        |
        v
MCP client or optional capusqa agents runner
        |
        v
CapusQA daemon on 127.0.0.1
        |
        +-- browser driver: isolated Chromium sessions
        +-- macOS driver: native window screenshots and input
        |
        v
dashboard, SQLite store, reports, feedback.json, replayable traces
```

The core loop is:

```text
run_create -> task_claim -> session_start
           -> observe -> click/type/scroll/press/wait
           -> finding_report / checkpoint_mark / rule_mark
           -> session_end -> report_generate
```
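
One way to read the loop is as an ordering constraint on tool calls within a
single session. The sketch below encodes that reading using the tool names from
the diagram. `follows_core_loop` is a hypothetical checker, and the assumption
that every session has exactly this shape is a simplification, not a documented
rule.

```python
SETUP = ["run_create", "task_claim", "session_start"]
ACTIONS = {"observe", "click", "type", "scroll", "press", "wait"}
MARKS = {"finding_report", "checkpoint_mark", "rule_mark"}
TEARDOWN = ["session_end", "report_generate"]


def follows_core_loop(calls: list[str]) -> bool:
    """Check that one session's tool calls match the documented ordering."""
    # The session must open with the setup calls and close with the teardown.
    if calls[:3] != SETUP or calls[-2:] != TEARDOWN:
        return False
    # Everything in between must be an observation, an action, or a mark.
    middle = calls[3:-2]
    return all(call in ACTIONS | MARKS for call in middle)
```

A check like this could run over a recorded list of tool names to spot sessions
that ended without `session_end` or never generated a report.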

The split is deliberate:

- The client decides what a persona should try and how to interpret evidence.
- The daemon observes, actuates, stores, masks secrets, reports, and replays.

## Examples

- [examples/invoice_web](examples/invoice_web) - self-contained browser demo
  with planted bugs and a scenario pack.
- [examples/invoice_mini](examples/invoice_mini) - native Cocoa invoice demo
  with matching product rules.
- [examples/collab_board](examples/collab_board) - multi-user collaboration
  fixture.
- [examples/saas_mini](examples/saas_mini) - small SaaS-style target.

## Security and Privacy

CapusQA runs locally and binds to `127.0.0.1` by default. The dashboard and MCP
server assume a localhost trust boundary.

Set `CAPUSQA_DASHBOARD_TOKEN` before exposing the dashboard beyond localhost.
Mutating dashboard routes and sensitive reads honor it as a Bearer token when
the token is set.

Credentials for test accounts live in a local SQLite vault. Fields whose names
look secret, such as `password`, `secret`, `token`, `pin`, `key`, `otp`, or
`code`, are masked in traces and reports as `{{secret:...}}`. Replay resolves
them locally.
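
The masking idea can be illustrated with a few lines of Python. This is a sketch
under stated assumptions, not CapusQA's implementation: it treats "looks secret"
as a case-insensitive substring match on the listed words, and it uses the field
name after `secret:` purely for illustration, since the placeholder's exact
contents are not specified here.

```python
SECRET_HINTS = ("password", "secret", "token", "pin", "key", "otp", "code")


def looks_secret(field_name: str) -> bool:
    """Assumed rule: any hint appearing anywhere in the lowercased name."""
    name = field_name.lower()
    return any(hint in name for hint in SECRET_HINTS)


def mask_fields(fields: dict[str, str]) -> dict[str, str]:
    """Replace values of secret-looking fields with a placeholder."""
    return {
        # Placeholder label is illustrative; only the {{secret:...}} form is documented.
        name: "{{secret:" + name + "}}" if looks_secret(name) else value
        for name, value in fields.items()
    }
```

Substring matching errs on the side of over-masking: under this rule
`shipping_address` would be masked because it contains `pin`. That trade-off is
usually acceptable for traces, but it is worth knowing when a value unexpectedly
shows up as a placeholder.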

Use dedicated test accounts. Do not point CapusQA at production systems unless
you have explicitly designed the run, data, and account permissions for that
risk.

Generated reports and traces may contain app content. Attach only sanitized
artifacts to public issues.

CapusQA Intelligence and CapusQA Atlas are optional retrieval and hosted-assistance
features. They are off by default and require explicit environment
configuration plus local consent:

```bash
capusqa intelligence status
capusqa intelligence accept
capusqa intelligence export
capusqa intelligence withdraw
```

## Development

From a source checkout:

```bash
uv venv --python 3.12 .venv
uv pip install --python .venv/bin/python -e '.[browser]'
.venv/bin/playwright install chromium
.venv/bin/capusqa doctor
.venv/bin/capusqa serve --open
```

Repository map:

| Path | Purpose |
| --- | --- |
| `capusqad/` | Python daemon, MCP server, drivers, dashboard server, reports, and CLI. |
| `client/` | MCP prompts, connection guides, Codex guide, and Claude Code plugin assets. |
| `examples/` | Demo apps and scenario packs. |
| `scripts/install.sh` | Source-checkout installer and setup helper; also fetched by the Quickstart one-liner. |
| `pyproject.toml` | Package metadata, dependencies, extras, and build configuration. |

## Contributing

Keep contributions evidence-oriented:

- Bug reports should include the target app, CapusQA version, install method,
  relevant run ID, logs or report artifacts, and expected vs. observed behavior.
- Pull requests should include the smallest useful change plus the focused check
  or demo command that covers it.
- Security-sensitive issues should not include live credentials, production
  data, or unredacted reports.

## License

Apache-2.0. OmniParser v2 icon-detector weights are AGPL-3.0; review their
license before redistributing a package or service that includes those weights.
