Metadata-Version: 2.4
Name: vibe-tester
Version: 0.1.0rc3
Summary: AI-driven UI automation testing framework with pluggable platform adapters.
Project-URL: Homepage, https://github.com/Haroldlei/vibe-tester
Project-URL: Repository, https://github.com/Haroldlei/vibe-tester
Project-URL: Issues, https://github.com/Haroldlei/vibe-tester/issues
Author: Vibe Testing Contributors
License: MIT
License-File: LICENSE
Keywords: agents,ai,bdd,behave,testing,uiautomation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Software Development :: Testing :: BDD
Requires-Python: >=3.11
Requires-Dist: behave>=1.2.6
Requires-Dist: jinja2>=3.1
Requires-Dist: pillow>=10.0
Requires-Dist: psutil>=5.9
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.9.0
Provides-Extra: all
Requires-Dist: opencv-python>=4.8; extra == 'all'
Requires-Dist: playwright>=1.42; extra == 'all'
Requires-Dist: pyperclip>=1.8; extra == 'all'
Requires-Dist: pywinauto>=0.6.8; extra == 'all'
Requires-Dist: scikit-image>=0.21; extra == 'all'
Requires-Dist: uiautomation>=2.0.20; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: macos
Provides-Extra: web
Requires-Dist: playwright>=1.42; extra == 'web'
Provides-Extra: windows-desktop
Requires-Dist: opencv-python>=4.8; extra == 'windows-desktop'
Requires-Dist: pyperclip>=1.8; extra == 'windows-desktop'
Requires-Dist: pywinauto>=0.6.8; extra == 'windows-desktop'
Requires-Dist: scikit-image>=0.21; extra == 'windows-desktop'
Requires-Dist: uiautomation>=2.0.20; extra == 'windows-desktop'
Description-Content-Type: text/markdown

# vibe-tester

> AI-driven UI automation testing for desktop and web apps:
> Cucumber-style tests, pluggable platform adapters, and the AI
> assets your coding agent needs to author and run them.

**Status:** alpha. Public API may change. The `windows-desktop` and
`web` adapters are implemented; the `macos` adapter is a stub.

---

## What it does

1. Lets you describe a UI test in **natural language** (in Copilot
   Chat, Claude CLI, Cursor, …) and generates a runnable Gherkin
   `.feature` file using **real element locators** from your project's
   element store.
2. **Executes** scenarios at any granularity (one feature, all of
   them, or a tag expression) and produces a Markdown report
   plus optional JSON output for the AI to parse.
3. **Walks your app interactively** with you to record UI element
   paths into a YAML store the executor can resolve.

The framework ships **AI assets** (agents, skills, an `AGENTS.md`
template) and a deterministic **CLI** (`vibe-tester`). It does not
embed an LLM and does not run an MCP server — your AI tool of choice
provides the intelligence; the CLI is the integration surface.

---

## Do I need an AI agent?

**No.** The framework is a Cucumber/behave runner with a UI-automation
adapter and a YAML element vocabulary — you can author and run tests
entirely by hand. The shipped agents are productivity multipliers, not
runtime dependencies.

| Capability                                                          | AI needed?            |
| ------------------------------------------------------------------- | --------------------- |
| Run tests (`vibe-tester run …`)                                    | No                    |
| Write `.feature` files by hand using `elements.yaml`                | No                    |
| Element collection — basic capture (`vibe-tester collect …`)       | No                    |
| Element collection — interactive *"navigate to the next page"* loop | Recommended (agent)   |
| Project customizations (`features/hooks/`, `features/steps/steps.py`) | No                  |
| Visual regression baselines + assertions                            | No                    |
| `@setup:` / `@clean:` tag-driven scenario isolation                 | No                    |
| Markdown / JSON reports                                             | No                    |
| Translating a natural-language request → `.feature`                 | **Yes** (Test Writer) |
| Structured root-cause analysis on a failed scenario                 | **Yes** (Test Debugger) |
| Auto-proposing `@clean:` tags + handler stubs from element `role:`  | **Yes** (Test Writer) |
| Detecting unmapped step phrases + scaffolding custom-step stubs     | **Yes** (Test Writer) |

Bottom line: **CLI + framework run standalone**. The agents add
natural-language authoring and structured failure triage. If you don't
have Copilot / Claude CLI / Cursor available, skip the `.github/agents/`
prompts and write `.feature` files directly. Every step phrase the
runner accepts is documented in the `uia-assertions` and
`element-locators` skill files (for a `web` project, the counterparts
are `web-assertions` and `web-locators`). They ship to your project as
plain Markdown, readable without an LLM.

---

## Install

```powershell
# every adapter that ships today (the `all` extra)
pip install vibe-tester[all]

# pick one (smaller install)
pip install vibe-tester[windows-desktop]

# pick several
pip install vibe-tester[windows-desktop,web]
```

| Extra              | Drives                                                   | Status      |
| ------------------ | -------------------------------------------------------- | ----------- |
| `windows-desktop`  | WinUI3 / Win32 / WPF / WebView2 / tray / shell menu      | Implemented |
| `web`              | Browser systems under test (SUTs), via Playwright        | Implemented |
| `macos`            | macOS-native SUTs                                        | Stub        |

---

## Quickstart

```powershell
# 1. Create a fresh test project (or scaffold into an existing folder)
mkdir my-app-tests
cd my-app-tests
vibe-tester init

# 2. Capture your SUT (interactive — your app should be running)
vibe-tester collect

# 3. Ask your AI agent (Copilot Chat / Claude CLI / …) to write a test:
#    "Write a smoke test that opens Settings and verifies the title."
#    The Test Writer agent uses elements.yaml + the framework's CLI.

# 4. Run it
vibe-tester run
```

**Project layout — one project = one SUT:**

After step 1 your project looks like:

```
my-app-tests/
├── AGENTS.md                  # AI instructions for this project
├── .github/
│   ├── agents/                # element-collector, test-writer, test-runner, test-debugger
│   └── skills/                # element-locators, uia-assertions, web-locators,
│                              # web-assertions, image-testing, custom-steps,
│                              # failure-diagnosis (adapter-relevant ones only)
└── features/
    ├── environment.py         # framework glue — do not edit
    └── steps/
        └── _framework.py      # framework glue — do not edit
```

After step 2 the element store is created at the project root:

```
my-app-tests/
├── elements.yaml              # the element vocabulary your tests use
├── features/
│   ├── *.feature              # Gherkin tests (the AI writes these)
│   ├── baselines/             # visual regression PNGs (optional)
│   ├── steps/
│   │   ├── _framework.py      # framework glue — do not edit
│   │   └── steps.py           # your custom step defs (optional)
│   └── hooks/                 # optional
│       ├── environment.py     # your before/after hooks
│       └── handlers.py        # @setup: / @clean: tag handlers
└── ...
```

The project root *is* the SUT — there's no nested per-app folder.

### Multiple SUTs (aggregation mode)

Some products span more than one surface — say an admin desktop tool
whose changes must show up in a sibling website. vibe-tester lets you
keep each surface as its own focused single-SUT project, then add an
**aggregation root** on top that orchestrates *integration* scenarios
across both. Layout:

```
my-product-tests/                ← aggregation root (NO elements.yaml here)
├── features/                    ← integration scenarios only
│   ├── environment.py           # framework glue — do not edit
│   ├── *.feature                # uses `on "<sut>"` per-step prefix
│   └── steps/
│       ├── _framework.py        # framework glue — do not edit
│       └── steps.py             # integration custom steps (optional)
├── admin-tool/                  ← child SUT #1 — full single-SUT layout
│   ├── elements.yaml
│   └── features/
│       └── ...
└── customer-site/               ← child SUT #2 — full single-SUT layout
    ├── elements.yaml
    └── features/
        └── ...
```

Mode is auto-detected when behave starts:

| Project root has…                                                    | Mode          |
| -------------------------------------------------------------------- | ------------- |
| `elements.yaml`                                                       | single SUT    |
| no root `elements.yaml`, but at least one child folder has one        | aggregation   |
| neither                                                               | uninitialized |
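
The three rows above amount to a small decision procedure. A minimal sketch of it follows; `detect_mode` and `STORE_NAME` are hypothetical names, not the framework's own code, and checking only direct child folders is an assumption:

```python
from pathlib import Path

STORE_NAME = "elements.yaml"  # the element store file named in the table


def detect_mode(root: Path) -> str:
    """Classify a project root using the three rules in the table above."""
    if (root / STORE_NAME).is_file():
        return "single SUT"
    # Assumption: only direct child folders are inspected. How deep the
    # real detection looks is not stated in this README.
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / STORE_NAME).is_file():
            return "aggregation"
    return "uninitialized"
```

Calling `detect_mode(Path.cwd())` from a project root is a quick way to confirm which layout you actually have before a run.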

Integration scenarios use a per-step `on "<sut>"` prefix to name the
target SUT — the value matches the `app.name` declared inside that
child's `elements.yaml`, *not* the folder name:

```gherkin
Feature: Admin change shows up on the customer site

  Scenario: Editing a theme propagates within 5 seconds
    Given on "admin-tool" the app is open
    When  on "admin-tool" I click "themes.edit_button"
    And   on "admin-tool" I type "Sunset" into "themes.name_input"
    Then  on "customer-site" element "homepage.theme_banner" should be visible
```

The framework lazy-launches each SUT on first reference and shuts them all
down once the run finishes. Five integration phrasings ship out of the
box (`the app is open`, `I click`, `I type … into`, `should exist`,
`should be visible`); for anything beyond that, write custom steps in
`features/steps/steps.py` and look up the active SUT via
`context.suts.get("<name>")`.
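
A custom step usually wants a clear failure when the SUT name is wrong, since the name must match `app.name` rather than the folder. The helper below is a sketch of that lookup. `require_sut` is a hypothetical name, and it assumes `context.suts.get(...)` returns `None` for an unknown name, the way a dict would:

```python
from types import SimpleNamespace


def require_sut(context, name: str):
    """Return the SUT registered under `name`, or fail the step loudly."""
    # `context.suts.get("<name>")` is the lookup this README names.
    # Assumption: it returns None (dict-style) when the name is unknown.
    sut = context.suts.get(name)
    if sut is None:
        raise AssertionError(
            f'no SUT named "{name}"; the name must match app.name in the '
            "child's elements.yaml, not the folder name"
        )
    return sut


# Stand-in for behave's context object, for illustration only.
fake_context = SimpleNamespace(suts={"admin-tool": "adapter-placeholder"})
print(require_sut(fake_context, "admin-tool"))  # prints: adapter-placeholder
```

Inside `features/steps/steps.py` you would call the helper from a behave step function and pass it the step's own `context`. What the returned object can do depends on the adapter and is not covered here.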

Running `vibe-tester run` from the **aggregation root** executes only
the integration features at that root. To run a single child SUT's own
tests in isolation, `cd` into that child and run there — each child is
itself a fully-functional single-SUT project.

`@setup:` / `@clean:` handlers and `@requires:` flag-based skips are
single-SUT only — there's no one "active adapter" to scope them to in
aggregation mode.

---

## CLI reference

| Command                                        | What it does                                                |
| ---------------------------------------------- | ----------------------------------------------------------- |
| `vibe-tester init [--target] [--adapter] [--overwrite] [--json]` | Scaffold a project from shipped assets        |
| `vibe-tester list adapters [--json]`          | Show installed adapters                                     |
| `vibe-tester list features [--json]`          | List `.feature` files                                       |
| `vibe-tester list elements [--details] [--json]` | Print the project's element vocabulary                  |
| `vibe-tester collect [--name] [--kind]`       | Interactive element capture                                 |
| `vibe-tester run [--feature\|--tag] [--scenario] [--json]` | Execute behave + emit Markdown / JSON report          |

Every command in the table except `collect` accepts `--json` for
machine-readable output (intended for the AI agent to parse). Default
output is human-friendly Rich tables and Markdown reports under
`./results/`.
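
If you drive the CLI from a script rather than an agent, a thin wrapper is enough. The sketch below uses only flags from the table above. It assumes that `--feature` and `--tag` each take a value and that the JSON report is written to stdout; neither is confirmed here, and the report's schema is deliberately not assumed. `build_run_command` and `run_and_parse` are hypothetical helper names:

```python
import json
import subprocess


def build_run_command(feature=None, tag=None):
    """Build the argv for `vibe-tester run --json`."""
    # The table shows --feature and --tag as alternatives.
    if feature and tag:
        raise ValueError("pass either feature or tag, not both")
    cmd = ["vibe-tester", "run", "--json"]
    if feature:
        cmd += ["--feature", feature]  # assumption: flag takes a value
    elif tag:
        cmd += ["--tag", tag]  # assumption: flag takes a value
    return cmd


def run_and_parse(cmd):
    """Run the CLI and decode stdout as JSON (assumed output channel)."""
    # check=False: a failing scenario may exit non-zero while still
    # printing a report, so the exit code is returned alongside it.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.returncode, json.loads(proc.stdout)
```

Usage would look like `code, report = run_and_parse(build_run_command(tag="smoke"))`, where `"smoke"` is a placeholder tag value.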

---

## How the AI assets work

`vibe-tester init` drops four agents and the adapter-relevant skills
into `.github/` plus an `AGENTS.md` at the project root. Any AI
coding tool that follows the [AGENTS.md convention](https://agents.md) —
Copilot, Claude CLI, Cursor, etc. — will pick them up automatically.
Skills are filtered by the adapter(s) you scaffold: a `web`-only
project won't get `uia-assertions`, and a `windows-desktop`-only
project won't get `web-locators`.

Agents:

| Agent              | Use when                                                 |
| ------------------ | -------------------------------------------------------- |
| Element Collector  | Adding the SUT or new pages to it                        |
| Test Writer        | Authoring `.feature` files from a natural-language ask   |
| Test Runner        | Executing tests and producing a Markdown report          |
| Test Debugger      | A test failed and you want a structured RCA              |

Skills:

| Skill              | Adapter          | Topic                                                  |
| ------------------ | ---------------- | ------------------------------------------------------ |
| element-locators   | windows-desktop  | UIA locator syntax, dot-notation, element store schema |
| uia-assertions     | windows-desktop  | All assertion types the Windows adapter supports       |
| web-locators       | web              | Playwright locator strategy and element store schema   |
| web-assertions     | web              | All assertion types the web adapter supports           |
| image-testing      | any              | Visual regression / baseline strategy                  |
| custom-steps       | any              | Authoring project-level custom Gherkin step definitions|
| failure-diagnosis  | any              | RCA methodology + known-issues catalog                 |

---

## Spec-first delegation: handing a task to a coding agent

Use this workflow when you want to delegate a feature to a coding
agent (Copilot, Claude CLI, Cursor, …) and have a `.feature` file
serve as the binding acceptance contract — written and approved
*before* coding starts, untouched while coding happens, and proven
green when you come back.

### What you get out of vibe-tester

vibe-tester is built around two files that, together, give you a
spec you can sign off on up front:

- A Gherkin **`.feature`** file — what the feature must do, in
  business language. References UI elements by **name only**
  (*"the Save Theme button"*), never by selector.
- An **`elements.yaml`** entry per referenced element — the locator
  the agent commits to creating (`AutomationId=btn_save_theme`,
  `data-testid=themes-save`, …).

Because the `.feature` file holds no locators, freezing it after
your approval does not constrain how the UI is built. Because every
locator the test will ever try to use is declared in `elements.yaml`
*before* code is written, the agent has no room to redefine "done"
later — the test will fail unless the built UI exposes exactly those
locators.

### The workflow, step by step

1. **You describe the task in plain English** to your AI agent.
2. **The agent drafts two files and shows them to you:**
   - `features/<feature>.feature`: the scenarios, written with
     semantic element names.
   - New entries appended to `elements.yaml`: locator strings for
     every element the scenarios reference.
3. **You review and approve both files.** Edit the prose, add
   missing scenarios, rename anything that smells like
   implementation detail. Approve when it reads like the acceptance
   criteria you'd write yourself.
4. **The agent codes against the approved contract.** Product code,
   step glue, unit tests — but it does not edit the approved
   `.feature` file. Treat it as locked.
5. **The agent runs `vibe-tester run`.** A scenario passes only if
   the live UI exposes the locator declared in `elements.yaml`.
   Mismatches surface as test failures, not as silent edits.
6. **You come back to a Markdown report** under `./results/` and
   decide whether to ship.

If you want belt-and-suspenders enforcement, commit the approved
`.feature` and `elements.yaml` in their own PR and protect them with
a CI check that fails on any subsequent change to either file
without a `--rewrite-acceptance` reason recorded in the commit.
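
The core of such a CI check fits in a few lines. This is a sketch, not something vibe-tester ships. `acceptance_violations` is a hypothetical name, and the marker format (`--rewrite-acceptance` followed by a reason on the same line) is an assumption about how you choose to record the reason:

```python
import re

# Assumption: the reason follows the marker on the same line,
# e.g. "--rewrite-acceptance: scope changed after review".
REASON = re.compile(r"--rewrite-acceptance[:= \t]+\S")


def acceptance_violations(changed_files, commit_message, protected):
    """Return protected paths changed without a recorded reason."""
    if REASON.search(commit_message):
        return []
    # Sorted so the CI log is stable from run to run.
    return sorted(set(changed_files) & set(protected))


protected = {"features/themes.feature", "elements.yaml"}
print(acceptance_violations(["elements.yaml", "src/app.py"], "feat: themes", protected))
# prints: ['elements.yaml']
```

In CI you would feed it the output of `git diff --name-only` for the PR range plus the commit message, and fail the job when the returned list is non-empty. The paths in `protected` are placeholders.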

### How the agent picks locators before the UI exists

The instinct is to *discover* a locator by inspecting a built UI.
That forces the test to be written after coding, which destroys its
value as a prior commitment.

vibe-tester's workflow inverts this: the agent **declares** the
locator string in the same act that promises to render the element.
The locator file becomes a forward-looking contract — *"I will ship
a button whose `AutomationId` is `btn_save_theme`"* — not a recording
of what happened to be built. The implementation must satisfy the
contract, not the other way round.

This is reliable on every stack where the agent controls the source
of the locator string:

| Your stack                                      | Pre-commit a locator? | How                                                |
| ----------------------------------------------- | --------------------- | -------------------------------------------------- |
| Web (React / Vue / Svelte / plain HTML)         | Yes                   | Use a `data-testid` convention                     |
| WinUI 3 / WPF / UWP                             | Yes                   | Set `AutomationProperties.AutomationId` explicitly |
| Win32 / MFC                                     | Mostly                | Owned controls via control ID; wrap shell UI       |
| iOS / Android native                            | Yes                   | `accessibilityIdentifier` / `contentDescription`   |
| Closed-source 3rd-party widgets                 | Wrap first            | Locate the wrapper you control                     |
| Auto-generated framework IDs (e.g. Angular)     | Forbid                | Require an explicit testid via lint                |

### Starting from your project type

**Greenfield (the agent is also writing the app from scratch).**
The easiest case. Tell your agent in `AGENTS.md` to (a) adopt one
naming convention for locators (e.g. *every interactive element gets
`data-testid` shaped as `<feature>-<role>-<purpose>`*) and (b) add a
lint rule that fails the build on any interactive element missing
the attribute. From there, every feature PR appends to
`elements.yaml` before any code is written.

**Brownfield with an existing `elements.yaml`.** Point the agent at
the file and tell it to follow the existing convention for new
elements. The existing entries are the examples it reasons from.

**Brownfield without an `elements.yaml` yet.** Run
`vibe-tester collect` once against the current build as a one-time
baseline. The agent then has both a snapshot of what exists and a
sample of the project's locator style to imitate. After that single
pass the project behaves like the case above.

### What to watch for

Three failure modes are worth naming up front:

1. **Locator typos.** The agent writes `data-testid="save-theme"`
   in `elements.yaml` but ships JSX with `save_theme` or no testid
   at all. The corresponding test scenario will fail on element
   lookup — which is the point — but you should treat that failure
   as *the agent broke its own contract*, not as a flaky test.
2. **Convention drift.** Across many features the agent invents
   slightly different naming schemes. Add a one-line CI check that
   greps `elements.yaml` for entries that don't match your
   convention regex; drift becomes a build failure rather than
   review burden.
3. **Semantic names that leak implementation.** *"the third div in
   the sidebar"* is a locator in disguise. Keep names role-based
   (*"Recently used themes list"*) so the spec stays
   implementation-agnostic and the agent retains room to build the UI well.
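
For the convention-drift check in item 2, the test itself is a single regex match. The sketch below works on a plain list of testid strings; pulling those strings out of `elements.yaml` is left out because the store's schema is not shown in this README. The pattern encodes a hypothetical `<feature>-<role>-<purpose>` convention, so swap in your own:

```python
import re

# Hypothetical convention: at least three lowercase, hyphen-separated
# segments, e.g. "themes-button-save".
CONVENTION = re.compile(r"[a-z0-9]+-[a-z0-9]+-[a-z0-9]+(?:-[a-z0-9]+)*")


def convention_drift(testids):
    """Return the testids that do not follow the convention."""
    return [t for t in testids if not CONVENTION.fullmatch(t)]


print(convention_drift(["themes-button-save", "save_theme", "SaveTheme"]))
# prints: ['save_theme', 'SaveTheme']
```

A non-empty result is the signal to fail the build.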

---

## Architecture (one paragraph)

A user project is one SUT with one **element store**
(`elements.yaml` at the project root). Its `app.kind` (e.g.
`windows-desktop`) tells the executor which **adapter** to use. The
CLI dispatches to that adapter for collect / launch / click /
screenshot operations; the **core** layer is adapter-agnostic and
never imports an adapter directly. New platforms plug in by adding a
sub-package under [vibe_tester/adapters/](vibe_tester/adapters).
Aggregation projects layer an **integration coordinator** on top —
multiple sibling single-SUT projects under a parent, integration
features at the parent driving them via an `on "<sut>"` per-step
prefix; child adapters are launched lazily and shut down together at
suite end. See
[doc/design/architecture.md](doc/design/architecture.md) for the full
picture.
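
One common way to keep a core layer from importing adapters directly is a registry keyed by `app.kind`. The sketch below illustrates that idea only. It is not vibe-tester's actual mechanism, and every name in it (`Adapter`, `register`, `adapter_for`, `FakeWindowsAdapter`) is hypothetical:

```python
from typing import Callable, Dict, Protocol


class Adapter(Protocol):
    """The only surface the core layer sees."""

    def launch(self) -> str: ...


_REGISTRY: Dict[str, Callable[[], Adapter]] = {}


def register(kind: str):
    """Decorator an adapter package uses to announce itself."""
    def decorator(factory):
        _REGISTRY[kind] = factory
        return factory
    return decorator


def adapter_for(kind: str) -> Adapter:
    """Resolve an adapter from the `app.kind` value in elements.yaml."""
    factory = _REGISTRY.get(kind)
    if factory is None:
        raise LookupError(f"no adapter installed for app.kind={kind!r}")
    return factory()


@register("windows-desktop")
class FakeWindowsAdapter:
    def launch(self) -> str:
        return "launched windows-desktop"


print(adapter_for("windows-desktop").launch())  # prints: launched windows-desktop
```

The point of the pattern is that the core only ever calls `adapter_for(kind)`, so adding a platform means adding a registration rather than editing core code.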

---

## Contributing

This repo is the framework itself. See [AGENTS.md](AGENTS.md) for
dev-context guidance (rules, layout, common tasks). Bug reports and
PRs welcome at <https://github.com/Haroldlei/vibe-tester>.

License: MIT.
