Metadata-Version: 2.4
Name: tepyd
Version: 0.6.1
Summary: TEst PYramid Doctor — diagnose a project's test pyramid: mass, structure, and coverage.
Author: Stefane Fermigier
Author-email: Stefane Fermigier <sf@abilian.com>
License-Expression: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# Tepyd — The TEst PYramid Doctor

*Diagnose your test pyramid: is the shape what you say you want?*

Tepyd looks at a project's test suite and tells you whether its *shape* matches the test pyramid you say you want: a broad base of cheap unit tests, fewer integration tests, a thin cap of end-to-end tests. It automates the checks you'd otherwise do by hand — which packages are under- or over-tested, where the cheap tests are missing, whether the test tree mirrors the source tree, and — by running your suite under coverage — which tier actually exercises each package.

It's configuration-driven: point it at any project, describe that project's layout once in `pyproject.toml`, and run one command. The author's own layout ships as the default, so for projects that share it there's nothing to configure.

> Think `tepyd doctor`: diagnose my pyramid.

## The lenses

Tepyd looks at a suite through several complementary lenses. Each is useful alone; together they catch failure modes the others miss.

| Lens | Command | Question | Runs tests? | Status |
|------|---------|----------|-------------|--------|
| **Mass** | `tepyd mass` | How much test code is there, and what shape does it make? | no | ✅ implemented |
| **Mirror** | `tepyd mirror` | Does the test tree structurally parallel the source tree? | no | ✅ implemented |
| **Cover** | `tepyd cover` | Which tier actually *executes* each unit — and is it the cheap one? | yes | ✅ implemented |
| **Report** | `tepyd report` | All the checks at once, plus advice: the *why* and the *how*, not just the *what*. | no | ✅ implemented |
| **Audit** | `tepyd audit` | Does each test file's imports match the tier it lives in? | no | 🔜 planned |

## Requirements

- Python ≥ 3.12.
- `mass`, `mirror`, and `report` have no runtime dependencies — the line counter is built in.
- `cover` additionally needs the analysed project's own `pytest` and `coverage` to be importable, so run it from that project's environment (see [`tepyd cover`](#tepyd-cover)).
- `cloc` is an optional opt-in for the line counter (`counter = "cloc"`).

## Install

```bash
uv sync                                 # for development in this repo
# once published to PyPI (not yet):
# uv tool install tepyd
```

After `uv sync`, prefix commands with `uv run` (or activate the venv). Once published, `uv tool install tepyd` puts `tepyd` on your PATH directly.

The default line counter is built in — no external dependency. Set `counter = "cloc"` in config to use the [`cloc`](https://github.com/AlDanial/cloc) binary instead, if you want its stricter, multi-language counting.
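To make the built-in counter concrete (the Configuration reference describes it as tokenize-based, counting non-blank, non-comment lines), here is a minimal Python sketch using the standard `tokenize` module. It illustrates the idea and is not Tepyd's code. `count_code_lines` is a hypothetical name, and this version counts every line a multi-line token covers, including blank lines inside a triple-quoted string.

```python
import io
import tokenize

# Token types that, on their own, don't make a line count as code.
IGNORED = {
    tokenize.COMMENT,
    tokenize.NL,         # blank line, or a break inside brackets
    tokenize.NEWLINE,    # end of a logical line
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count lines carrying at least one significant token.

    Hypothetical helper: a token that spans several lines (a triple-quoted
    string, say) counts every line it covers, blank ones included.
    """
    lines: set[int] = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(lines)
```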

## Quick start

```bash
uv run tepyd init                      # detect this project's layout, write a config
uv run tepyd mass --min-src 1          # analyse the current directory
uv run tepyd -C /path/to/project mass  # analyse another project
uv run tepyd report                    # the checks + advice, in one read
```

(`--min-src` skips source units under 20 LOC by default; pass `--min-src 1` on small projects to see every unit. Drop the `uv run` prefix once Tepyd is on your PATH.)

`mass`, `mirror`, and `cover` take `--json` for machine-readable output — the contract `report` and a future CI gate build on (`report` itself renders text or Markdown via `--format`). All commands take `-C/--root DIR` to point at a project other than the current directory.

## Concepts

- **Source unit** — a meaningful slice of the source tree that Tepyd analyses as a whole, derived from configurable glob patterns (by default: each package under `modules/`, plus every other top-level package).
- **Tier** — one rung of the pyramid: a directory of tests of a given cost (e.g. `tests/a_unit`). Tiers are listed cheapest-first in config; any number is allowed, not just three.
- **Unit share** — the fraction of a unit's test code that lives in the cheapest tier. The headline pyramid-health metric.
- **Shape glyph** — a one-character read on a unit's pyramid:

  | Glyph | Meaning |
  |-------|---------|
  | `▲` | healthy — the cheapest tier is the largest *and* ≥ 40 % of test LOC |
  | `◇` | balanced — neither clearly healthy nor inverted |
  | `▼` | inverted — the most expensive tier is the largest *and* unit share < 30 % |
  | `·` | no tests at all |

  When a tier declares an `expects` scope (see [`tepyd mirror`](#tepyd-mirror)), the shape becomes **layer-aware**: a unit is judged against its *expected* home tier, not the unit ideal. A controller whose home is e2e is `▲` when its tests live at e2e — `▼` only marks test mass that sits *above* where it belongs.
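The classic (non-layer-aware) glyph rules can be sketched as a short function over a unit's per-tier test LOC, cheapest tier first. The 40 % and 30 % thresholds come from the table above. The name `shape_glyph` and the list input are assumptions, and Tepyd's own tie-breaking may differ.

```python
def shape_glyph(tier_loc: list[int]) -> str:
    """Classify a unit's pyramid from per-tier test LOC, cheapest tier first.

    Hypothetical helper covering the classic rules only (no `expects`).
    """
    total = sum(tier_loc)
    if total == 0:
        return "·"  # no tests at all
    share = tier_loc[0] / total  # unit share: fraction in the cheapest tier
    largest = max(tier_loc)
    if tier_loc[0] == largest and share >= 0.40:
        return "▲"  # healthy
    if tier_loc[-1] == largest and share < 0.30:
        return "▼"  # inverted
    return "◇"  # balanced
```

Given the per-tier columns of the `tepyd mass` sample below (`[32, 8, 4, 0]`, `[0, 0, 0, 24]`, `[0, 16, 0, 0]`, `[0, 0, 0, 0]`), it returns `▲`, `▼`, `◇`, and `·`, matching that sample.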

## `tepyd mass`

Counts test LOC against source LOC for every source unit and reports the per-tier breakdown, ratios, and pyramid shape.

```bash
tepyd mass
tepyd mass --json
tepyd mass --min-src 50      # skip source units under 50 LOC (default: 20)
tepyd mass --exclude faker   # skip a unit, on top of config exclusions (repeatable)
```

```
package       src  unit  integration  http-e2e  browser  tests  ratio    u%
------------  ---  ----  -----------  --------  -------  -----  -----  ----  -
modules/biz    24    32            8         4        0     44  1.83x   73%  ▲
modules/wire   24     0            0         0       24     24  1.00x    0%  ▼
services       16     0           16         0        0     16  1.00x    0%  ◇
models         16     0            0         0        0      0  0.00x     —  ·

=== Summary ===
...                            (per-tier totals and outlier lists)
Tier mix across the codebase : unit 38%  /  integration 29%  /  http-e2e 5%  /  browser 29%
  ⚠ unit share is 38%, below the 60% target — the pyramid is flattened.
Caveat : tier is by directory, not by what each test actually exercises.
```

**How to read it.** Each tier has its own column; `ratio` is total test LOC ÷ source LOC; `u%` is unit share. The summary lists outliers (untested, under-tested below 0.5×, heavily-tested above 2×), flags inverted pyramids, and warns when the codebase-wide unit share falls below the first tier's `target_share`.

**Layer-aware judging.** If you've scoped your tiers with [`expects`](#tepyd-mirror), mass respects it: a unit whose home is e2e (a web controller) isn't flagged `▼` for being e2e-heavy, and the codebase `target_share` check sets aside each unit's tests at its *expected* higher tiers before measuring the unit share — so a large, legitimate e2e or integration surface doesn't read as a flattened pyramid. Only test mass sitting *above* where it belongs counts against you. Without `expects`, nothing changes — every unit is judged against the classic unit-pyramid.

**A caveat Tepyd states up front:** LOC is a *proxy* for effort, not a measure of quality, and a test's tier is decided by its directory, not by what it actually exercises. Mass tells you where to look; the [`cover` lens](#tepyd-cover) tells you whether the tests are real.

## `tepyd mirror`

Static, no-execution comparison of the test tree against the source tree, per tier, at the **same granularity as your `units`** (the slices `mass` and `cover` also use). A package unit's mirrored test directory is checked recursively, so a test for any of its sub-packages counts, and mirror never floods a deeply nested project with per-sub-package gaps. Refine the `units` patterns (e.g. `["modules/*", "*"]`) to make mirror coarser or finer.

```bash
tepyd mirror
tepyd mirror --json
tepyd mirror --exclude faker
```

```
Mirror — test tree vs source tree (6 source units)

unit: 3/6 mirrored (50%)
  gap     domain/models
  gap     web
  orphan  tests/a_unit/legacy

integration: 1/6 mirrored (17%)
  gap     domain
  ...
```

**How to read it.** For each tier:

- **present** — the source package has a matching test directory that contains tests (counted in the `X/Y mirrored` figure),
- **gap** — a source package with no test counterpart at this tier *that the tier was expected to test*,
- **out of scope** — a source package the tier isn't responsible for, so its absence is reported as n/a, not a gap (shown as a count; the full list is in `--json`),
- **orphan** — a test directory that contains tests but has no source on disk (tests for code that moved or vanished).

**Scoping a tier to its layer.** By default every tier is checked against every source package — fine for a flat app, noisy for a layered one. In a hexagonal/onion architecture each layer has a *natural* tier: pure domain is unit-tested, the persistence layer integration-tested, the HTTP edge end-to-end-tested. Checking all three against every package turns real structure into a wall of "gaps" that are correct by design:

```
unit: 4/10 mirrored (40%)
  gap     di
  gap     domain/ports
  gap     infrastructure
  gap     repositories
  gap     web
  gap     web/controllers
```

…and the integration and e2e tiers each report eight more gaps in the same vein — code that simply lives at a different tier. Tell Tepyd which packages each tier owns with `expects` (glob patterns over unit names):

```toml
[[tool.tepyd.tiers]]
name = "a_unit"
expects = ["domain", "domain/*", "services", "lib"]   # pure logic

[[tool.tepyd.tiers]]
name = "b_integration"
expects = ["repositories", "infrastructure"]          # the DB-bound layer

[[tool.tepyd.tiers]]
name = "c_e2e"
expects = ["web", "web/*"]                            # the HTTP edge
```

Now a package outside a tier's scope is reported as *out of scope* (n/a), not a gap, and drops out of that tier's `X/Y mirrored` figure — leaving only the gaps that are real:

```
unit: 4/4 mirrored (100%), 4 out of scope
integration: 2/2 mirrored (100%), 6 out of scope
e2e: 2/2 mirrored (100%), 6 out of scope
```

Two things scope deliberately doesn't do. A package that should never be tested *anywhere* — a pure `Protocol`/ports layer with no runtime behaviour — belongs in `[tool.tepyd.exclude]` (with a reason), which drops it from every tier; that's why `domain/ports` and `di` are gone from the counts above. And tests that *do* exist always count as present, even at a tier that didn't expect them: scope governs whether an *absence* is a gap, never whether existing tests count.

Orphan detection checks whether the source actually exists, so it's independent of exclusions. Mirror coverage is presented as data, not a pass/fail — a browser tier showing `1/20 mirrored` is often by design.

## `tepyd cover`

The only lens that **runs your test suite**. For each tier it does one `coverage run -m pytest <tier-dir>` (the tiers partition the suite, so the cost is roughly one full run), then measures, per source unit, what fraction of its statements each tier actually executes.
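Per tier, that amounts to one subprocess built roughly as follows. The sketch is an assumption, not Tepyd's code. `tier_command` is a hypothetical helper, and it keeps each tier's data apart through coverage.py's `COVERAGE_FILE` environment variable. The command uses `sys.executable`, which is why the interpreter running the tool matters (see the note below).

```python
import os
import sys


def tier_command(tier_dir: str, data_file: str) -> tuple[list[str], dict[str, str]]:
    """Build one hypothetical `coverage run -m pytest <tier-dir>` invocation.

    sys.executable pins the run to the interpreter executing this code;
    COVERAGE_FILE gives each tier its own data file for later comparison.
    """
    cmd = [sys.executable, "-m", "coverage", "run", "-m", "pytest", tier_dir]
    env = {**os.environ, "COVERAGE_FILE": data_file}
    return cmd, env
```

Each pair would go to `subprocess.run(cmd, env=env)`. The sketch leaves out how Tepyd sets the project's own `[tool.coverage]` settings aside.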

> **Run `cover` from your project's own environment.** Unlike the other lenses, it imports and executes your code, so it must run where your package, `pytest`, and `coverage` are installed. Add Tepyd there as a dev dependency (`uv add --dev tepyd` once it is on PyPI; until then, `uv add --dev --editable /path/to/tepyd`) and run `uv run tepyd cover` from the project root. Installing Tepyd standalone (`uv tool install`) and pointing it at the project with `-C` will fail to import your tests.

```bash
tepyd cover                  # all tiers
tepyd cover --tier a_unit    # just the unit tier (repeatable)
tepyd cover --json
```

```
unit      stmts  unit  e2e   any
--------  -----  ----  ---  ----
checkout      7    0%  88%   88%  v hidden
domain        9  100%   0%  100%
```

The **any** column is the union across tiers — a unit's true reachable-by-tests coverage. The flag marks the **hidden inverted pyramid**: a unit that's well covered overall (`any` high) but barely by its unit tier — the lines run, but only the expensive tiers run them. A global coverage report would show both rows as green; only this lens reveals that `checkout`'s coverage is entirely e2e.
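For one unit, suppose you have its statement line numbers and the lines each tier executed (coverage.py exposes the latter through `CoverageData.lines()`). The per-tier columns, the `any` union, and the flag then reduce to set arithmetic. The helper names and the 75 %/25 % thresholds are hypothetical; the README does not state the cut-offs Tepyd uses.

```python
def tier_coverage(
    statements: set[int],
    executed_by_tier: dict[str, set[int]],
) -> dict[str, float]:
    """Percent of a unit's statements each tier runs, plus the union ("any")."""
    if not statements:
        return {**{tier: 0.0 for tier in executed_by_tier}, "any": 0.0}
    pct = {
        tier: 100 * len(lines & statements) / len(statements)
        for tier, lines in executed_by_tier.items()
    }
    reached = set().union(*executed_by_tier.values()) & statements
    pct["any"] = 100 * len(reached) / len(statements)
    return pct


def hidden_inverted(
    pct: dict[str, float], cheapest: str, high: float = 75.0, low: float = 25.0
) -> bool:
    """Well covered overall, but barely by the cheapest tier (hypothetical cut-offs)."""
    return pct["any"] >= high and pct[cheapest] < low
```

Consider an 8-statement unit whose lines 1–7 run only under e2e. This gives `unit` 0 %, `e2e` and `any` 87.5 %, and the flag fires. A unit fully covered by its unit tier does not trigger it.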

Requirements & behaviour:
- It runs your suite under the **same interpreter that runs tepyd**, so that interpreter must have your project's dependencies (and `pytest`/`coverage`). Run it from your project's environment: `uv run tepyd cover`, or, if your Tepyd checkout lives elsewhere, `uv run --with-editable /path/to/tepyd tepyd cover` (use `--with` for a non-editable install). If your project's code can't be imported (every tier fails in `conftest`/collection), cover says so, names the interpreter and the missing module, and points you at the fix, rather than printing a wall of zeros.
- It prints **per-tier progress** to stderr as it runs (it runs each tier's tests in turn, so a large suite takes a while; the progress lines tell you it's working, not hung).
- It **ignores the project's own `[tool.coverage]` config** so the numbers don't depend on it, and attributes coverage by resolved path (robust to multi-file units and absolute coverage paths).
- A tier that **fails to run** is shown as a 0% column and listed as not-measured (distinct from "0% because untested"); a tier whose tests **ran but failed** is used with a warning that the numbers are a floor. Both also print to stderr.
- A tier that runs but measures **0% everywhere** — typically a **browser/Playwright** suite that drives your app in a separate process `coverage.py` can't see — is flagged as *not measured*, not *covers nothing* (and listed in `--json` as `blind_tiers`). Scope it out with `--tier`, or measure it under subprocess coverage.

## `tepyd report`

Runs every check at once and synthesises the results into a list of **findings**, each carrying not just *what* is wrong but *why* it matters (the pyramid principle behind it) and *how* to fix it.

```bash
tepyd report                     # console report, "senior" level
tepyd report --format md         # Markdown, for a PR comment or a committed file
tepyd report --level newb        # teach the concepts (intro + glossary + advice)
tepyd report --level expert      # a terse one-line-per-finding checklist
```

Every report opens with a **context lead-in** — what's being measured and why — and the `--level` knob tunes how much it explains, not which problems it finds:

| Level | What you get |
|-------|--------------|
| `newb` | A full plain-language explanation of the pyramid, every finding's what/why/fix, general advice, and a glossary. |
| `junior` | A shorter context, every finding's what/why/fix, and general advice. |
| `senior` (default) | A brief context, then each finding's what/why/fix — no hand-holding. |
| `expert` | A one-line context note, then one line per finding: the marker, the title, and the action. |

It reports an overall **health verdict** (`healthy` / `fair` / `needs work`), a one-line summary of each lens, and findings ordered by severity (`✗` problem, `⚠` warning, `ℹ` info). `--min-src` and `--exclude` work as they do on the individual lenses.
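The ordering and the `expert` one-liners can be sketched as a stable sort on severity followed by a format string. `Severity`, `Finding`, and `render_expert` are hypothetical names. Only the three markers and the marker/title/action layout come from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    PROBLEM = 0  # most severe, sorts first
    WARNING = 1
    INFO = 2


MARKERS = {Severity.PROBLEM: "✗", Severity.WARNING: "⚠", Severity.INFO: "ℹ"}


@dataclass
class Finding:
    severity: Severity
    title: str  # the *what*
    why: str    # the pyramid principle behind it
    how: str    # the action to take


def render_expert(findings: list[Finding]) -> list[str]:
    """One line per finding, most severe first: marker, title, action."""
    ordered = sorted(findings, key=lambda f: f.severity)  # stable within a severity
    return [f"{MARKERS[f.severity]} {f.title}: {f.how}" for f in ordered]
```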

## `tepyd init`

Rather than write the config by hand, let Tepyd guess it. `tepyd init` looks around the project for the usual clues — a `src/` package (or a flat top-level one), a `tests/` tree split into tiers, a `modules/` sub-layout, a top-level browser-test root — and **appends** a commented `[tool.tepyd]` block to `pyproject.toml`, leaving the rest of the file untouched.

```bash
tepyd init             # detect and write the section
tepyd init --dry-run   # print what it would write, change nothing
```

When it can't make a confident guess — for example, several packages under `src/` with none named `app` or matching the project name — it **asks you to choose** (when run interactively). In a non-interactive context (a pipe, CI), it falls back to the first candidate and prints a note rather than blocking. It also refuses to overwrite an existing `[tool.tepyd]` section, and prints a note for anything else it had to guess (an undetected source root, no test tiers). The result is a head start to review — not necessarily a finished config.

## Configuration

`tepyd init` writes this for you, but here is the full reference. Everything is configured under `[tool.tepyd]` in `pyproject.toml`. With no section at all, the defaults below apply (except `exclude`, which is always empty unless set — exclusions are per-project policy, not a layout default).

```toml
[tool.tepyd]
src_root    = "src/app"   # filesystem root of the analysed source
src_package = "app"       # dotted import path; reserved — not read by any lens yet

# How to slice the source tree into units (globs, first-match-wins): explode
# each package under modules/ one level deep, then take every other
# top-level package as a unit.
units = ["modules/*", "*"]

# Line counter: "internal" (built-in, no dependency) or "cloc".
counter = "internal"

# Source units excluded from analysis — a reason is REQUIRED, so the policy
# decision is documented where it's made.
[tool.tepyd.exclude]
faker = "seed/fake-data generator, exercised via fixtures"

# Test tiers, cheapest first (bottom of the pyramid first). Any number of tiers.
[[tool.tepyd.tiers]]
name = "a_unit"
root = "tests/a_unit"
label = "unit"
target_share = 0.60          # policy gate: ≥ 60 % of test LOC should be here

[[tool.tepyd.tiers]]
name = "b_integration"
root = "tests/b_integration"
label = "integration"
# Optional: scope this tier to the units it should test (globs over unit
# names). Units outside the scope are reported n/a, not as gaps. Omit it
# and the tier is checked against every unit.
expects = ["repositories", "infrastructure"]

[[tool.tepyd.tiers]]
name = "c_e2e"
root = "tests/c_e2e"
label = "http-e2e"

[[tool.tepyd.tiers]]
name = "e2e_playwright"
root = "e2e_playwright"      # an arbitrary root — needn't live under tests/
label = "browser"
strip_prefix = "modules/"    # this tier flattens the layout: modules/biz → biz
```
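The `units` comment above says first match wins and containers of already-claimed units are skipped. That rule can be sketched over package paths relative to `src_root`. `slice_units` is hypothetical. It treats each `/`-separated pattern segment as one directory level, which is an assumption about how Tepyd matches, and it ignores the `*.py` module form.

```python
from fnmatch import fnmatchcase


def slice_units(packages: list[str], patterns: list[str]) -> list[str]:
    """Slice package paths (relative to src_root) into units, first match wins."""
    units: list[str] = []
    for pattern in patterns:
        depth = pattern.count("/") + 1  # "*" = top level, "modules/*" = two levels
        for pkg in sorted(packages):
            if pkg.count("/") + 1 != depth or not fnmatchcase(pkg, pattern):
                continue
            claimed = any(
                pkg == u or u.startswith(pkg + "/") or pkg.startswith(u + "/")
                for u in units
            )
            # Already a unit, a container of one, or inside one: skip it.
            if not claimed:
                units.append(pkg)
    return units
```

Take packages `modules`, `modules/biz`, `modules/biz/sub`, `modules/wire`, `models`, and `services`. The default `["modules/*", "*"]` yields `modules/biz`, `modules/wire`, `models`, and `services`. The `modules` container is skipped because its children were claimed first.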

### Reference

| Key | Default | Meaning |
|-----|---------|---------|
| `src_root` | `"src/app"` | Filesystem root of the analysed source. |
| `src_package` | `"app"` | Dotted import path of the source. Currently informational — no lens reads it yet (`cover` keys off `src_root`). |
| `units` | `["modules/*", "*"]` | Glob patterns slicing the source tree into units; first match wins, and a container whose children were already claimed is skipped. A pattern ending in `.py` (e.g. `["*.py"]`) makes each top-level **module** a unit — for flat packages with no sub-packages; `["*"]` stays directories-only. |
| `counter` | `"internal"` | `internal` (tokenize-based; counts non-blank, non-comment lines) or `cloc`. |
| `exclude` | `{}` | Table of `unit = "reason"`; the reason is required. |
| `tiers` | four tiers (see above) | Array of tables, cheapest-first. |

Per-tier keys: `name` (required), `root` (required), `label` (defaults to `name`), `target_share` (0–1, optional policy gate), `strip_prefix` (mapping rewrite for tiers that flatten the layout), `expects` (glob patterns scoping the tier to the units it should test; unset means every unit).

A different project just describes itself — e.g. a flat `src/` with two tiers:

```toml
[tool.tepyd]
src_root = "src"
src_package = "mypkg"
units = ["*"]

[[tool.tepyd.tiers]]
name = "unit"
root = "tests/unit"

[[tool.tepyd.tiers]]
name = "e2e"
root = "tests/e2e"
```

## What Tepyd is not

- **Not a test runner**, and not a replacement for pytest or coverage — `cover` orchestrates them.
- **Not a correctness checker.** LOC is a *proxy* for effort, and a test's tier is decided by its directory, not by what it exercises.
- **Not a pass/fail gate** (today): it reports data and advice. The mirror and cover figures are diagnostics, not targets — `1/20 mirrored` for a browser tier is often correct by design.

## Development

```bash
make test    # pytest
make lint    # ruff + ty + pyrefly + mypy
make format  # ruff format + autofix
```

Tests are themselves organised as a pyramid (`tests/a_unit`, `tests/b_integration`, `tests/c_e2e`) — Tepyd eats its own dog food.

## Changelog

See [CHANGES.md](CHANGES.md).

## License

Tepyd is licensed under the Apache License 2.0 — see [LICENSE](LICENSE).
