Metadata-Version: 2.4
Name: fakellm-cli
Version: 0.1.0
Summary: Inspect and manage fakellm-assert frozen judgment snapshots from the terminal.
Project-URL: Homepage, https://github.com/1dg618/fakellm-cli
Project-URL: Bug Tracker, https://github.com/1dg618/fakellm-cli/issues
Author-email: Douglas Gregor <1dg618@gmail.com>
License: MIT
License-File: LICENSE
Keywords: assertions,cli,fakellm,llm,llm-as-judge,snapshot,testing
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.9
Provides-Extra: assert
Requires-Dist: fakellm-assert>=0.1.1; extra == 'assert'
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# fakellm-cli

Inspect and manage [`fakellm-assert`](https://pypi.org/project/fakellm-assert/) frozen judgment snapshots from the terminal. Part of the [fakellm](https://pypi.org/project/fakellm/) family.

`fakellm-assert` freezes a judge's verdict about a fuzzy assertion (`satisfies("apologizes for the delay")`) to `.fakellm/judgments/judgments.json` and replays it forever. Those frozen verdicts are artifacts you review in a git diff — and once you have more than a handful, you want to look at them, sanity-check them, and clean them up without hand-editing JSON. That's what this CLI is for.

```bash
pip install fakellm-cli
```

Requires Python 3.9+. `fakellm-assert` itself is **not** a hard dependency: the CLI reuses its types when they're importable, but it can also inspect a checked-in `.fakellm/` store on a machine that has only the snapshots. To install both together, use `pip install "fakellm-cli[assert]"` (the quotes keep shells like zsh from expanding the brackets).

## Commands

```bash
fakellm-cli list                         # every frozen verdict, with pass/fail counts
fakellm-cli list --verdict fail          # just the failures
fakellm-cli show "apologizes"            # one verdict in full: reasoning + response excerpt
fakellm-cli show aaaa1111                #   (by fingerprint prefix or criterion substring)
fakellm-cli verify                       # integrity check: schema, verdict values, key match
fakellm-cli prune --verdict fail         # preview removing all failing verdicts (dry run)
fakellm-cli prune --verdict fail --yes   #   actually remove them
fakellm-cli diff main/ feature/          # what changed between two snapshot dirs
fakellm-cli init                         # scaffold .fakellm/ and a conftest.py judge stub
```
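To make the `show` lookup and the `verify` checks concrete, here is a minimal Python sketch. It assumes a **hypothetical** store layout: a JSON object mapping each fingerprint to a record with `fingerprint`, `criterion`, `judge_model`, `verdict`, `reasoning` and `response_excerpt` fields. `fakellm-assert`'s real schema may differ, and `find_records`/`verify_store` are illustrative names, not the CLI's internals. "Key match" is read here as "the dict key equals the record's own fingerprint".

```python
# HYPOTHETICAL store layout (an assumption for illustration, not fakellm-assert's
# documented format): {fingerprint: {"fingerprint", "criterion", "judge_model",
#                                    "verdict", "reasoning", "response_excerpt"}}
REQUIRED_FIELDS = {"fingerprint", "criterion", "judge_model",
                   "verdict", "reasoning", "response_excerpt"}
VALID_VERDICTS = {"pass", "fail"}


def find_records(store: dict, query: str) -> list:
    """Match `query` as a fingerprint prefix or a criterion substring, like `show`."""
    return [
        rec for fp, rec in store.items()
        if fp.startswith(query) or query in rec.get("criterion", "")
    ]


def verify_store(store: dict) -> list:
    """Return human-readable integrity problems; an empty list means the store is clean."""
    problems = []
    for key, rec in store.items():
        if not isinstance(rec, dict):
            problems.append(f"{key}: record is not a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            problems.append(f"{key}: missing fields {sorted(missing)}")
        if rec.get("verdict") not in VALID_VERDICTS:
            problems.append(f"{key}: invalid verdict {rec.get('verdict')!r}")
        if "fingerprint" in rec and rec["fingerprint"] != key:
            problems.append(f"{key}: key does not match stored fingerprint")
    return problems


store = {
    "aaaa1111": {"fingerprint": "aaaa1111", "criterion": "apologizes for the delay",
                 "judge_model": "judge-x", "verdict": "pass",
                 "reasoning": "Opens with an apology.", "response_excerpt": "Sorry..."},
    "bbbb2222": {"fingerprint": "cccc3333", "criterion": "mentions a refund",
                 "judge_model": "judge-x", "verdict": "maybe",
                 "reasoning": "Unclear.", "response_excerpt": "We can..."},
}
print(len(find_records(store, "apolog")))  # 1
print(len(verify_store(store)))            # 2: bad verdict + key mismatch
```

A real `verify` would then turn a non-empty problem list into a non-zero exit code.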

Every read command takes `--store PATH` (pointing at either the `judgments` dir or the `judgments.json` file; default `.fakellm/judgments`) and `--json` for machine-readable output. Commands exit non-zero on the conditions you'd want to gate CI on: `verify` fails on integrity problems, and `diff` fails when a verdict flipped pass↔fail.
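The `--store` rule amounts to "point at the directory or at the file, and end up at the same `judgments.json`". A minimal sketch of that resolution, assuming a path ending in `.json` names the file and anything else names the directory (`resolve_store_file` is a hypothetical helper, not the CLI's actual code):

```python
from pathlib import Path

# Default location from this README; the .json-suffix rule below is this sketch's assumption.
DEFAULT_STORE = Path(".fakellm/judgments")


def resolve_store_file(store=None) -> Path:
    """Map a --store value (judgments dir or judgments.json file) to the JSON file path."""
    path = Path(store) if store is not None else DEFAULT_STORE
    if path.suffix == ".json":
        return path                    # already points at the file
    return path / "judgments.json"     # treat anything else as the judgments dir


print(resolve_store_file().as_posix())                 # .fakellm/judgments/judgments.json
print(resolve_store_file("main/judgments").as_posix())  # main/judgments/judgments.json
```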

## What it does *not* do: re-judge

There is deliberately no `fakellm-cli rejudge`. Two reasons, both structural:

1. **The store doesn't keep enough to re-judge.** A frozen record holds only a 280-character *excerpt* of the response, not the full text. Re-judging needs the exact response to recompute the fingerprint — and that lives in your test, not in the snapshot.
2. **Re-judging is a live model call that belongs in a reviewed run.** `fakellm-assert`'s whole point is that verdicts are produced exactly once, in an explicit `pytest --fakellm-update`, where a human reads the diff. A CLI that judged live would route around the one safety property the library exists to provide.

So the division of labor is: **`pytest --fakellm-update` produces verdicts; `fakellm-cli` manages them.** When `verify` or `diff` tells you a verdict is stale or wrong, the fix is to prune it with `fakellm-cli prune` and re-freeze it with `pytest --fakellm-update`.
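Reason 1 is easy to see with a toy fingerprint. The sketch below uses a **hypothetical** scheme (SHA-256 over criterion, judge model and the full response); `fakellm-assert`'s real fingerprint may be computed differently, but any scheme that hashes the full response text behaves the same way for a response longer than the excerpt: the 280-character excerpt hashes to a different key.

```python
import hashlib

EXCERPT_CHARS = 280  # excerpt length stated in this README


def fingerprint(criterion: str, judge_model: str, response: str) -> str:
    """HYPOTHETICAL fingerprint: SHA-256 over criterion, judge model and the FULL response.
    fakellm-assert's real scheme may differ; the point is only that the response is an input."""
    payload = "\x00".join([criterion, judge_model, response]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


response = "We're sorry for the delay. " * 20   # 540 chars, longer than the excerpt
excerpt = response[:EXCERPT_CHARS]              # all a frozen record keeps

original = fingerprint("apologizes for the delay", "judge-x", response)
from_excerpt = fingerprint("apologizes for the delay", "judge-x", excerpt)
print(original == from_excerpt)  # False: the excerpt cannot recompute the key
```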

## `diff` matches on criterion, not fingerprint

A fingerprint includes the response text, so the "same" assertion against a regenerated response has a *different* fingerprint by design. `diff` therefore pairs verdicts across two stores by `(criterion, judge_model)` so it can actually catch a pass→fail flip, rather than reporting every drifted response as an unrelated add+remove.
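The pairing is simple to express. This is a minimal sketch, not `fakellm-cli`'s implementation: it assumes a **hypothetical** record shape with `criterion`, `judge_model` and `verdict` fields, keeps only the last record when a store has several sharing the same `(criterion, judge_model)`, and reports flips, additions and removals. Keyed by fingerprint instead, the example would show one unrelated add and one remove rather than a flip.

```python
def diff_stores(old: dict, new: dict) -> dict:
    """Pair records by (criterion, judge_model), ignoring fingerprints, and report changes."""
    def index(store: dict) -> dict:
        # Simplification: if several records share a key, the last one wins.
        return {(r["criterion"], r["judge_model"]): r for r in store.values()}

    old_idx, new_idx = index(old), index(new)
    shared = old_idx.keys() & new_idx.keys()
    return {
        "flipped": sorted(k for k in shared if old_idx[k]["verdict"] != new_idx[k]["verdict"]),
        "added": sorted(new_idx.keys() - old_idx.keys()),
        "removed": sorted(old_idx.keys() - new_idx.keys()),
    }


# Same assertion, regenerated response: the fingerprints differ, the pairing key does not.
main = {"aaaa1111": {"criterion": "apologizes for the delay", "judge_model": "judge-x", "verdict": "pass"}}
feature = {"dddd4444": {"criterion": "apologizes for the delay", "judge_model": "judge-x", "verdict": "fail"}}

result = diff_stores(main, feature)
print(result["flipped"])                    # [('apologizes for the delay', 'judge-x')]
exit_code = 1 if result["flipped"] else 0   # non-zero: a pass<->fail flip should fail CI
```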

## Typical workflow

```bash
fakellm-cli init                    # once, to scaffold
# ... write satisfies() assertions, then:
pytest --fakellm-update             # freeze verdicts (review the diff!)
fakellm-cli list                    # eyeball what got frozen
fakellm-cli verify                  # gate in CI alongside pytest
# later, when you intentionally change a prompt:
fakellm-cli prune --criterion "old wording" --yes
pytest --fakellm-update             # re-freeze
```
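The `prune` step filters, previews, and deletes only on `--yes`. A minimal sketch of that flow over an in-memory store, assuming a **hypothetical** record shape with `criterion` and `verdict` fields and treating `--criterion` as a plain substring match. `prune` here is an illustrative function, and refusing to run with no filter at all is this sketch's own safety choice, not documented CLI behaviour.

```python
def prune(store: dict, verdict=None, criterion=None, yes=False):
    """Return (matched fingerprints, resulting store). Dry run unless yes=True."""
    if verdict is None and criterion is None:
        # This sketch's own safety choice: refuse to select the whole store.
        raise ValueError("give --verdict and/or --criterion")
    matched = [
        fp for fp, rec in store.items()
        if (verdict is None or rec["verdict"] == verdict)
        and (criterion is None or criterion in rec["criterion"])
    ]
    if not yes:
        return matched, dict(store)    # preview only: store unchanged
    doomed = set(matched)
    kept = {fp: rec for fp, rec in store.items() if fp not in doomed}
    return matched, kept


store = {
    "aaaa1111": {"criterion": "old wording about delays", "verdict": "pass"},
    "bbbb2222": {"criterion": "mentions a refund", "verdict": "fail"},
}
preview, unchanged = prune(store, criterion="old wording")
print(preview, len(unchanged))   # ['aaaa1111'] 2
_, pruned = prune(store, criterion="old wording", yes=True)
print(list(pruned))              # ['bbbb2222']
```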

## License

MIT
