Metadata-Version: 2.4
Name: slopscore
Version: 0.1.0
Summary: Heuristic linter that scores AI residue in commits and PRs 0-100 with evidence - flags craft, not authorship
Author: Matt Whelan
License: MIT
Project-URL: Homepage, https://github.com/koopatroopa/slopscore
Project-URL: Repository, https://github.com/koopatroopa/slopscore
Project-URL: Issues, https://github.com/koopatroopa/slopscore/issues
Keywords: lint,ai,slop,commit,git-hooks,pull-request,code-quality,ci
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# slopscore

[![tests](https://github.com/koopatroopa/slopscore/actions/workflows/ci.yml/badge.svg)](https://github.com/koopatroopa/slopscore/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/slopscore)](https://pypi.org/project/slopscore/)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue)](LICENSE)

### **_Catch the slop before it ships._**

AI tools leave fingerprints — em-dashes, a rocket 🚀 emoji no human would
ever type by hand, stray `Co-Authored-By: Claude` trailers,
`# ... rest of the code unchanged` stubs, ghost import declarations,
`Summary by CodeRabbit` stamps. slopscore detects these AI-generated
tells by putting a number on them — every commit, push and PR scored out
of 100, every finding backed by evidence, all before it ships.

## Install

```sh
pip install slopscore
```

Then, inside each repo you want watched:

```sh
slopscore install-hooks
```

That second step is the point of the tool - a score on every commit and push.
Python 3.11 or newer. No other dependencies.

Plenty of tools lint AI residue in code. **Nothing else scores the prose that
ships with it** - the commit messages and PR text where the residue lives in
git history forever. It starts as a nudge; **one git config turns it into a
hard gate** that refuses any commit or push over your threshold, and off again
just as fast.

It flags **craft, not authorship**. slopscore never claims "this is AI" - it
points at fixable leftovers with evidence, like a linter flagging a missing
semicolon. (Style-based AI detection is unreliable and biased against
second-language writers, so it refuses to do it: 24 real pre-2020 human PRs
must score LOW forever - `tests/test_holdout.py`.) Everything runs locally: no
LLM, no network, no telemetry.

## Who runs it

Two kinds of people, both on their **own** work:

- **You used AI and want to ship clean.** Catch the residue - a leftover
  trailer, a placeholder stub, unedited boilerplate - before a reviewer does.
  Run it on yourself via the hooks, or as a shared standard in CI (report-only
  for repos with outside contributors).
- **You didn't use AI but fear a detector will say you did.** Formal and
  second-language writers get wrongly flagged constantly. Turn on the thorough
  tier (`--strict`) and slopscore shows your *own* writing through a crude
  detector's eyes - the em-dashes, the "delve", the scaffolding those tools
  latch onto - with evidence, so you can see your exposure and decide whether
  to defang it. slopscore does not believe these prove AI; it refuses that. It
  just lets you see what the flawed tools out there react to.

Either way the score is *information*, never an accusation: the verdict is the
only gate, and honest human work is built to pass.

## Tested against reality

Every claim above is measured, not asserted. The signals are validated
against nearly 5,000 real commits and PR bodies from public GitHub history,
under pre-registered rules: the metric and the pass bar are written down
before the data is scored, every rejected idea is logged, and a held-out set
of 24 pre-2020 human PRs is never tuned against - CI enforces that it scores
LOW forever.

- **2,300+ real human commits and PR bodies (pre-2022, definitionally
  pre-LLM): zero false flags.**
- **1,700+ commits and PR bodies carrying real AI attribution: 99.9%
  flagged.**
- **785 disciplined, human-reviewed AI-assisted commits: all PASS.** It
  measures residue, not authorship - AI-assisted work that someone actually
  edited scores like human work.

When the corpus catches the engine being wrong, the engine changes - and
signals that false-fired on real humans were rejected and stay rejected.

## Try it

Scoring a commit on its way out - the message and the staged code, with each
finding pinned to where it is:

![slopscore scoring a commit: 70/100 HIGH](assets/sample-report.svg)

Or score raw text from the command line:

```sh
echo "Certainly! Let's delve into a robust refactor. Generated with Claude Code." | slopscore --text -
```

![slopscore scoring the line above: 86/100 HIGH](assets/text-demo.svg)

Drop the "Generated with Claude Code." sentence and it falls to 13.6, PASS -
single weak signals are normal writing; only convergence flags.

The score is a **gradient** - *how much* AI residue is in the text, not a
yes/no. A low non-zero score is light texture, not an accusation; the
**verdict** is the only gate, and it is tuned so honest human work passes.

- Bands: **LOW** below 30, **MEDIUM** 30 to under 70, **HIGH** 70 and up. The
  **verdict** FLAGs at/above the threshold (default 30), so MEDIUM and HIGH
  both flag.
- The **exit code** follows the verdict: `0` pass, `1` flag, `2` usage error -
  so it drops straight into CI or a git hook.
- Evidence carries `path:line` for code; JSON input gets prose locations too
  (`body:14`, `commit[2]:1`).

On a terminal the report is coloured by band (green/amber/red); piped output
stays plain. `--color` and `NO_COLOR` override.

## Three ways to use it

**1. The CLI - check your work by hand.**

```sh
# score a PR/issue described as JSON ({title, body, commits})
slopscore pr.json
# or raw text, or stdin
echo "Quick fix. Generated with Claude Code." | slopscore --text -
# scan code files too (go wide on your working tree)
slopscore --files src/*.py --json
# thorough tier: also score the opt-in signals (see "Who runs it")
slopscore --strict pr.json
```

**2. Git hooks - score every commit and push; block them when you say so.**

```sh
slopscore install-hooks
```

Or via the [pre-commit](https://pre-commit.com) framework:

```yaml
repos:
  - repo: https://github.com/koopatroopa/slopscore
    rev: v0.1.0
    hooks:
      - id: slopscore-commit-msg
      - id: slopscore-pre-push
```

`commit-msg` scores your message plus the staged code; `pre-push` scores each
outgoing commit (flagged ones reported as `[abc1234] Subject`, clean pushes
silent). **Advisory by default - they never block until you ask.** The gate is
one setting, per repo, instant in both directions:

```sh
git config slopscore.block true       # gate ON: refuse commits/pushes at/above the threshold
git config slopscore.threshold 50     # move the bar (default 30)
git config slopscore.block false      # gate OFF: back to advisory
git config slopscore.strict true      # thorough tier: score the opt-in signals too
```

Escape hatches even with the gate on: `git commit --no-verify` (or push) skips
it once; `SLOPSCORE_BLOCK=0` (or `=1`) overrides the setting for one command.
Every report's footer tells you the current state and the command to flip it.

**3. CI - the same gate on every PR, two lines on either platform.**

Both recipes score the PR/MR's prose (title + description) plus the diff's
added lines, report-only until you flip the gate (exit `0` pass, `1` flag).
It runs on every contributor, so it is a shared craft standard - keep it
report-only on repos with outside contributors.

GitHub (run it under `pull_request`, not `pull_request_target` - slopscore
only needs to read the diff, never repo write access or secrets):

```yaml
on: pull_request
permissions:
  contents: read
# ...
- uses: actions/checkout@v4
  with: { fetch-depth: 0 }
- uses: koopatroopa/slopscore@v0   # pin to a tag
  with: { fail-on-flag: "false" }  # "true" = gate the merge
```

GitLab:

```yaml
include:
  - remote: https://raw.githubusercontent.com/koopatroopa/slopscore/main/ci/slopscore.gitlab-ci.yml
```

The GitLab job ships advisory (`allow_failure: true`); redeclare the job to
remove that or set `SLOPSCORE_THRESHOLD`. Any other CI works the same way:
`pip install slopscore`, feed it `--text` and `--diff`, gate on the exit
code.

## What it looks for

Signals that are **on by default** - distinctive leftovers with a very low
false-positive rate:

- `ai_self_reference` - explicit AI attribution: trailers ("Co-Authored-By:
  Claude"), assistant self-talk ("As an AI..."), and the stamps AI review
  bots leave in PR bodies ("## Summary by CodeRabbit", aider's
  auto-generated-PR header)
- `ai_cliche_phrases` - chatbot filler ("delve into", "it's worth noting")
- `code_placeholder_stub` - placeholder markers left in code ("// ... rest of
  code", "your implementation here"), reported with file and line.
- `em_dash_density`, `emoji_density` and `curly_quotes` - U+2014, decorative
  emoji and word-processor quotes are not on your keyboard; in coding
  artefacts they arrive via tooling. These only ever add to a score (they are
  excluded from the normalisation ceiling), so enabling or disabling them
  cannot dilute the signals above - and their combined contribution is
  capped, so however many fire at once they can colour a score but never
  flag on their own. Real Greenkeeper-era PR bodies taught us that one.

Signals that are **opt-in**, because humans genuinely type them:

- `code_undeclared_import` - an import that is not in the standard library, not
  declared in your manifest and not a local module - possibly a package the
  model made up. It reads your `pyproject`/`requirements` and never imports or
  installs anything.
- `sycophantic_openers` ("Certainly!", "Hope this helps!") - chat register,
  not commit register: across 2,500+ real AI-attributed commits and PR bodies
  it fired zero times, and its only corpus hits were friendly humans. Kept
  for scanning pasted chat output.
- `promotional_adjectives` ("robust", "comprehensive"), `section_scaffolding`
  (`## Overview` headers - PR templates generate these), `bold_lead_in_lists`,
  `negative_parallelism` ("not just X, but Y" and its TED-talk cousins),
  `rhetorical_qa` ("Why? Because...") and `vague_authority` ("studies show").

## Configure it

A TOML config toggles any signal, overrides weights and sets the threshold:

```toml
threshold = 40

[signals]
emoji_density = false           # opt a default signal out
code_undeclared_import = true   # opt the import check in

[weights]
ai_self_reference = 6.0
```

Point each surface at it:

```sh
slopscore --config slopscore.toml --files src/*.py   # CLI
git config slopscore.config slopscore.toml           # hooks, per repo
# Action: pass `config: "slopscore.toml"` in the workflow's `with:` block
```

## Honest about its limits

- Calibration is validated on the corpus above: humans top out at a score
  of 25, real attributed-AI sits at 70+, and the flag bar (30) lives in the
  empty gap between them. An explicit attribution trailer is a certain tell, so it
  scores HIGH on its own; weak signals must converge to get there.
- The corpus has an era gap by construction: the human side is pre-2022
  (provably pre-LLM), the AI side is 2023+. Stated, not pretended away - a
  pass against modern human PR-template prose is the known next step.
- The import check resolves against your manifest; import-name vs package-name
  mismatches (`yaml` vs `PyYAML`) are only partly covered by an alias map -
  hence opt-in. `requirements.txt` `-r` includes are not followed.
- It is a *signal*, not a judge. A high score means "give this a second read",
  never "this is AI".

## Design

The detection engine is framework-free; the CLI, git hooks and Action are thin
front-ends over it. Heuristic-only by design - the value is the discipline
(low false-positive, evidence-backed, craft not authorship), not a cleverer
classifier.

MIT licensed.
