Metadata-Version: 2.4
Name: slopscore-lint
Version: 0.4.2
Summary: Transparent AI-slop writing-pattern analysis for essays, blog posts, Markdown, JSON, and websites.
Project-URL: Homepage, https://github.com/jman4162/slopscore
Project-URL: Issues, https://github.com/jman4162/slopscore/issues
Author-email: John Hodge <jhodge007@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai-slop,linter,nlp,text-analysis,writing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Other Audience
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.11
Requires-Dist: ftfy>=6.2
Requires-Dist: jsonpath-ng>=1.6
Requires-Dist: marko>=2.1
Requires-Dist: numpy>=1.26
Requires-Dist: pydantic>=2.7
Requires-Dist: pysbd>=0.3.4
Requires-Dist: pyyaml>=6.0
Requires-Dist: regex>=2024.5
Requires-Dist: rich>=13.7
Requires-Dist: scikit-learn>=1.4
Requires-Dist: typer>=0.12
Provides-Extra: all
Requires-Dist: httpx>=0.27; extra == 'all'
Requires-Dist: jinja2>=3.1; extra == 'all'
Requires-Dist: lingua-language-detector>=2.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0; extra == 'all'
Requires-Dist: spacy>=3.7; extra == 'all'
Requires-Dist: trafilatura>=2.0; extra == 'all'
Provides-Extra: detectors
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Provides-Extra: eval
Requires-Dist: datasets>=2.19; extra == 'eval'
Requires-Dist: huggingface-hub>=0.23; extra == 'eval'
Provides-Extra: lang
Requires-Dist: lingua-language-detector>=2.0; extra == 'lang'
Provides-Extra: ml
Requires-Dist: lightgbm>=4.0; extra == 'ml'
Provides-Extra: nlp
Requires-Dist: sentence-transformers>=3.0; extra == 'nlp'
Requires-Dist: spacy>=3.7; extra == 'nlp'
Provides-Extra: report
Requires-Dist: jinja2>=3.1; extra == 'report'
Provides-Extra: web
Requires-Dist: httpx>=0.27; extra == 'web'
Requires-Dist: trafilatura>=2.0; extra == 'web'
Description-Content-Type: text/markdown

# slopscore

A transparent **linter for AI-slop writing patterns** in essays, blog posts, Markdown, JSON, and
websites.

`slopscore` reads text and returns a 0 to 100 **SlopScore** measuring the density of formulaic,
generic, low-specificity, over-polished writing patterns associated with low-effort LLM output.
It reports per-dimension scores and **evidence spans** (the exact phrases that triggered each
finding), so you can see and fix what it flags.

> ### ⚠️ What slopscore is NOT
> It does **not** detect whether text was written by AI, and must never be used to accuse a writer.
> It flags writing *patterns* in *text* (not authorship, not authors): patterns common in
> low-effort or AI-like prose **and** in plenty of human writing. Use it as a prose linter to nudge
> toward clearer, more specific writing, not as an AI detector. Authorship detectors are unreliable
> and biased; slopscore deliberately is not one.

## What it is, and what it is not

slopscore detects **writing patterns**, not authorship. It does not claim a text was written by
AI, and it should never be used to accuse a writer. AI-authorship detectors are unreliable on
short, edited, translated, and non-native-English text, so slopscore takes a more honest and more
useful position:

> "This text has a high concentration of generic, formulaic, low-evidence writing patterns."

not

> "This was written by AI."

Think of it as a linter for slop, closer to Vale or ruff than to a black-box AI detector.
Every point in the score comes from a visible rule with an evidence span.

## Install

```bash
pip install slopscore-lint            # lean, rule-based core
pip install "slopscore-lint[web]"     # + website extraction (trafilatura)
pip install "slopscore-lint[nlp]"     # + spaCy NER and sentence-transformer embeddings
pip install "slopscore-lint[lang]"    # + non-English language detection
pip install "slopscore-lint[report]"  # + HTML report rendering (Jinja2)
pip install "slopscore-lint[all]"     # everything
```

> **Name note:** the PyPI package is `slopscore-lint` (plain `slopscore` belongs to a different
> tool). The import stays `import slopscore`, and the command is `slopscore-lint`.

## Usage

```bash
slopscore-lint scan post.md
slopscore-lint scan essay.txt --format json
slopscore-lint scan content.json --json-path "$.article.body"
slopscore-lint scan https://example.com/post        # requires slopscore-lint[web]
```

### Calibrate against your own writing

Instead of asking "does this look like AI?", ask "does this deviate from *my* usual style in
sloppy ways?". Build a baseline from a folder of your past writing, then compare new drafts to it:

```bash
slopscore-lint calibrate ./my-old-posts --name me
slopscore-lint scan new-post.md --baseline me     # reports per-dimension z-score deviations
```
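
As a rough illustration of what a per-dimension z-score means here, the sketch below compares each dimension of a new draft with the mean and standard deviation of that dimension across the baseline documents. It is not slopscore's implementation: the function name, the dimension keys, and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_scores(baseline: list[dict[str, float]], draft: dict[str, float]) -> dict[str, float]:
    """Return (draft - baseline mean) / baseline stdev for each dimension."""
    result = {}
    for dim, value in draft.items():
        history = [doc[dim] for doc in baseline]
        spread = stdev(history)  # sample standard deviation; needs at least 2 documents
        # A dimension that never varies in the baseline has no meaningful z-score,
        # so this sketch reports 0.0 for it instead of dividing by zero.
        result[dim] = 0.0 if spread == 0 else (value - mean(history)) / spread
    return result


# Hypothetical per-dimension scores for three old posts and one new draft.
old_posts = [
    {"lexical_markers": 10.0, "redundancy": 4.0},
    {"lexical_markers": 12.0, "redundancy": 6.0},
    {"lexical_markers": 14.0, "redundancy": 8.0},
]
new_draft = {"lexical_markers": 18.0, "redundancy": 6.0}

deviations = z_scores(old_posts, new_draft)
```

A large positive value means the draft uses that pattern far more than your own past writing does; a value near zero means it is in line with your usual style.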

### Higher-precision syntactic detection (optional)

The default install detects syntactic tells (trailing "-ing" analyses, and so on) with regex.
Install the `[nlp]` extra and the spaCy English model for a higher-precision, lower-false-positive
path:

```bash
pip install "slopscore-lint[nlp]"
python -m spacy download en_core_web_sm
```

slopscore auto-upgrades to the spaCy path when the model is present; nothing else changes.

### Use it as a linter in CI

```bash
slopscore-lint scan ./content --recursive --fail-on high          # exit 1 if any high finding
slopscore-lint scan ./content --recursive --format sarif -o out.sarif   # for GitHub code scanning
slopscore-lint scan post.md --format html -o report.html          # highlighted-span HTML (needs [report])
slopscore-lint scan . --diff origin/main --fail-on medium         # only files changed vs a ref
```

Exit codes: `0` clean (or below `--fail-on`), `1` findings at or above the threshold, `2` usage
error, `3` a needed extra is missing. A composite **GitHub Action** (`action.yml`) scans, uploads
SARIF to code scanning, and fails by threshold; a **pre-commit hook** (`.pre-commit-hooks.yaml`)
is published for `pre-commit`. SARIF and HTML line numbers for Markdown are relative to the
extracted prose (raw-source mapping is a later enhancement).
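
For a CI wrapper written in Python, the documented exit codes can be mapped to outcomes along these lines. This is a sketch: the helper names and messages are illustrative, and only the four exit codes and the `scan` flags shown earlier come from the tool itself.

```python
import subprocess

# Exit codes as documented above; the descriptions are paraphrased.
EXIT_MEANINGS = {
    0: "clean, or all findings below the --fail-on threshold",
    1: "findings at or above the --fail-on threshold",
    2: "usage error",
    3: "a needed extra is missing",
}


def describe_exit(code: int) -> str:
    """Translate a slopscore-lint exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(path: str, fail_on: str = "high") -> str:
    """Run a recursive scan and describe the result (needs slopscore-lint on PATH)."""
    proc = subprocess.run(
        ["slopscore-lint", "scan", path, "--recursive", "--fail-on", fail_on],
        check=False,  # a non-zero exit is an expected outcome, not an exception
    )
    return describe_exit(proc.returncode)
```

Separating exit `1` from `2` and `3` lets a pipeline distinguish "the prose needs work" from "the job is misconfigured".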

The scorer is also available from Python as a library:

```python
from slopscore import SlopScorer

scorer = SlopScorer(profile="blog", strictness="conservative")
# the argument below is an example of the slop the tool flags:
report = scorer.scan_text("In today's fast-paced digital landscape, it is crucial to leverage synergy.")
print(report.score.slop_score, report.score.label)
print(report.evidence[:3])
```

## Status

v0.4: linter maturity. `slopscore.toml` / `[tool.slopscore]` config with per-rule toggles and
severity overrides, inline `<!-- slopscore-disable … -->` suppression, a findings baseline
(`--fail-on-new`), the implemented `unsupported_claims` dimension, opt-in `--suggest` rewrite
suggestions (with SARIF `fixes`), an optional **separate** authorship-adapter interface (no
detector bundled), PyPI packaging, and a docs site.

v0.3: an evaluation framework (`slopscore-lint eval`: TPR@FPR, PR-AUC, calibration, per-subgroup
FPR) and a transparent **learned scorer**: a sign-constrained, calibrated logistic regression over
the 13 dimensions, serialized as auditable JSON and run with pure numpy (`--scorer ml`). The rule
scorer stays the default: under a replace-if-wins gate the learned model must beat it on held-out
TPR@1%FPR *without regressing subgroup false positives*, and on the seed set it does not (it
over-flags plain English). See `MODEL_CARD.md` and `DATA_SOURCES.md`.
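
To make the gate concrete, here is a minimal sketch of TPR at a fixed FPR and a replace-if-wins check. The function names, the thresholding rule, and the subgroup keys are assumptions for illustration; the project's own evaluation runs through `slopscore-lint eval` and may differ in detail.

```python
def tpr_at_fpr(scores: list[float], labels: list[int], max_fpr: float = 0.01) -> float:
    """Best true-positive rate over thresholds whose false-positive rate is <= max_fpr.

    labels: 1 for slop-like text, 0 for clean text. A text is flagged when
    its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        tp = sum(f and y == 1 for f, y in zip(flagged, labels))
        fp = sum(f and y == 0 for f, y in zip(flagged, labels))
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


def replace_if_wins(
    rule_tpr: float,
    learned_tpr: float,
    rule_subgroup_fpr: dict[str, float],
    learned_subgroup_fpr: dict[str, float],
) -> bool:
    """Accept the learned scorer only if it wins on TPR and no subgroup FPR gets worse."""
    if learned_tpr <= rule_tpr:
        return False
    return all(
        learned_subgroup_fpr[group] <= rule_subgroup_fpr[group]
        for group in rule_subgroup_fpr
    )


# Toy held-out set (made-up numbers).
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
strict_tpr = tpr_at_fpr(scores, labels, max_fpr=0.01)
```

On this toy set the clean text scored 0.7 forces a high threshold, so one slop-like text is missed at a 1% FPR budget. The second condition in `replace_if_wins` is what keeps a model that over-flags one subgroup from replacing the rule scorer even when its headline TPR is higher.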

v0.2.1: productionization. Console/JSON/Markdown/**SARIF**/**HTML** reports, recursive and
changed-files (`--diff`) batch scanning with CI exit codes, a GitHub Action, and a pre-commit hook.

v0.2: detection expansion grounded in Wikipedia's "Signs of AI writing" field guide. Dimensions:
lexical markers, formulaic structure, significance inflation, superficial "-ing" analyses, vague or
over-attribution, negative parallelism and rule-of-three, copula avoidance, genericity, redundancy,
cadence, formatting tells, prompt residue, and a negative human-writing signal. Scoring is
conservative by default: a corroboration gate damps weak-alone tells, and scores abstain on short
or non-English input. See `MODEL_CARD.md` for citations and limitations.

## License

MIT
