Metadata-Version: 2.4
Name: penprint
Version: 0.1.0
Summary: A deterministic voice gate for writing — turn 'does this sound like me?' into a number, with no LLM.
Author-email: "Sanjay (sanjoxtech)" <sanjox.tech@gmail.com>
License: MIT
Project-URL: Homepage, https://sanjox.tech
Project-URL: Repository, https://github.com/sanjoxtech/penprint
Project-URL: Author, https://www.linkedin.com/in/sanjoxtech/
Keywords: stylometry,writing,voice,loop-engineering,agents,linter,brand-voice
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# 🖋️ Penprint

<p align="center">
  <img src="docs/assets/readme-banner.png" alt="Penprint — your voice, by the numbers" width="100%">
</p>

**A deterministic voice gate for writing. Turn "does this sound like me?" into a number — with no LLM.**

![python](https://img.shields.io/badge/python-3.9+-blue)
![deps](https://img.shields.io/badge/dependencies-0-brightgreen)
![llm](https://img.shields.io/badge/LLM%20in%20the%20gate-none-success)
![license](https://img.shields.io/badge/license-MIT-black)

---

## The problem

Code has tests, so coding agents can run in a loop until the tests pass. **Writing has no test.** So people gate writing loops with an *LLM judge* — "score this post 1–10." But an LLM judge is **noisy**: same draft, different score each run. Loops gated that way **oscillate** (6 → 7 → 6 → 7) and never converge.

## The idea

**Penprint is a guitar tuner for your writing voice.** It learns a numeric *fingerprint* from your past posts and scores any new draft against it — **the same draft always gets the same score**, because there's no model in the loop, just math.

That determinism means a writing loop gated by Penprint **converges like a code test-loop.**

```
your past posts ──► penprint fingerprint ──► fingerprint.json  (a few numbers)
                                                    │
new draft ──► penprint score ──► 0–100 + exactly which metrics are off
```
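
The flow above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not Penprint's actual code: the names `build_fingerprint` and `score_draft`, the single `mean_sentence_len` key, and the scoring formula are all assumptions chosen to show why the same draft always gets the same score.

```python
import json
import re
import statistics


def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def build_fingerprint(posts):
    """Hypothetical fingerprint: one number, the mean sentence length across all posts."""
    lengths = [n for post in posts for n in sentence_lengths(post)]
    return {"mean_sentence_len": statistics.mean(lengths)}


def score_draft(draft, fp):
    """Hypothetical score: 100 minus the percentage deviation from the fingerprint, floored at 0."""
    mean_len = statistics.mean(sentence_lengths(draft))
    deviation = abs(mean_len - fp["mean_sentence_len"]) / fp["mean_sentence_len"]
    return max(0, round(100 * (1 - deviation)))


posts = [
    "I built a parser. It broke twice. Then it worked.",
    "We shipped it on Friday. Nobody noticed.",
]
# Round-trip through JSON to mimic saving and reloading a fingerprint file.
fp = json.loads(json.dumps(build_fingerprint(posts)))
draft = "I ran the benchmark again. It was faster."
print(fp, score_draft(draft, fp), score_draft(draft, fp))
```

Nothing in the sketch samples or calls a model, so scoring the same draft twice can only produce the same number.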

## What it measures (all computed, no LLM)

| Metric | What it captures |
|--------|------------------|
| sentence rhythm | your average sentence length |
| burstiness | your mix of short + long sentences |
| first-person rate | how often you write "I / we" (peer voice) vs detached prose |
| hook length | whether the opening line is short enough to stop the scroll |
| build-signal | concrete "I built / ran / shipped" markers |
| banned phrases | generic AI/corporate words you'd never use (hard fail) |
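
As a rough sketch of how the numeric metrics in the table could be computed with only `re` and `statistics`: the definitions below are assumptions for illustration (burstiness as the population standard deviation of sentence lengths, first-person rate as a share of all words, build-signal as a regex count), and Penprint's real formulas may differ. The function name `metrics` and the dictionary keys are hypothetical.

```python
import re
import statistics

# Assumed word list and pattern; the real tool may use different ones.
FIRST_PERSON = {"i", "we", "me", "us", "my", "our"}
BUILD_PATTERN = re.compile(r"\b(?:i|we)\s+(?:built|ran|shipped)\b", re.IGNORECASE)


def metrics(text):
    """Compute illustrative voice metrics for a non-empty piece of text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "sentence_rhythm": statistics.mean(lengths),     # average sentence length
        "burstiness": statistics.pstdev(lengths),        # spread of sentence lengths
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / len(words),
        "hook_length": len(lines[0].split()),            # words in the opening line
        "build_signal": len(BUILD_PATTERN.findall(text)),
    }


sample = (
    "I shipped the fix.\n"
    "We ran the full suite overnight and nothing failed, which surprised me."
)
print(metrics(sample))
```

Every value is a pure function of the text, which is what makes the result repeatable.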

## Install

```bash
pip install penprint          # once published
# or, right now, from source:
git clone https://github.com/sanjoxtech/penprint && cd penprint
pip install -e .
```

Zero dependencies — it's Python stdlib only. No API key. Runs offline.

## Quickstart

```bash
# 1. learn your voice from your past posts
penprint fingerprint examples/corpus/*.md

# 2. score a draft
penprint score examples/good_draft.md
#  -> "SCORE": 79, "FAILS": [ "burstiness ...", "first-person ..." ]

# 3. use it as a CI / loop gate (exit 1 if below threshold)
penprint score draft.md --min 85

# 4. bring your own banned-phrase list (a word normal in YOUR voice shouldn't be banned)
penprint score draft.md --banned examples/banned.txt
```
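
If the gate needs to be called from a Python script rather than a shell, a thin wrapper around the exit code is enough. This sketch relies only on the behaviour shown in step 3 (`--min` makes the command exit non-zero below the threshold). The function name `passes_gate` and its `command` parameter are hypothetical, and the wrapper assumes `penprint` is on your `PATH`.

```python
import subprocess


def passes_gate(draft_path, minimum=85, command=("penprint", "score")):
    """Return True when the scoring command exits 0, False on any non-zero exit."""
    result = subprocess.run(
        [*command, draft_path, "--min", str(minimum)],
        capture_output=True,  # keep the report out of the caller's terminal
        text=True,
    )
    return result.returncode == 0
```

Calling `passes_gate("draft.md")` mirrors `penprint score draft.md --min 85`. Note that any non-zero exit counts as a failure here, so a missing file would also read as "did not pass".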

## The banned-phrase list

A sane default ships in the tool. A fuller, sourced list lives in [`examples/banned.txt`](examples/banned.txt) — curated from [proselint](https://github.com/amperser/proselint) (BSD), [write-good](https://github.com/btford/write-good) (MIT), [anti-ai-slop-writing](https://github.com/jalaalrd/anti-ai-slop-writing), and the Max Planck 2024 study on post-ChatGPT word inflation. Override or extend it with `--banned <file>` — it's *your* voice, so trim anything that's genuinely yours.
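
To make the hard-fail check concrete, here is a small sketch of banned-phrase matching. The file format is an assumption (one phrase per line, with blank lines and `#` comments skipped), the function names are hypothetical, and the two phrases are illustrative rather than taken from the shipped list.

```python
import re


def parse_banned(lines):
    """Assumed format: one phrase per line; blank lines and '#' comments are skipped."""
    phrases = []
    for line in lines:
        phrase = line.strip().lower()
        if phrase and not phrase.startswith("#"):
            phrases.append(phrase)
    return phrases


def find_banned(text, phrases):
    """Return the banned phrases that occur in text as whole words, ignoring case."""
    hits = []
    for phrase in phrases:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
            hits.append(phrase)
    return hits


banned = parse_banned(["# corporate filler", "delve", "", "game-changer"])
print(find_banned("Let's delve into this game-changer.", banned))
# prints ['delve', 'game-changer']
```

Trimming a phrase from the list is then just deleting its line, which matches the idea of keeping words that are genuinely yours.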

## Use it as a loop gate (the fun part)

Penprint has no LLM — but you can put *any* writing agent in front of it. The agent drafts, Penprint scores, you feed the failing metrics back, the agent fixes, repeat until it passes. Because the gate is deterministic, **it converges.** See [`examples/loop.sh`](examples/loop.sh) for a Claude Code / Codex example.

```
builder agent ─► draft.md ─► penprint score ─► fails? feed them back ─► fix ─► repeat ─► ✅ ≥ threshold
```
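
The same loop, sketched in Python with stand-ins so it runs without any agent or CLI. Everything here is hypothetical: `run_loop`, the stub writer, and the stub scorer only illustrate the control flow, and a real setup would replace `stub_write` with a call to your writing agent and `stub_score` with Penprint's output.

```python
def run_loop(write, score, threshold=85, max_rounds=5):
    """Draft, score, feed the failing metrics back, repeat until the score clears the threshold."""
    feedback = []
    for round_number in range(1, max_rounds + 1):
        draft = write(feedback)
        value, fails = score(draft)
        if value >= threshold:
            return draft, value, round_number
        feedback = fails
    raise RuntimeError(f"no passing draft after {max_rounds} rounds")


def stub_score(draft):
    """Stand-in gate: deterministic, flags a missing 'I' and an opening sentence over 8 words."""
    fails = []
    if " I " not in f" {draft} ":
        fails.append("first-person")
    if len(draft.split(".")[0].split()) > 8:
        fails.append("hook length")
    return 100 - 20 * len(fails), fails


def stub_write(feedback):
    """Stand-in writer: returns a canned revision that addresses the reported failure."""
    if "first-person" in feedback:
        return "I rewrote the entire parser module over one long and painful weekend. It works."
    if "hook length" in feedback:
        return "I rewrote the parser. It works."
    return "The team has completed a rewrite of the entire parser module. It works."


result = run_loop(stub_write, stub_score)
print(result)
```

Because the scorer returns the same failures for the same draft, each round gives the writer a stable target. The `max_rounds` cap is a safety net for writers that never fix what they are told.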

## Does Penprint use an LLM?

**No.** Penprint itself is pure Python (`re`, `json`, `statistics`). It's a measuring tape, not a brain. The optional *writer* in a loop can be an LLM of your choice — Penprint just scores the result.

## Prior art / credits

Penprint stands on long-standing work — it's a new *combination*, not a new primitive:

- **Stylometry** (writing-as-numbers, since the 1880s) — e.g. [jpotts18/stylometry](https://github.com/jpotts18/stylometry), [StyloMetrix](https://github.com/ZILiAT-NASK/StyloMetrix). Used for *forensics*; Penprint uses it as a *loop gate*.
- **[Vale](https://vale.sh/)** — prose linter ("style guide as code"). Great rules; no personal voice *score*.
- **[conorbronsdon/avoid-ai-writing](https://github.com/conorbronsdon/avoid-ai-writing)** — preset voice profiles + iterate-to-convergence via an LLM. Penprint differs: a *personal, computed, LLM-free* score from *your own* posts.

The gap Penprint fills: **a personal computed voice score used as a converging loop's gate.**

## Author

Built by **Sanjay** ([sanjoxtech](https://github.com/sanjoxtech)) — [sanjox.tech](https://sanjox.tech) · [LinkedIn](https://www.linkedin.com/in/sanjoxtech/) · sanjox.tech@gmail.com

## License

MIT — see [LICENSE](LICENSE).
