Metadata-Version: 2.4
Name: llm-feedback-control
Version: 0.1.0
Summary: Reliable, checkable structured output from a small local LLM, by wrapping it in a deterministic feedback loop: a regime gate + exact graph analysis + explicit refusal, plus a bounded re-extraction loop. Zero runtime dependencies; runs with no model at all.
Author-email: Edward Chalk <edward.chalk@sapientronic.ai>
License: llm-feedback-control
        
        Copyright (c) 2026 Edward Chalk (sapientronic.ai)
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to use,
        copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
        the Software, and to permit persons to whom the Software is furnished to do
        so, subject to the following conditions:
        
        1. The above copyright notice and this permission notice shall be included
           in all copies or substantial portions of the Software.
        
        2. Attribution. Any publication, presentation, derivative work, or product
           that uses or builds on this Software must include visible attribution to
           Edward Chalk and sapientronic.ai. The phrase "Built with llm-feedback-control
           by Edward Chalk (sapientronic.ai)" or equivalent is acceptable.
        
        3. The Software is provided "AS IS", without warranty of any kind, express
           or implied, including but not limited to the warranties of merchantability,
           fitness for a particular purpose, and noninfringement. In no event shall
           the authors or copyright holders be liable for any claim, damages, or
           other liability, whether in an action of contract, tort, or otherwise,
           arising from, out of, or in connection with the Software or the use or
           other dealings in the Software.
        
        This license is modeled on the MIT License with an explicit attribution
        clause (clause 2).
        
Project-URL: Homepage, https://github.com/pcoz/llm-feedback-control
Project-URL: Repository, https://github.com/pcoz/llm-feedback-control
Project-URL: Issues, https://github.com/pcoz/llm-feedback-control/issues
Project-URL: Changelog, https://github.com/pcoz/llm-feedback-control/blob/main/CHANGELOG.md
Keywords: llm,feedback-control,structured-extraction,state-machine,workflow,hallucination,reliability,ollama,small-language-model,auditable,refusal
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: aws
Requires-Dist: boto3>=1.26; extra == "aws"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Dynamic: license-file

# llm-feedback-control

**Get reliable, checkable structured output from a small, local language model —
by wrapping it in ordinary deterministic code.**

[![CI](https://github.com/pcoz/llm-feedback-control/actions/workflows/ci.yml/badge.svg)](https://github.com/pcoz/llm-feedback-control/actions/workflows/ci.yml)

---

## What it actually does

You hand it a process written in plain English:

> "A claim enters Intake. From Intake it goes to Triage. Triage goes to FastTrack
> or to Investigation. FastTrack goes to Payout. Investigation goes to Payout or
> to Denied. Payout goes to Closed. Denied goes to Closed."

and it:

1. **turns that into a state machine** — the steps (states) and the arrows between
   them (transitions);
2. **computes provable facts** about it — which steps are dead ends, whether
   there are loops, which steps can't be reached from the start;
3. **writes a report where every statement is backed by one of those checked
   facts** — so it can't quietly make things up;
4. **knows its own limits.** If the text isn't actually a finite step-by-step
   process (e.g. *"prices drift up as confidence grows"*), it **refuses** instead
   of inventing a fake state machine. And if the model's first pass missed part of
   the process, it **loops to fill the gaps** — or refuses if it can't.
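
Steps 1 and 2 can be illustrated with plain standard-library Python. The sketch below is not the library's implementation: the `analyse` function and its return keys are hypothetical names, and the transitions are written out by hand from the claims example above. It shows the kind of exact, checkable facts involved: dead ends, unreachable steps, and whether any loop exists.

```python
from collections import deque

# Transitions read off the claims example above (typed in by hand here).
TRANSITIONS = [
    ("Intake", "Triage"),
    ("Triage", "FastTrack"),
    ("Triage", "Investigation"),
    ("FastTrack", "Payout"),
    ("Investigation", "Payout"),
    ("Investigation", "Denied"),
    ("Payout", "Closed"),
    ("Denied", "Closed"),
]

def analyse(transitions, start):
    """Return exact facts about a directed graph given as (src, dst) pairs.

    Hypothetical helper; assumes `start` appears in at least one transition.
    """
    states = sorted({s for edge in transitions for s in edge})
    succ = {s: [] for s in states}
    for src, dst in transitions:
        succ[src].append(dst)

    # Dead ends (terminals): states with no outgoing transition.
    terminals = [s for s in states if not succ[s]]

    # Reachability: breadth-first search from the start state.
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in succ[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    unreachable = [s for s in states if s not in seen]

    # Loop detection via Kahn's algorithm: if some states can never be
    # removed in topological order, the graph contains a cycle.
    indegree = {s: 0 for s in states}
    for _, dst in transitions:
        indegree[dst] += 1
    ready = deque(s for s in states if indegree[s] == 0)
    removed = 0
    while ready:
        s = ready.popleft()
        removed += 1
        for nxt in succ[s]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    has_loop = removed < len(states)

    return {"states": states, "terminals": terminals,
            "unreachable": unreachable, "has_loop": has_loop}

facts = analyse(TRANSITIONS, start="Intake")
print(facts["terminals"])    # ['Closed']
print(facts["unreachable"])  # []
print(facts["has_loop"])     # False
```

Every value in `facts` is computed, not generated, which is what lets a report built only from such facts be audited.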

The point: you get **higher-quality, auditable structured output from a *small*
model**, trading a few extra passes (latency) for accuracy, with no extra parameters,
no special mathematics, and no cloud required. It runs on a laptop, and the
deterministic parts run **with no model at all**.

## Quickstart (works with no model)

```bash
pip install llm-feedback-control      # zero dependencies — pulls nothing else
```

```python
from llm_feedback_control import run_audit

r = run_audit("A claim enters Intake. From Intake it goes to Triage. "
              "Triage goes to FastTrack or to Investigation.")
print(r["result"])         # OK
print(r["report_facts"])   # terminals, loops, unreachable steps — all checked
```

That already works on a bare install: with no model reachable it uses a
deterministic regex extractor plus exact graph analysis. **Plug in a model and the
extraction quality goes up — nothing else changes.**
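
To give a feel for what a deterministic regex extractor can do, here is a minimal sketch. The pattern and the `extract_transitions` name are assumptions made for illustration, not the library's actual rules; it only understands sentences of the form "X goes to Y" or "From X it goes to Y or to Z".

```python
import re

# Hypothetical pattern: "<Src> goes to <Dst>[ or to <Dst>...]." with an
# optional "From <Src> it" lead-in. State names are assumed to be single
# capitalised words.
GOES_TO = re.compile(
    r"(?:From\s+)?\b(?P<src>[A-Z]\w*)(?:\s+it)?\s+goes\s+to\s+(?P<targets>[^.]+)\."
)

def extract_transitions(text):
    """Return (src, dst) pairs for every 'goes to' sentence in the text."""
    edges = []
    for match in GOES_TO.finditer(text):
        targets = re.split(r"\s+or\s+(?:to\s+)?", match.group("targets").strip())
        for target in targets:
            edges.append((match.group("src"), target.strip()))
    return edges

text = ("A claim enters Intake. From Intake it goes to Triage. "
        "Triage goes to FastTrack or to Investigation.")
print(extract_transitions(text))
# [('Intake', 'Triage'), ('Triage', 'FastTrack'), ('Triage', 'Investigation')]
```

A rule set like this is brittle on free-form phrasing, which is where a model helps, but its output is reproducible and needs no backend.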

From the command line:

```bash
lfc "A ticket opens in New. New goes to Assigned. Assigned goes to Resolved."
lfc --check        # tells you exactly what backend is available and what to do
lfc --demo         # runs the three worked demos
```

### Add a model (optional, recommended)

The library is **not tied to any provider.** Three ways to give it a model:

```bash
# 1. Local, free, private — install Ollama (https://ollama.com), then:
ollama pull phi3:mini

# 2. OpenAI (stdlib HTTP, no SDK):
export CEILING_BACKEND=openai OPENAI_API_KEY=sk-...
```

```python
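# 3. Bring your own: pass any callable f(prompt, fmt=None) -> str
from llm_feedback_control import run_audit

def my_llm(prompt, fmt=None):
    ...                       # call Anthropic, a local server, anything; return a str

text = "A ticket opens in New. New goes to Assigned. Assigned goes to Resolved."
run_audit(text, generate=my_llm)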
```
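
As one concrete shape for such a callable, the sketch below talks to a local Ollama server over its `/api/generate` HTTP endpoint using only the standard library. The library already ships its own Ollama support, so this is purely illustrative. The names `build_payload` and `ollama_generate` are hypothetical, and it is an assumption that `fmt` carries a value Ollama accepts in its `format` field (for example `"json"`).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama's default local endpoint

def build_payload(prompt, fmt=None, model="phi3:mini"):
    """Build the request body for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if fmt is not None:
        payload["format"] = fmt   # assumption: fmt is a valid Ollama format, e.g. "json"
    return payload

def ollama_generate(prompt, fmt=None):
    """A callable with the f(prompt, fmt=None) -> str shape described above."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, fmt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

With a server running, this could then be passed as `run_audit(text, generate=ollama_generate)`.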

Run `lfc --check` any time to see what's wired up.

## How it works — "feedback control", explained

The design is borrowed from **electronics.** A raw LLM is like a very high-gain
amplifier: hugely powerful, but left to run "open-loop" it overshoots — fluent,
yet it drifts and hallucinates. Engineers tame such an amplifier by adding a
**feedback loop**: feed the output back, compare it against a stable reference,
and trade some raw power for precision and stability. This library is that
feedback loop for an LLM. The "reference" is plain deterministic code — graph
checks and schema rules — that the model's output is measured against.

There are two kinds of feedback, and the library uses both:

### Negative feedback — the stabilising checks (`run_audit`)

This is the half that *grounds and refuses*. In plain terms:

| step | what it means |
|---|---|
| **regime gate** | First decide whether the text is even the kind of thing we can analyse exactly (a finite, step-by-step process) versus something fuzzy and continuous. Refuse the fuzzy ones. |
| **extraction + schema** | Ask the model for the state machine, but force the answer into a strict shape — and fall back to a deterministic regex extractor if it won't comply (or if there's no model). |
| **exact analysis** | Compute provable facts about the graph: dead ends, loops, unreachable steps. (Plus an *optional* finite-field "spectral fingerprint" — see below.) |
| **grounded report** | Write the summary using only those verified facts, naming only real states. |
| **explicit refusal** | When the input is out of regime, or a result can't be made exact, say so — don't guess. |
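
The "extraction + schema" row can be sketched as a strict validator with a deterministic fallback. The JSON shape used here (`states` as a list of names, `transitions` as `[src, dst]` pairs) is an assumed schema for illustration, and `parse_state_machine` and `extract` are hypothetical names, not the library's API.

```python
import json

def parse_state_machine(raw):
    """Return (states, edges) if raw matches the assumed schema, else None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    states = data.get("states")
    transitions = data.get("transitions")
    if not (isinstance(states, list) and states
            and all(isinstance(s, str) for s in states)):
        return None
    if not isinstance(transitions, list):
        return None
    known = set(states)
    edges = []
    for t in transitions:
        # Each transition must be a [src, dst] pair naming only declared states.
        if not (isinstance(t, list) and len(t) == 2
                and all(isinstance(x, str) for x in t)
                and t[0] in known and t[1] in known):
            return None
        edges.append((t[0], t[1]))
    return states, edges

def extract(text, generate=None, fallback=None):
    """Try the model first; use the deterministic fallback if it won't comply."""
    if generate is not None:
        parsed = parse_state_machine(generate(text, fmt="json"))
        if parsed is not None:
            return parsed
    return fallback(text) if fallback is not None else None
```

Rejecting any transition that names an undeclared state is the simplest form of "naming only real states": an invented state fails validation instead of reaching the report.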

### Positive feedback — the gap-filling loop (`extract_iterative`)

A one-shot extraction often silently **drops a branch** — the model says "OK"
while quietly missing *Investigation → Denied*. Positive feedback fixes that: it
**re-asks the model about anything the source text mentions that's missing from
the answer**, and repeats until nothing is missing (a *fixed point*).

Positive feedback is where capability *and* instability both live, so it's bounded
by two negative-feedback safeguards: a deterministic consistency check (does the
graph cover everything the text mentions?) and a **refusal clamp** — if it can't
converge within a few passes, it refuses to report a confident-but-incomplete
result rather than running away. This **refusal-as-stabilizer** is what makes the
regenerative loop safe.
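
The bounded loop described above can be sketched in a few lines. This is a simplified illustration, not the body of `extract_iterative`: the function name, the `extract` and `mentions` callables, the `max_passes` default, and the `"REFUSED"` label are all assumptions.

```python
def extract_until_fixed_point(text, extract, mentions, max_passes=3):
    """Re-ask about missing states until none are missing, or refuse.

    extract(text, missing) -> set of (src, dst) edges; `missing` lists the
                              state names to ask about again.
    mentions(text)         -> set of state names the text refers to
                              (the deterministic reference).
    """
    edges = set()
    missing = set()
    for passes in range(1, max_passes + 1):
        edges |= extract(text, sorted(missing))
        covered = {state for edge in edges for state in edge}
        missing = mentions(text) - covered          # consistency check
        if not missing:                             # fixed point reached
            return {"result": "OK", "edges": sorted(edges), "passes": passes}
    # Refusal clamp: no convergence within the budget, so do not report a
    # confident-but-incomplete graph.
    return {"result": "REFUSED", "missing": sorted(missing), "passes": max_passes}

TEXT = "Investigation goes to Payout or to Denied."

def fake_extract(text, missing):
    # Stand-in for a model: the first pass drops a branch, a targeted re-ask finds it.
    if "Denied" in missing:
        return {("Investigation", "Denied")}
    return {("Investigation", "Payout")}

def fake_mentions(text):
    return {"Investigation", "Payout", "Denied"}

print(extract_until_fixed_point(TEXT, fake_extract, fake_mentions)["result"])        # OK
print(extract_until_fixed_point(TEXT, lambda t, m: set(), fake_mentions)["result"])  # REFUSED
```

The second call shows the clamp: an extractor that never improves exhausts its passes and the loop refuses, listing what it could not cover.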

## What's measured so far

Indicative results, not benchmarks — small corpora, a 3.8B local model
(`phi3:mini`), greedy decoding. See [`docs/results.md`](docs/results.md) for the
full tables and method.

**Headline (run on EC2 against a ~28 GB ceiling model, mixtral 8x7B):** on a
messy, branchy, distractor-laden workflow corpus, the small model **+ the feedback
loop essentially matches a model ~7× its size.**

| configuration | states F1 | transitions F1 |
|---|---|---|
| small model (phi3:mini), one-shot | 0.98 | 0.89 |
| **small model + feedback loop** | **1.00** | **0.90** |
| big ceiling model (mixtral, ~28 GB), one-shot | 1.00 | 0.91 |

The loop recovers **100%** of the small→big gap on states and **77%** on
transitions. On several individual workflows the closed-loop small model
*beat* the big model, because the deterministic reference catches edges that raw
fluency invents or drops.
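
A "gap recovered" percentage is naturally read as (loop − small) / (big − small). The sketch below assumes that definition; the function name is illustrative. Applied to the two-decimal values in the table it reproduces the 100% for states but gives 50% for transitions, so the 77% figure presumably comes from unrounded scores (an assumption; `docs/results.md` is the place to check).

```python
def gap_recovered(small, loop, big):
    """Fraction of the small-to-big score gap that the loop closes."""
    gap = big - small
    if gap <= 0:
        raise ValueError("no gap to recover")
    return (loop - small) / gap

# Rounded F1 values taken from the table above.
print(round(gap_recovered(0.98, 1.00, 1.00), 2))   # 1.0  (states)
print(round(gap_recovered(0.89, 0.90, 0.91), 2))   # 0.5  (transitions, rounded inputs)
```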

Other measured pieces: state-extraction precision/recall ≈ 1.00 / 0.92; the
regime gate scores 1.00 precision and recall when separating finite from
continuous inputs on a clean corpus. It is brittle on deliberately *mixed*
inputs, which remains an open problem.
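
For readers unfamiliar with the metrics, here is a minimal set-based sketch of precision, recall, and F1 over extracted states or transitions. It assumes exact-match scoring against a hand-labelled gold set; the scoring actually used for the numbers above is described in `docs/results.md` and may differ.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two collections."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# An extraction that invents nothing but misses one of four states.
p, r, f1 = precision_recall_f1({"A", "B", "C"}, {"A", "B", "C", "D"})
print(p, r, round(f1, 3))   # 1.0 0.75 0.857
```

High precision with lower recall is the "silently dropped branch" pattern that the gap-filling loop targets.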

## Documentation

| doc | contents |
|---|---|
| [`docs/index.md`](docs/index.md) | overview and where to start |
| [`docs/architecture.md`](docs/architecture.md) | the op-amp model in depth; the pipeline; refusal-as-stabilizer |
| [`docs/usage.md`](docs/usage.md) | install, the API, the CLI, configuration, bring-your-own-backend |
| [`docs/results.md`](docs/results.md) | the measured results, method, and honest scope |
| [`docs/api.md`](docs/api.md) | reference for every public function |
| [`docs/faq.md`](docs/faq.md) | "do I need a GPU?", "what models?", "does it work offline?" … |

## Repository layout

```
src/llm_feedback_control/   the package (zero-dependency, pure standard library)
  llm.py                    the LLM client + injectable backend + a doctor()
  auditor.py                the negative-feedback pipeline (run_audit)
  feedback.py               the bounded positive-feedback loop (extract_iterative)
  __main__.py               the `lfc` command-line tool
experiments/                repro scripts for the measured results (not shipped)
aws/                        optional: run a large ceiling model on EC2 (not shipped)
docs/                       the documentation suite
tests/                      deterministic tests (no model / no network)
```

## Honest scope

- **A reliability architecture, not a model improvement.** The win is "the system
  knows what it can compute exactly and refuses the rest" — orthogonal to model
  scale. It helps on the *structured / verifiable slice* (workflows, state
  machines, configs), not open-ended generation.
- **It uses no special mathematics.** The deterministic reference is plain
  graph/text consistency. (The finite-field "spectral fingerprint" is an *optional*
  extra exact check, honestly redundant with graph analysis for most workflow
  audits — keep it or ignore it.)
- **Needs a deterministic reference.** Where there's nothing to check against, the
  gate (correctly) refuses to claim exactness.
- **Results are indicative.** Small corpora; treat the numbers as direction, not
  guarantees.

## Origin

This project is the practical, validated spin-off of an internal research
investigation. The investigation's grander mathematical claims did not hold up
under measurement; this engineering architecture — LLM feedback control with
refusal-as-stabilizer — is the part that did. It stands on its own.

## License

MIT with an attribution clause — see [`LICENSE`](LICENSE).
Built with llm-feedback-control by Edward Chalk (sapientronic.ai).
