Open source · MIT License · v0.11.0

Keep your models balanced.

Fine-tuning silently degrades what your model already knows. pyrecall detects which skills regressed and lets you roll back specific skills before bad weights reach production.

$ pip install pyrecall

Fine-tuning has a silent failure mode.

You take a model that's good at a lot of things. You fine-tune it on your data — customer support tickets, legal documents, whatever. It gets noticeably better at your task. You ship it.

Six weeks later something feels off. The model handles your task fine but everything else is worse. Reasoning is shakier. Code quality dropped. It's less careful about safety cases it used to handle cleanly. Your loss curve looked fine the whole time.

This is catastrophic forgetting. It's well documented in research and almost completely ignored by tooling. There's no alarm. Nothing in your training loop catches it. pyrecall does.

Snapshot. Detect. Rollback.

Snapshot

Before training, pyrecall runs your model through 64 benchmark prompts across five skill categories and saves the scores. That's your baseline.
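To make the idea concrete, here is a minimal sketch of what a baseline snapshot could look like. It is not pyrecall's internals: `score_skills`, `save_snapshot`, the substring-match scoring, and the toy benchmark are all assumptions made for illustration.

```python
import json
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical benchmark layout: category -> list of (prompt, expected substring).
Benchmark = Dict[str, List[Tuple[str, str]]]


def score_skills(generate: Callable[[str], str], benchmark: Benchmark) -> Dict[str, float]:
    """Return, per category, the fraction of prompts whose output contains the expected text."""
    scores: Dict[str, float] = {}
    for category, cases in benchmark.items():
        hits = [1.0 if expected in generate(prompt) else 0.0 for prompt, expected in cases]
        scores[category] = mean(hits)
    return scores


def save_snapshot(name: str, scores: Dict[str, float], path: str) -> None:
    """Write the named baseline scores to a JSON file (assumed storage format)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"name": name, "scores": scores}, fh, indent=2)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any weights."""
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "I don't know")


benchmark: Benchmark = {
    "reasoning": [("2 + 2 =", "4"), ("3 * 3 =", "9")],
    "general_knowledge": [("Capital of France?", "Paris")],
}

baseline = score_skills(toy_model, benchmark)
print(baseline)
```

Calling `save_snapshot("before_v1", baseline, "before_v1.json")` would persist the scores so a later run has something to compare against.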

Detect

After training, the same benchmarks run again. Any skill that dropped beyond your threshold gets flagged. The report prints to your terminal in color — green held, red degraded.
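The comparison step can be sketched in a few lines. This is an illustration under assumptions, not pyrecall's actual report code: `find_regressions`, `format_report`, the 0.05 default threshold, and the sample scores are all hypothetical.

```python
from typing import Dict, List

# ANSI escape codes for terminal colors.
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def find_regressions(
    baseline: Dict[str, float], current: Dict[str, float], threshold: float = 0.05
) -> List[str]:
    """Return skills whose score dropped by more than `threshold` since the baseline."""
    return [
        skill
        for skill, before in baseline.items()
        if before - current.get(skill, 0.0) > threshold
    ]


def format_report(
    baseline: Dict[str, float], current: Dict[str, float], threshold: float = 0.05
) -> str:
    """Build one colored line per skill: green if it held, red if it degraded."""
    degraded = set(find_regressions(baseline, current, threshold))
    lines = []
    for skill, before in baseline.items():
        after = current.get(skill, 0.0)
        color, label = (RED, "degraded") if skill in degraded else (GREEN, "held")
        lines.append(f"{color}{skill:<12} {before:.2f} -> {after:.2f}  {label}{RESET}")
    return "\n".join(lines)


# Made-up scores for illustration only.
baseline = {"reasoning": 0.80, "coding": 0.75, "safety": 0.90}
current = {"reasoning": 0.78, "coding": 0.60, "safety": 0.91}

flagged = find_regressions(baseline, current)
print(format_report(baseline, current))
```

A small dip like the one in reasoning stays under the threshold and is reported as held, while the larger drop in coding is flagged.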

Rollback

Every training run saves a LoRA adapter checkpoint. Roll back to any named snapshot instantly. Not the whole model — just the adapter weights that drifted.
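As a rough mental model, rolling back an adapter means swapping a small set of weights while the base model stays untouched. The sketch below assumes a hypothetical `AdapterStore` class and made-up weight names, and it restores the whole adapter rather than only the drifted parts, so treat it as a simplification and not as pyrecall's implementation.

```python
import copy
from typing import Dict, List

Weights = Dict[str, List[float]]


class AdapterStore:
    """Keeps named copies of adapter weights; the base model is never modified."""

    def __init__(self, adapter: Weights) -> None:
        self.adapter = adapter
        self._snapshots: Dict[str, Weights] = {}

    def snapshot(self, name: str) -> None:
        """Save a deep copy so later training cannot mutate the stored weights."""
        self._snapshots[name] = copy.deepcopy(self.adapter)

    def rollback(self, to: str) -> None:
        """Replace the live adapter with the named snapshot."""
        if to not in self._snapshots:
            raise KeyError(f"no snapshot named {to!r}")
        self.adapter = copy.deepcopy(self._snapshots[to])


# Hypothetical adapter with two small weight lists.
store = AdapterStore({"layer0.lora_A": [0.0, 0.0], "layer0.lora_B": [0.0, 0.0]})
store.snapshot("before_v1")

# Pretend a training run changed the adapter.
store.adapter["layer0.lora_A"] = [0.3, -0.1]

store.rollback(to="before_v1")
print(store.adapter)
```

Because only the adapter is copied, a snapshot stays small compared with a full model checkpoint.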

Six lines. That's the whole workflow.

from pyrecall import Model

model = Model("meta-llama/Llama-3.2-1B", strategy="lora")

model.snapshot(name="before_v1")   # lock in current skill scores
model.learn("data.jsonl", epochs=3)  # fine-tune on new data
report = model.check()              # what got worse?
model.rollback(to="before_v1")      # fix it

Six commands. Everything you need.

$ pyrecall init # set up pyrecall in your project
$ pyrecall snapshot v1 # save current skill scores
$ pyrecall check # detect forgetting since last snapshot
$ pyrecall rollback v1 # restore a previous snapshot
$ pyrecall status # view all snapshots and scores
$ pyrecall delete v1 --yes # permanently remove a snapshot

Five skill categories, benchmarked automatically.

Reasoning

Multi-step logic and problem solving.

Coding

Code generation, debugging, explanation.

Instruction Following

Does it do what you asked?

General Knowledge

Breadth of factual accuracy.

Safety

Handling sensitive and edge-case prompts.