Driftless preview: eval-backed prompt migrations are ready to test
Introducing migration PRs for model-dependent prompts

Prompt migrations for teams that can’t afford regressions.

Turn model deprecations and eval-data changes into scoped prompt repairs, holdout validation, and evidence-backed pull requests before quality drifts in production.

[Figure: Abstract dashboard illustration of Driftless workflow evidence. Example shown: support-classifier / gpt-4o-mini, holdout passed. F1 score 0.99 (+7.1 pts); schema errors 0.0% (cleared); editable files: 2 (scoped).]
Trust model: Holdout or it does not merge
Signal: Current, naive, final scores
Control: Only scoped files change
Automation: Same CLI locally and in CI

The ROI

One eval standard. Enforced across every prompt migration.

Driftless gives engineering, product, and reviewers the same source of truth: the command that evaluated behavior, the files allowed to change, and the evidence that proves the migration is safe.

Manual prompt rewrite cycle: days
Driftless validation loop: minutes
Unscoped code edits: 0

Evaluate the real workflow

Run the repo command your team already trusts, with model overrides and a deterministic split of eval cases into tuning and holdout sets.
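One common way to get a deterministic tuning/holdout split is to hash each case ID, so the same case always lands in the same set regardless of run order or machine. The sketch below illustrates that idea only; the function names, the `salt` parameter, and the 20% holdout fraction are assumptions for illustration, not Driftless internals.

```python
import hashlib


def assign_split(case_id: str, holdout_fraction: float = 0.2, salt: str = "v1") -> str:
    """Return "holdout" or "tuning" for a case ID, stable across runs."""
    digest = hashlib.sha256(f"{salt}:{case_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "holdout" if bucket < holdout_fraction else "tuning"


def split_cases(case_ids, holdout_fraction: float = 0.2, salt: str = "v1") -> dict:
    """Group case IDs into tuning and holdout lists without any random state."""
    splits = {"tuning": [], "holdout": []}
    for case_id in case_ids:
        splits[assign_split(case_id, holdout_fraction, salt)].append(case_id)
    return splits
```

Because assignment depends only on the ID and the salt, adding new eval cases never moves existing cases between sets, so the holdout set stays untouched while prompts are tuned.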

Repair only what is scoped

Prompt, example, and config files can be edited. Schemas and product logic stay untouched unless you opt in.
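A scope rule like this can be pictured as an allowlist check over the files a migration touches. The sketch below is an illustration under assumptions: the glob patterns and the `out_of_scope` helper are hypothetical and do not reflect how Driftless is configured.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist: prompt, example, and config files only.
EDITABLE_PATTERNS = ["prompts/*.md", "examples/*.jsonl", "config/*.yaml"]


def out_of_scope(changed_files, patterns=EDITABLE_PATTERNS):
    """Return the changed paths that match none of the editable patterns.

    Note: in fnmatch-style patterns, "*" also matches "/" characters,
    so "prompts/*.md" covers nested prompt directories as well.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


violations = out_of_scope(["prompts/classify.md", "src/schema.py"])
# violations == ["src/schema.py"]; a non-empty list would block the change.
```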

Publish a reviewer packet

Every pull request includes the attempt log, metric table, holdout decision, and diff rationale.

The workflow

Deprecation notices become migration evidence.

When a provider retires a model, Driftless compares behavior on the current and target models, clusters the failures, repairs the prompt, and validates the result before opening a PR.

01 Scan

Find model IDs, owners, and affected workflows.

02 Compare

Run the baseline and target models through the same eval harness.

03 Repair

Patch scoped prompt files against clustered failures.

04 Validate

Accept only changes that pass the holdout set and meet the metric thresholds.

Migration report: PASS

Metric          Current    Naive    Final
F1              0.92       0.84     0.99
Refusals        1.8%       8.7%     1.2%
Schema errors   4.1%       12.0%    0.0%
Holdout         required   failed   passed
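To make the report concrete, here is a minimal sketch of an acceptance gate applied to the numbers above. The rule it encodes (holdout must pass, and no metric may regress against the current model) is an assumption chosen for illustration; the thresholds Driftless actually applies are not specified here, and `Metrics` and `accept` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    f1: float
    refusal_rate: float
    schema_error_rate: float


def accept(candidate: Metrics, current: Metrics, holdout_passed: bool) -> bool:
    """Assumed rule: holdout passes and nothing regresses versus current."""
    return (
        holdout_passed
        and candidate.f1 >= current.f1
        and candidate.refusal_rate <= current.refusal_rate
        and candidate.schema_error_rate <= current.schema_error_rate
    )


# Values copied from the migration report.
current = Metrics(f1=0.92, refusal_rate=0.018, schema_error_rate=0.041)
naive = Metrics(f1=0.84, refusal_rate=0.087, schema_error_rate=0.120)
final = Metrics(f1=0.99, refusal_rate=0.012, schema_error_rate=0.0)

naive_ok = accept(naive, current, holdout_passed=False)  # False
final_ok = accept(final, current, holdout_passed=True)   # True
```

Under this rule the naive model swap is rejected on every count, while the repaired prompt is accepted.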

Reviewer trust

Not just a better score. A reason to merge.

Driftless shows what broke, what changed, how the candidate performed against the untouched holdout set, and where reviewers should look first.

"Treat prompt migrations like dependency upgrades: measurable, scoped, repeatable, and reviewable."

Driftless operating model

Automation

Use the same CLI locally and in CI.

Start with one workflow. Once the contract is stable, schedule scans for deprecated models or trigger refinements when eval labels change.

pip install "driftless @ git+https://github.com/driftless/driftless@main"

driftless scan
driftless migrate -w support_classifier --to gpt-4o-mini
driftless validate -w support_classifier
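In CI, the same three commands can be wrapped in a small script that stops at the first failing step. This is a sketch only: it assumes the CLI reports failure through a non-zero exit status, which is not confirmed here, and `migration_commands` and `run_migration` are hypothetical helper names.

```python
import subprocess


def migration_commands(workflow: str, target_model: str) -> list[list[str]]:
    """Build the scan, migrate, and validate commands shown above."""
    return [
        ["driftless", "scan"],
        ["driftless", "migrate", "-w", workflow, "--to", target_model],
        ["driftless", "validate", "-w", workflow],
    ]


def run_migration(workflow: str, target_model: str, runner=subprocess.run) -> None:
    """Run each step in order, stopping at the first failure.

    Assumption: a failed step exits non-zero. With check=True,
    subprocess.run then raises CalledProcessError and later steps are skipped.
    """
    for command in migration_commands(workflow, target_model):
        runner(command, check=True)


if __name__ == "__main__":
    run_migration("support_classifier", "gpt-4o-mini")
```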

Start with the workflow that scares you most

Give reviewers the evidence before model drift reaches production.