Pytifex — Automated Differential Testing for Python Type Checkers

Author

Benedek Kaibas

Published

April 8, 2026

1 Pytifex — Automated Differential Testing for Python Type Checkers

Requires Python 3.12+. License: MIT.

Pytifex automatically discovers disagreements between Python type checkers by mining real bugs from type checker repositories, generating targeted test cases with an LLM, and establishing ground truth through multi-tiered evaluation.

1.1 How It Works

 Mine bugs from        Generate code          Run 4 type           Evaluate which
 GitHub issues    →    variations via    →    checkers on      →   checker is
 (mypy, ty, ...)       Gemini LLM             each example         correct

  1. Mine — Fetch real bug reports (false positives, false negatives) from the mypy, pyrefly, ty, and pyright GitHub repositories
  2. Mutate — Use the bugs as seeds for Gemini to generate new code targeting similar edge cases
  3. Test — Run mypy, pyrefly, zuban, and ty on each generated example; keep only disagreements
  4. Evaluate — Determine which checker is correct using runtime crash detection, Hypothesis testing, PEP spec matching, and AST analysis
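
The Test step and the first tier of the Evaluate step can be sketched as follows. This is a minimal illustration, not Pytifex's actual implementation. The command lines in CHECKER_COMMANDS and the helper names run_checker, is_disagreement, and crashes_at_runtime are assumptions made for this example, and "accepts" is approximated here by a zero exit code.

```python
import subprocess
import sys

# Assumed command lines for illustration; the invocations Pytifex uses may differ.
CHECKER_COMMANDS = {
    "mypy": ["mypy"],
    "pyrefly": ["pyrefly", "check"],
    "zuban": ["zuban", "check"],
    "ty": ["ty", "check"],
}


def run_checker(command: list[str], path: str) -> bool:
    """Return True if the checker exits with code 0 on the file (hypothetical helper)."""
    completed = subprocess.run(command + [path], capture_output=True, text=True)
    return completed.returncode == 0


def is_disagreement(verdicts: dict[str, bool]) -> bool:
    """True when at least one checker accepts the file and at least one rejects it."""
    return len(set(verdicts.values())) > 1


def crashes_at_runtime(path: str, timeout: float = 10.0) -> bool:
    """Run the example with the current interpreter and report a nonzero exit.

    A crash hints that checkers which accepted the file missed a real error.
    A timeout is treated as "no crash observed".
    """
    try:
        completed = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode != 0
```

A caller would build the verdicts with `{name: run_checker(cmd, path) for name, cmd in CHECKER_COMMANDS.items()}`, keep the file only if `is_disagreement(verdicts)` is true, and then use `crashes_at_runtime(path)` as one signal among the evaluation tiers listed above.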

1.2 Type Checkers Tested

 Checker    Version
 mypy       1.19.0
 pyrefly    0.44.2
 zuban      0.3.0
 ty         0.0.1-alpha.32

1.3 Quick Start

git clone https://github.com/benedekaibas/pytifex-demo.git
cd pytifex-demo/src/tc_disagreement

export GEMINI_API_KEY=your_key

uv run main.py --num-examples 5

Note: Pytifex is a research tool developed for a senior comprehensive project. It implements a bug-seeded mutation methodology for proactively finding type checker bugs before users encounter them.

1.4 Documentation