interp-lab attribution graph

the next token should be a physical measurement unit

Model: distilgpt2

Graph

Candidate Paths

Strong Features

Feature Cards

Agent Next Actions