Your AI activity ledger — every agent, every activity, 100% local.
Every session is categorized. Each row shows total cost, total human-equivalent value, and ROI — using mid estimates of human time and market rate, multiplied by the quality factor. Hover the columns for low/high ranges.
tokenpayback auto-detects Claude Code, Codex CLI, Hermes, OpenClaw, OpenHuman, and Cursor on your machine.
| Agent | Sessions | $ spent | est. value | ROI |
|---|---|---|---|---|
| Week | Cost | Value | ROI | PRs | Commits | +Lines | Reverts |
|---|---|---|---|---|---|---|---|
Sessions with ROI < 1× (AI cost more than the human time it saved) or with
failed / harmful quality. Look at these before declaring victory.
A session with ROI below 1× burned tokens without paying for itself, and a failed or harmful one also added cleanup work for you. These are the cases that
should make you change your prompt, your tool choice, or your task framing.
| Date | Agent | Project | Category | What happened | Quality | $ AI cost | $ Value | ROI |
|---|---|---|---|---|---|---|---|---|
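
The flagging rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not tokenpayback's actual code: the `Session` record, its field names, and the `needs_review` helper are hypothetical, and ROI is assumed to be the human-equivalent value divided by the AI cost.

```python
from dataclasses import dataclass

# Quality labels that flag a session regardless of its ROI (assumed spellings).
BAD_QUALITY = {"failed", "harmful"}


@dataclass
class Session:
    """Hypothetical session record; field names are illustrative only."""
    project: str
    ai_cost: float  # dollars spent on the AI session
    value: float    # mid human-equivalent value in dollars
    quality: str    # e.g. "with-edits", "draft-only", "failed"


def roi(session: Session) -> float:
    """ROI as a multiple: human-equivalent value divided by AI cost."""
    if session.ai_cost <= 0:
        # A free session cannot lose money; avoid dividing by zero.
        return float("inf") if session.value > 0 else 0.0
    return session.value / session.ai_cost


def needs_review(session: Session) -> bool:
    """True when ROI is below 1x or the quality label is failed/harmful."""
    return roi(session) < 1.0 or session.quality in BAD_QUALITY


sessions = [
    Session("docs-site", ai_cost=2.0, value=84.0, quality="with-edits"),
    Session("billing-api", ai_cost=6.0, value=3.0, quality="draft-only"),
    Session("cli-tool", ai_cost=1.5, value=0.0, quality="failed"),
]
flagged = [s.project for s in sessions if needs_review(s)]
print(flagged)
```

In this made-up data, `billing-api` is flagged because its ROI is 0.5×, and `cli-tool` is flagged because its quality is `failed`.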
Every row shows the closest professional role, a time range (low — mid — high) for that human, the market rate range for that role, and the quality factor reflecting whether the AI output was directly usable. Final value = mid_hours × mid_rate × quality_factor; the range comes from combining the low/high estimates. Nothing leaves your machine.
| Date | Agent | Via | Category | Project | Equivalent role | Human time low → mid → high | Hourly rate low → mid → high | Quality ×factor | $ AI cost | $ Human-equiv low → mid → high | ROI low → mid → high |
|---|---|---|---|---|---|---|---|---|---|---|---|
Per session, the LLM picks the closest professional role that would do this work and estimates hours + market rate (anchored to your benchmark: us-west / senior). Value = hours × rate × quality_factor. Quality factor: full-replacement 1.0 · with-edits 0.7 · draft-only 0.4 · failed 0.0.
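
As a rough illustration of the formula above, the following Python sketch computes a low / mid / high value and ROI for one session. The quality factors are the ones listed above; everything else is an assumption made for the example: the function names are hypothetical, the low and high ends are assumed to pair low hours with the low rate and high hours with the high rate, and ROI is assumed to be value divided by AI cost.

```python
# Quality factors as listed in the text above.
QUALITY_FACTORS = {
    "full-replacement": 1.0,
    "with-edits": 0.7,
    "draft-only": 0.4,
    "failed": 0.0,
}


def value_range(hours, rates, quality):
    """Return (low, mid, high) dollar values for one session.

    hours and rates are (low, mid, high) tuples. Pairing low with low and
    high with high is an assumption about how the range is combined.
    """
    factor = QUALITY_FACTORS[quality]
    return tuple(h * r * factor for h, r in zip(hours, rates))


def roi_range(values, ai_cost):
    """Return (low, mid, high) ROI multiples: value divided by AI cost."""
    return tuple(v / ai_cost for v in values)


# Hypothetical session: 1 to 4 hours of work at $80 to $150 per hour,
# usable with edits, costing $3.50 in AI spend.
hours = (1.0, 2.0, 4.0)
rates = (80.0, 100.0, 150.0)
low, mid, high = value_range(hours, rates, "with-edits")
rois = roi_range((low, mid, high), ai_cost=3.50)

print(f"value: ${low:.2f} / ${mid:.2f} / ${high:.2f}")
print("ROI:   " + " / ".join(f"{r:.1f}x" for r in rois))
```

With these made-up inputs the mid value is about 2 × 100 × 0.7 = $140, and a `failed` session is valued at zero whatever the hours and rate.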
Estimates are uncertain by design — that's why every number is a range, not a single value.
Tune the benchmark in ~/.tokenpayback/config.yaml (region + seniority).