Metadata-Version: 2.4
Name: claude-code-context-profiler
Version: 0.0.1
Summary: PreCompact ledger and per-component context profiler for Claude Code
Project-URL: Homepage, https://github.com/ks7585/claude-code-context-profiler
Project-URL: Repository, https://github.com/ks7585/claude-code-context-profiler
Project-URL: Issues, https://github.com/ks7585/claude-code-context-profiler/issues
Author-email: Karthik Sivarama Krishnan <ks7585@g.rit.edu>
License: MIT
License-File: LICENSE
Keywords: agent-tooling,anthropic,claude,claude-code,compaction,context-engineering,context-management,llm,llm-agents,prompt-engineering
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40
Requires-Dist: pydantic>=2.7
Requires-Dist: rich>=13.7
Requires-Dist: tiktoken>=0.7
Requires-Dist: typer>=0.12
Provides-Extra: bench
Requires-Dist: datasets>=2.20; extra == 'bench'
Requires-Dist: matplotlib>=3.9; extra == 'bench'
Requires-Dist: pandas>=2.2; extra == 'bench'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# claude-code-context-profiler (`ccprofile`)

[![ci](https://github.com/ks7585/claude-code-context-profiler/actions/workflows/ci.yml/badge.svg)](https://github.com/ks7585/claude-code-context-profiler/actions/workflows/ci.yml)
[![python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue.svg)](https://www.python.org/downloads/)
[![license](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

**A local PreCompact ledger and per-component context profiler for
Claude Code.**

> Stop Claude Code from forgetting file paths, error codes, and
> decisions after auto-compaction.

`ccprofile` is a local hook bundle and CLI for Anthropic's Claude Code
that intervenes before auto-compaction runs. It extracts a structured
*session ledger* — files read, files modified, errors seen, user
preferences — and re-injects it as a persistent system note that
survives compaction.
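As a rough illustration of what such a ledger could look like, the sketch below models the four categories named above as a dataclass and renders them as a plain-text note. The class name `SessionLedger`, its field names, and the note layout are hypothetical and chosen for illustration; they are not taken from the `ccprofile` source.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLedger:
    """Hypothetical container for the facts a session ledger might track."""

    files_read: list[str] = field(default_factory=list)
    files_modified: list[str] = field(default_factory=list)
    errors_seen: list[str] = field(default_factory=list)
    user_preferences: list[str] = field(default_factory=list)

    def render_note(self) -> str:
        """Render the non-empty sections as a compact plain-text note."""
        sections = [
            ("Files read", self.files_read),
            ("Files modified", self.files_modified),
            ("Errors seen", self.errors_seen),
            ("User preferences", self.user_preferences),
        ]
        lines = ["[session ledger]"]
        for title, items in sections:
            if items:  # skip empty sections to keep the note short
                lines.append(f"{title}:")
                lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


ledger = SessionLedger(
    files_read=["src/app.py"],
    errors_seen=["ModuleNotFoundError: No module named 'yaml'"],
)
note = ledger.render_note()
print(note)
```

Keeping exact strings (paths, error messages) rather than prose is the point of the structure: they can be matched verbatim later.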

It also ships `ccprofile profile`: a `ccusage`-style attribution view of
where your 200K-token context is going right now, with eviction
candidates scored by size × staleness.
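One way to read "size × staleness" is sketched below. It assumes staleness is the number of turns since a component was last referenced and that the score is a plain product; the real scoring function, the `Component` type, and the sample numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    tokens: int          # size of the component in tokens
    last_used_turn: int  # turn index of the most recent reference


def eviction_candidates(components, current_turn, top_n=3):
    """Rank components by tokens * turns-since-last-use, largest first."""

    def score(c):
        staleness = max(current_turn - c.last_used_turn, 0)
        return c.tokens * staleness

    ranked = sorted(components, key=score, reverse=True)
    return [(c.name, score(c)) for c in ranked[:top_n]]


components = [
    Component("tool_result: pytest log", tokens=12_000, last_used_turn=3),
    Component("file: src/app.py", tokens=4_000, last_used_turn=38),
    Component("file: README.md", tokens=2_500, last_used_turn=10),
]
ranked = eviction_candidates(components, current_turn=40)
for name, s in ranked:
    print(f"{s:>9,}  {name}")
```

Under this scoring, a large tool result that has not been touched for many turns outranks a smaller file that was read recently.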

## Headline

On 12 real Claude Code sessions, the structured ledger **beats a real
Claude-Haiku-driven compaction summary by +33.5 pp [+24.5, +46.1]
overall recall** at a near-identical token budget (about 1,005 extra
tokens). The 95% CI excludes zero across five paired comparisons. See
[Benchmark results](#benchmark-results) for the full table and
[`docs/benchmark-v0.md`](docs/benchmark-v0.md) for the report.

## Quickstart

```bash
# From PyPI (CLI is named `ccprofile`; the unrelated `ccprofile` PyPI
# package is something else — install the distribution name shown here):
pip install claude-code-context-profiler

# Or from source:
git clone https://github.com/ks7585/claude-code-context-profiler.git
cd claude-code-context-profiler
python -m venv .venv && . .venv/bin/activate
pip install -e .

# Where is your context going right now?
ccprofile profile

# Install the PreCompact hook (preview first; --apply writes settings.json
# with a timestamped backup):
ccprofile install
ccprofile install --apply

# Reproduce the benchmark on your own ~/.claude/projects/ corpus:
ccprofile bench run --sweep-k 50,100,200,500,1000 --bootstrap-iters 10000
```

## The `profile` view

![ccprofile profile screenshot](docs/assets/profile-example.svg)

## Why

Anthropic's native auto-compaction is a token-saving win, but it
routinely loses the things engineers care most about: specific file
paths, error codes, decisions, and "approaches we already ruled out."
Users work around this with `/compact preserve the coding patterns we
established`, custom CLAUDE.md notes, and external memory plugins. None
of those are measured.

`ccprofile` does two things:

1. **Intervention.** A `PreCompact` hook extracts the structured ledger
   before compaction sees the conversation, so compaction's lossy
   summarizer operates on less noise *and* the ledger is re-injected
   after compaction completes.
2. **Measurement.** A paired-run benchmark over real Claude Code session
   transcripts compares oracle / simulated baseline / real-API baseline
   / ledger conditions with lexical and LLM-graded semantic probes and
   percentile-bootstrap CIs.
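The extraction step in item 1 can be sketched as a single pass over transcript lines. The record shape used here (JSONL entries with `message.content` blocks of type `tool_use` / `tool_result`, tool names `Read`, `Edit`, `Write`, and a `file_path` input) is an assumption about the transcript format, and `extract_ledger` is a hypothetical helper, not the function `ccprofile` ships.

```python
import json
import re

READ_TOOLS = {"Read"}            # assumed name of the file-reading tool
WRITE_TOOLS = {"Edit", "Write"}  # assumed names of the file-writing tools
ERROR_RE = re.compile(r"\b[A-Z][A-Za-z]*(?:Error|Exception)\b")


def extract_ledger(jsonl_lines):
    """Collect file paths and error class names from transcript lines."""
    ledger = {"files_read": [], "files_modified": [], "errors_seen": []}

    def add(key, value):
        # Keep first-seen order and drop duplicates.
        if value and value not in ledger[key]:
            ledger[key].append(value)

    for line in jsonl_lines:
        record = json.loads(line)
        content = record.get("message", {}).get("content", [])
        if isinstance(content, str):
            continue  # plain-text turns carry no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                path = block.get("input", {}).get("file_path")
                if block.get("name") in READ_TOOLS:
                    add("files_read", path)
                elif block.get("name") in WRITE_TOOLS:
                    add("files_modified", path)
            elif block.get("type") == "tool_result":
                for match in ERROR_RE.findall(str(block.get("content", ""))):
                    add("errors_seen", match)
    return ledger


sample = [
    json.dumps({"message": {"content": [
        {"type": "tool_use", "name": "Read", "input": {"file_path": "src/app.py"}}]}}),
    json.dumps({"message": {"content": [
        {"type": "tool_result", "content": "ModuleNotFoundError: No module named 'yaml'"}]}}),
    json.dumps({"message": {"content": [
        {"type": "tool_use", "name": "Edit", "input": {"file_path": "src/app.py"}}]}}),
]
result = extract_ledger(sample)
print(result)
```

The sketch is deterministic and makes no model calls.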

## What it isn't

- Not a memory framework. Cross-session memory is already covered by
  existing tools (Anthropic's memory tool, claude-mem, mem0, Letta).
  `ccprofile` operates *inside one session*.
- Not a generic observability platform. Langfuse and ccusage already do
  that.
- Not a competitor to native compaction. It complements it.

## Benchmark results

Measured on 12 real Claude Code sessions under five evaluation
conditions, with 10,000-resample percentile bootstrap CIs (sessions
paired across conditions). Full report:
[`docs/benchmark-v0.md`](docs/benchmark-v0.md). Harness:
[`bench/README.md`](bench/README.md).

| Condition | Overall recall [95% CI] | file_read | file_modified | error_class | Avg context tokens |
|---|---|---|---|---|---:|
| `oracle` (no compaction) | 99.0% [98.3, 100.0] | 100.0% [100.0, 100.0] | 100.0% [100.0, 100.0] | 94.9% [91.9, 100.0] | 114,389 |
| `baseline` (simulated lossy compaction) | 28.1% [12.7, 67.6] | 35.3% [13.2, 73.6] | 13.8% [5.8, 53.3] | 47.5% [26.4, 77.8] | 25,724 |
| `real_baseline` (Claude-Haiku-driven compaction) | 29.1% [13.2, 68.5] | 36.2% [13.7, 74.7] | 15.2% [6.3, 54.8] | 47.5% [26.4, 77.8] | 26,606 |
| **`ledger`** (synthetic baseline + ccprofile) | **62.6% [43.5, 98.8]** | **84.5% [70.0, 100.0]** | **45.7% [24.4, 100.0]** | **59.3% [32.2, 95.6]** | 26,728 |
| **`real_ledger`** (real baseline + ccprofile) | **62.6% [43.5, 98.8]** | **84.5% [70.0, 100.0]** | **45.7% [24.4, 100.0]** | **59.3% [32.2, 95.6]** | 27,611 |

**Headline deltas (paired bootstrap, 95% CI):**

| Comparison | Δ overall | What it means |
|---|---|---|
| `ledger` − `baseline` | **+34.5 pp [+24.6, +49.3]** | Ledger vs the simulated lower bound |
| `real_ledger` − `real_baseline` | **+33.5 pp [+24.5, +46.1]** | Ledger vs a real Anthropic-style auto-compactor |
| `baseline` − `real_baseline` | −1.0 pp [−4.1, 0.0] | Synthetic vs real summary: CI includes zero, no detectable difference |
| `ledger` − `real_baseline` | **+33.5 pp [+24.5, +46.1]** | Ledger beats the real compactor |

All three ledger-vs-non-ledger CIs in the table **exclude zero**. The
"but a real compactor would close this gap" objection does not hold on
this corpus: at a comparable token budget, a model-driven summary
recovers only ~1 pp more facts than the synthetic header.
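For readers unfamiliar with the method, a paired percentile bootstrap over sessions can be sketched as below. This is a generic stdlib-only illustration, not the harness's code; the function name `paired_bootstrap_ci`, its parameters, and the toy recall values are made up for the example.

```python
import random


def paired_bootstrap_ci(a, b, iters=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a - b), resampling whole sessions."""
    if len(a) != len(b):
        raise ValueError("paired inputs must have equal length")
    # Pairing: take per-session differences first, then resample those.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(iters))
    lo = means[int((alpha / 2) * iters)]
    hi = means[min(int((1 - alpha / 2) * iters), iters - 1)]
    return sum(diffs) / n, lo, hi


# Toy per-session recall values (illustrative only, not benchmark data).
ledger_recall = [0.90, 0.60, 0.70, 0.80, 0.50, 0.65]
baseline_recall = [0.30, 0.20, 0.40, 0.30, 0.10, 0.35]

mean, lo, hi = paired_bootstrap_ci(ledger_recall, baseline_recall)
print(f"delta = {mean:+.3f}  95% CI [{lo:+.3f}, {hi:+.3f}]")
```

Resampling the per-session differences, rather than each condition independently, is what keeps sessions paired: between-session variance that affects both conditions equally cancels out of the delta.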

**Robustness:** the ledger wins at all five `keep_last_k` settings
swept (K ∈ {50, 100, 200, 500, 1000}). Even at K=1000, the delta is
+24.8 pp [+10.4, +33.5].

**Cost:** ledger overhead ~1,005 tokens (3.9% of baseline context).
Extraction latency: ~41 ms per session.
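The overhead figure can be re-derived from the average context tokens in the results table. The snippet below only repeats that arithmetic; the dictionary is a transcription of the table, not output from the tool.

```python
# Average context tokens per condition, copied from the results table above.
avg_tokens = {
    "baseline": 25_724,
    "real_baseline": 26_606,
    "ledger": 26_728,
    "real_ledger": 27_611,
}

# Extra tokens the ledger adds on top of each compaction baseline.
overhead_synth = avg_tokens["ledger"] - avg_tokens["baseline"]
overhead_real = avg_tokens["real_ledger"] - avg_tokens["real_baseline"]

# Overhead as a share of the simulated-baseline context.
share = overhead_real / avg_tokens["baseline"]

print(overhead_synth, overhead_real, f"{share:.1%}")  # 1004 1005 3.9%
```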

### Semantic probe (LLM-graded)

Per fact, `claude-haiku-4-5-20251001` is asked "did the agent X earlier
in this session? YES / NO" against each condition's post-compaction
context. Positive questions come from ground truth; negatives from a
deterministic distractor pool.

| Category | `baseline` | `real_baseline` | `ledger` | `real_ledger` |
|---|---:|---:|---:|---:|
| file_read | 35.8% | 37.7% | **86.8%** | **83.0%** |
| file_modified | 51.2% | 41.5% | **78.0%** | **78.0%** |
| error_class | 51.3% | 43.6% | **66.7%** | **71.8%** |
| **aggregate** | **45.1%** | **40.6%** | **78.2%** | **78.2%** |
| **specificity** | 100% | 100% | 100% | 100% |

**Two-probe convergence:** the lexical and semantic probes agree on the
direction of the effect and on its rough size (aggregate `ledger` −
`baseline`: +34.5 pp lexical, +33.1 pp semantic). **Specificity is 100%
in every cell**: the grader never accepts a distractor, which indicates
that the YES answers are grounded in the context rather than in the
model's priors.
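The two numbers reported for the semantic probe can be computed from graded YES / NO answers as in the sketch below. `ProbeResult` and `recall_and_specificity` are hypothetical names, and the sample answers are invented; only the definitions (recall on ground-truth facts, specificity on distractors) follow the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    category: str       # e.g. "file_read"
    is_positive: bool   # True if the fact comes from ground truth
    answered_yes: bool  # the grader's YES / NO verdict


def recall_and_specificity(results):
    """Recall = YES rate on positives; specificity = NO rate on distractors."""
    pos = [r for r in results if r.is_positive]
    neg = [r for r in results if not r.is_positive]
    recall = sum(r.answered_yes for r in pos) / len(pos) if pos else float("nan")
    spec = sum(not r.answered_yes for r in neg) / len(neg) if neg else float("nan")
    return recall, spec


answers = [
    ProbeResult("file_read", True, True),
    ProbeResult("file_read", True, True),
    ProbeResult("file_read", True, False),  # a real fact the context lost
    ProbeResult("file_read", True, True),
    ProbeResult("file_read", False, False),  # distractor correctly rejected
    ProbeResult("file_read", False, False),
]
recall, specificity = recall_and_specificity(answers)
print(f"recall={recall:.2f} specificity={specificity:.2f}")
```

Reporting specificity alongside recall guards against a grader that simply answers YES to everything, which would inflate recall without reflecting what the context actually contains.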

A notable finding: on `file_modified` and `error_class`, the
`real_baseline` semantic recall is *lower* than the synthetic baseline.
The model compactor writes generic prose ("the agent fixed several
errors") that obscures specific facts, while the synthetic header at
least preserves counts. The structured ledger preserves exact strings,
which a downstream LLM can recover.

### Reproduce on your own session corpus

```bash
# Lexical-only headline (no API key needed):
ccprofile bench run \
    --sweep-k 50,100,200,500,1000 \
    --bootstrap-iters 10000 --seed 0

# Add the LLM-graded semantic probe:
ccprofile bench run \
    --semantic-probe --probe-concurrency 2 \
    --bootstrap-iters 10000 --seed 0

# Add the real Anthropic-API-driven compaction baseline
# (requires Anthropic tier 2 — input is ~150K tokens per session):
ccprofile bench run \
    --semantic-probe --real-baseline \
    --exclude <current_session_id> \
    --bootstrap-iters 10000 --seed 0
```

## Documentation

- [`docs/benchmark-v0.md`](docs/benchmark-v0.md) — the full
  measurement report (numbers above + per-session breakdown +
  methodology).
- [`docs/architecture.md`](docs/architecture.md) — hook flow, on-disk
  layout, guardrails.
- [`docs/install.md`](docs/install.md) — installation procedure for
  the PreCompact hook.
- [`bench/README.md`](bench/README.md) — harness internals and
  reproducibility notes.
- [`CONTRIBUTING.md`](CONTRIBUTING.md) — dev setup, scope of project,
  how to extend.
- [`SECURITY.md`](SECURITY.md) — private reporting channel.
- [`CHANGELOG.md`](CHANGELOG.md) — versioned release notes.

## Status

Pre-alpha but functional.

PyPI distribution name: `claude-code-context-profiler`. The short
package name `ccprofile` on PyPI is held by an unrelated single-release
"Claude Code permission profile manager" project and is not this
package.

## License

MIT.
