Metadata-Version: 2.4
Name: ai-code-stats
Version: 0.1.1
Summary: Measure CodingAgent (Claude Code / Codex) code adoption, AI-authored lines, and token usage by git repository and committer
Author: ai-code-stats
License-Expression: MIT
Keywords: claude-code,codex,git,metrics,ai-coding
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Provides-Extra: http
Requires-Dist: requests>=2.25; extra == "http"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: jsonschema>=4.0; extra == "dev"

# ai-code-stats

Measure how much code from CodingAgents such as **Claude Code** and **Codex** is
accepted into git commits. For every commit, ai-code-stats reports AI-authored
line counts, adoption rate, AI share of the commit, and token usage, grouped by
**git repository × committer**.

Report delivery is pluggable: HTTP webhook, local JSONL files, or a custom
command. All payloads are defined by versioned JSON Schemas, and the tool runs
on macOS, Windows, and Linux.

> Full installation, configuration, reporting, and troubleshooting details are
> in [docs/USAGE.md](docs/USAGE.md). Chinese documentation is available in
> [README_zh.md](README_zh.md).

## What It Answers

- How many AI-generated lines were produced before a commit, and how many of
  them were ultimately accepted into it.
- Total added lines, AI-added lines, and AI share of each commit, using both
  raw and effective-code counting modes.
- Token usage associated with the commit, including input, output, and cache
  read tokens.

## How It Works

```text
 AI edit (PostToolUse hook)                  git commit (post-commit / post-merge hook)
 +---------------------------------+         +--------------------------------------------+
 | Parse Edit/Write/apply_patch    |         | Read commit diff, including renames       |
 | Added lines -> normalize+hash   | ------> | Match AI fingerprints within the window   |
 | Mark effective-code lines       | pending | Compute adoption, AI share, and tokens    |
 | Store under .git/ai-code-stats/ |         | Build JSON envelope -> dispatch reporters |
 +---------------------------------+         +--------------------------------------------+
```

- **Adoption rate** = AI lines accepted into the commit / AI lines generated in
  the attribution window.
- **AI share** = commit added lines matched to AI fingerprints / total commit
  added lines.
- Matching is based on normalized content hashes, so moved or renamed AI code
  can still be attributed.
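
The two ratios above can be illustrated with a minimal sketch. The normalization rule (collapsing whitespace), the SHA-256 hash, and the `fingerprint` and `attribute` helper names are assumptions made for illustration. They are not taken from the ai-code-stats source.

```python
import hashlib
from collections import Counter


def fingerprint(line):
    """Hash a line after collapsing whitespace (an assumed normalization)."""
    normalized = " ".join(line.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def attribute(ai_lines, commit_added_lines):
    """Match commit lines against pending AI fingerprints, one for one."""
    pending = Counter(fingerprint(line) for line in ai_lines)
    matched = 0
    for line in commit_added_lines:
        fp = fingerprint(line)
        if pending[fp] > 0:
            pending[fp] -= 1  # each AI line can be claimed only once
            matched += 1
    return {
        "ai_lines_added": matched,
        "adoption_rate": matched / len(ai_lines) if ai_lines else 0.0,
        "ai_share_of_commit": (
            matched / len(commit_added_lines) if commit_added_lines else 0.0
        ),
    }


ai_lines = ["x = 1", "y = 2", "total = x + y", "print(total)"]
commit_lines = ["x   =  1", "  y = 2", "total = x + y", "print(total)", "# reviewed by a human"]
print(attribute(ai_lines, commit_lines))
# {'ai_lines_added': 4, 'adoption_rate': 1.0, 'ai_share_of_commit': 0.8}
```

Because the match is on content hashes and ignores file paths, a line that the agent wrote in one file and that was committed in another still counts as accepted.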

## Installation

Requires Python 3.9 or newer and git.

```bash
pip install ai-code-stats          # or: pip install -e . for development

# Run from the target repository root. Installs git + Claude + Codex hooks.
ai-code-stats install

# Install only selected hooks, or preview without writing files.
ai-code-stats install --git
ai-code-stats install --claude --scope user      # writes ~/.claude/settings.json
ai-code-stats install --codex --dry-run

# Uninstall. This is idempotent and preserves your own hook content.
ai-code-stats uninstall
```

> Codex hooks are written to `$CODEX_HOME/config.toml`, defaulting to
> `~/.codex/config.toml`. Because the Codex hook schema is still evolving, run
> `ai-code-stats install --codex --dry-run` after installation and confirm that
> your Codex version supports inline `[[hooks.PostToolUse]]` entries.

## Configuration

Configuration is merged in this order, with later sources overriding earlier
ones: built-in defaults, user-level `config.json`, repository
`.ai-code-stats.json`, then the file pointed to by `AI_CODE_STATS_CONFIG`.
String values support `${ENV:VAR}` placeholders for secret injection.

```jsonc
{
  "enabled": true,
  "privacy": {
    "store_plaintext": true,     // Keep AI line plaintext locally under .git/.
    "redact_in_reports": true    // Reports include metrics only, not source code.
  },
  "files": {
    "include": [],               // Empty = known code extensions; non-empty = only matching files.
    "exclude": ["**/node_modules/**", "**/*.min.js", "package-lock.json"]
  },
  "attribution": {
    "count_modes": ["raw", "effective"],
    "primary": "effective",      // Primary metric uses the effective-code mode.
    "merge_strategy": "skip",    // Merge commits: skip or first_parent.
    "detect_renames": true
  },
  "reporters": [
    { "type": "json_file", "path": "{repo_data}/reports.jsonl" },
    { "type": "http_webhook",
      "url": "https://metrics.example.com/ingest",
      "headers": { "Authorization": "Bearer ${ENV:AI_CODE_STATS_TOKEN}" },
      "mapping": {                // Map the envelope into any backend schema.
        "repo": "data.repo_id",
        "rate": "data.ai.effective.adoption_rate",
        "tokens": "data.tokens.total"
      }
    },
    { "type": "command", "argv": ["my-forwarder"] }  // Envelope JSON is passed via stdin.
  ]
}
```
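
The merge order and `${ENV:VAR}` expansion described at the top of this section could look roughly like the sketch below. It assumes nested objects merge key by key, lists are replaced wholesale, unset variables expand to an empty string, and config files are plain JSON without comments. The helper names `deep_merge`, `expand_env`, and `load_config` are hypothetical, and the real implementation may differ on any of these points.

```python
import json
import os
import re
from pathlib import Path

_ENV_PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_merge(base, override):
    """Return base updated with override; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged


def expand_env(value, env=os.environ):
    """Substitute ${ENV:VAR} inside strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return _ENV_PLACEHOLDER.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value


def load_config(defaults, paths, env=os.environ):
    """Merge JSON files over defaults in order; missing files are skipped."""
    config = defaults
    for path in paths:
        if path and Path(path).is_file():
            layer = json.loads(Path(path).read_text(encoding="utf-8"))
            config = deep_merge(config, layer)
    return expand_env(config, env)
```

Passing the paths as user-level `config.json`, then the repository `.ai-code-stats.json`, then the value of `AI_CODE_STATS_CONFIG` reproduces the documented precedence, since later layers override earlier ones.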

### Counting Modes

- **raw**: all added and removed lines.
- **effective**: blank lines and pure comment lines are excluded based on the
  file language's comment syntax.
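
As a rough illustration of the difference between the two modes, the sketch below counts a small Python snippet both ways. The `COMMENT_PREFIXES` table and the `is_effective` and `count_lines` helpers are hypothetical. Only single-line comment prefixes are handled, so block comments and docstrings would need extra logic.

```python
import os

# Assumed single-line comment prefixes per file extension (illustrative only).
COMMENT_PREFIXES = {
    ".py": ("#",),
    ".js": ("//",),
    ".ts": ("//",),
    ".go": ("//",),
    ".sql": ("--",),
}


def is_effective(line, filename):
    """True unless the line is blank or consists only of a comment."""
    stripped = line.strip()
    if not stripped:
        return False
    ext = os.path.splitext(filename)[1].lower()
    prefixes = COMMENT_PREFIXES.get(ext, ())
    # startswith() with an empty tuple is False, so unknown extensions
    # treat every non-blank line as effective.
    return not stripped.startswith(prefixes)


def count_lines(lines, filename):
    """Count the same lines under both modes."""
    return {
        "raw": len(lines),
        "effective": sum(is_effective(line, filename) for line in lines),
    }


added = ["# set up inputs", "", "x = 1", "y = 2  # inline comment", "print(x + y)"]
print(count_lines(added, "example.py"))
# {'raw': 5, 'effective': 3}
```

A line with a trailing inline comment still counts as effective because it contains code.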

### File Filtering

By default, ai-code-stats counts files with known code-language extensions and
excludes lock files, generated artifacts, vendored directories, and binaries.
Use `files.include` and `files.exclude` glob patterns, including `**`, to
customize this behavior.
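
The sketch below shows one plausible way such glob filtering could behave. It assumes that `**` spans directories, that a pattern without a slash matches the file's base name, that excludes take precedence over includes, and that the short `known_exts` tuple stands in for the real extension list. None of these details are confirmed by this README, and the helper names are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob with ** support into a compiled regular expression."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(path, pattern):
    """Match a repository-relative path against one glob pattern."""
    path = path.replace("\\", "/")  # normalize Windows separators
    regex = glob_to_regex(pattern)
    if "/" not in pattern:
        # Assumption: slash-free patterns apply to the base name.
        return regex.fullmatch(path.rsplit("/", 1)[-1]) is not None
    return regex.fullmatch(path) is not None


def should_count(path, include, exclude, known_exts=(".py", ".js", ".ts", ".go")):
    """Apply excludes first, then includes, then the default extension list."""
    if any(matches(path, p) for p in exclude):
        return False
    if include:
        return any(matches(path, p) for p in include)
    return path.lower().endswith(known_exts)


exclude = ["**/node_modules/**", "**/*.min.js", "package-lock.json"]
print(should_count("src/app.py", [], exclude))                 # True
print(should_count("web/node_modules/lib/a.js", [], exclude))  # False
```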

## Data Contract

The `schemas/` directory contains three versioned JSON Schemas:

| Schema | Purpose |
|--------|---------|
| `ai_edit_event.schema.json` | A single locally buffered AI edit event |
| `commit_stat.schema.json` | Complete statistics for one commit |
| `report_envelope.schema.json` | Common reporting envelope |

Example envelope:

```json
{
  "schema_version": "1.0",
  "kind": "commit_stat",
  "produced_at": "2026-06-15T08:00:00Z",
  "producer": { "plugin": "ai-code-stats", "version": "0.1.1", "os": "darwin" },
  "data": {
    "repo_id": "github.com/org/repo",
    "commit": { "sha": "...", "branch": "main", "is_merge": false },
    "committer": { "name": "Dev", "email": "dev@x.com" },
    "totals": { "files_changed": 2, "raw": { "lines_added": 5 }, "effective": { "lines_added": 3 } },
    "ai": {
      "raw":       { "ai_lines_added": 4, "adoption_rate": 1.0, "ai_share_of_commit": 0.8 },
      "effective": { "ai_lines_added": 3, "adoption_rate": 1.0, "ai_share_of_commit": 1.0 }
    },
    "tokens": { "input": 120, "output": 30, "total": 150 }
  }
}
```
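
The `mapping` option shown for the `http_webhook` reporter uses dotted paths such as `data.repo_id`. A minimal sketch of how such paths might be resolved against an envelope like the one above follows. The `resolve_path` and `apply_mapping` helpers are hypothetical, and returning `None` for a missing path is an assumption.

```python
def resolve_path(envelope, dotted_path, default=None):
    """Walk nested dicts along a dotted path such as 'data.tokens.total'."""
    current = envelope
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def apply_mapping(envelope, mapping):
    """Build a flat payload whose keys are the mapping's target names."""
    return {target: resolve_path(envelope, source) for target, source in mapping.items()}


# A trimmed copy of the example envelope.
envelope = {
    "kind": "commit_stat",
    "data": {
        "repo_id": "github.com/org/repo",
        "ai": {"effective": {"ai_lines_added": 3, "adoption_rate": 1.0}},
        "tokens": {"input": 120, "output": 30, "total": 150},
    },
}
mapping = {
    "repo": "data.repo_id",
    "rate": "data.ai.effective.adoption_rate",
    "tokens": "data.tokens.total",
}
print(apply_mapping(envelope, mapping))
# {'repo': 'github.com/org/repo', 'rate': 1.0, 'tokens': 150}
```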

## Common Commands

```bash
ai-code-stats status              # Show pending attribution events and token snapshots.
ai-code-stats report              # Print the current HEAD envelope without sending or consuming it.
ai-code-stats flush               # Retry failed report deliveries.
```

## Privacy

- AI line plaintext is stored only under the repository's
  `.git/ai-code-stats/` directory and is never committed.
- Reports default to `privacy.redact_in_reports=true`, so they contain metrics
  only and no source code.
- For stronger privacy, set `privacy.store_plaintext=false`; local storage will
  keep only hashes.

## Known Limitations

- Merge commits skip attribution by default to avoid noisy diffs. Set
  `attribution.merge_strategy=first_parent` if you want first-parent merge
  attribution.
- Adoption rate is approximate under `rebase`, `cherry-pick`, and
  `commit --amend`.
- Token attribution is estimated from the session-level cumulative delta since
  the previous commit, so the figures are approximate when one session works
  in multiple repositories concurrently.
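
The token limitation can be made concrete with a small sketch of delta-based accounting. The `cache_read` field name, the `token_delta` helper, and the clamping of negative differences to zero are assumptions made for illustration.

```python
def token_delta(previous, current):
    """Per-commit usage: current cumulative counters minus the last snapshot."""
    keys = set(previous) | set(current)
    # Clamp at zero in case a new session restarted its counters (assumed policy).
    return {k: max(current.get(k, 0) - previous.get(k, 0), 0) for k in sorted(keys)}


snapshot_at_last_commit = {"input": 1000, "output": 400, "cache_read": 5000}
snapshot_now = {"input": 1120, "output": 430, "cache_read": 5600}
print(token_delta(snapshot_at_last_commit, snapshot_now))
# {'cache_read': 600, 'input': 120, 'output': 30}
```

If one session edits two repositories between commits, a delta of this kind cannot tell which repository the tokens belong to, which is why the figure is only an estimate in that case.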

## Development

```bash
PYTHONPATH=src python3 -m pytest
PYTHONPATH=src python3 -m ai_code_stats.cli --help
```

Architecture overview: `agents/` adapts CodingAgent payloads, `classify`
filters and classifies lines, `attribution` computes matches, `tokens`
aggregates token usage, `reporters/` provides pluggable delivery, `githook/`
computes commit statistics, and `install/` manages hook installation.

To add a reporter, implement `reporters/base.Reporter` and register it in
`reporters/registry.REPORTER_TYPES`. To add an Agent, implement
`agents/base.AgentAdapter` and register it in `agents/registry`.
