Metadata-Version: 2.4
Name: arbiter-dev
Version: 0.2.0
Summary: Agent-aware code quality system for multi-agent codebases
Author: Reuben Bowlby, Daniel Matha
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: analyzers
Requires-Dist: ruff>=0.4.0; extra == "analyzers"
Requires-Dist: radon>=6.0; extra == "analyzers"
Requires-Dist: vulture>=2.10; extra == "analyzers"
Requires-Dist: bandit>=1.7.0; extra == "analyzers"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Provides-Extra: all
Requires-Dist: arbiter-dev[analyzers,test]; extra == "all"
Dynamic: license-file

# Arbiter

**Agent-aware code quality system for multi-agent codebases.**

In 2026, code is written by fleets of AI agents. Arbiter knows *who* wrote each line — human or AI — and scores quality accordingly.

## What Makes Arbiter Different

| Feature | Traditional Tools | Arbiter |
|---------|------------------|---------|
| Agent attribution | None | First-class: tracks Claude, Codex, Gemini, Copilot, humans |
| Per-commit scoring | Repo-wide only | Scores each commit's changed files individually |
| Diff analysis | N/A | Score only what changed in a PR/branch |
| Transparency | Opaque score | Every score decomposes into lint + security + complexity |
| Agent-specific gates | N/A | Different quality thresholds per agent trust tier |
| Tool integration | Proprietary | Wraps tools you already trust: ruff, Bandit, radon, vulture |
| Dashboard | SaaS login | Single HTML file with per-agent timelines, commit feed, fleet view |
| Dependencies | Heavy | Analysis tools only; core is stdlib Python |

## Quick Start

```bash
git clone https://github.com/hummbl-dev/arbiter.git
cd arbiter

# Install (makes `arbiter` command available)
pip install ".[analyzers]"

# Quick score (no persistence)
arbiter score /path/to/your/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/your/repo

# Score only files changed since main
arbiter diff /path/to/your/repo --base main

# Agent leaderboard
arbiter agents

# Start dashboard
arbiter serve --port 8080
# Open http://localhost:8080
```

### Without install (PYTHONPATH)

```bash
PYTHONPATH=src python -m arbiter score /path/to/your/repo
```

### With Docker

```bash
docker build -t arbiter .
docker run -p 8080:8080 -v /path/to/repo:/repo:ro arbiter
```

## Architecture

```
Git Repo ──→ [Git Historian] ──→ [Analyzer Runner] ──→ [Scoring Engine] ──→ [SQLite Store]
                  │                      │                     │                    │
           agent attribution      tool invocation        weighted rubric       trend data
           (Co-Authored-By,       (ruff, radon,          (lint 35%,             │
            email matching)        vulture, bandit)        security 30%,        ├──→ REST API
                                                           complexity 35%)     └──→ Dashboard
             ┌────────────┐
             │Diff Analyzer│ ←── v0.2: scores only changed files per commit/branch
             └────────────┘
```

### Per-Commit Scoring (v0.2)

Every commit is scored against only the files it changed, not the entire repo. This keeps the agent leaderboard meaningful: a commit that touches one clean file scores differently from one that touches ten messy files.

### Diff Mode (v0.2)

`arbiter diff` scores only files changed since a base branch. Ideal for CI/PR quality gates — fast, scoped, actionable.
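
As a rough illustration of the scoping step, the sketch below lists the Python files that differ between `HEAD` and a base branch using plain `git diff --name-only`. It is not Arbiter's implementation; the function names `parse_changed_python_files` and `changed_python_files` are hypothetical, and limiting the result to `.py` files is an assumption.

```python
import subprocess


def parse_changed_python_files(name_only_output: str) -> list[str]:
    """Keep the .py paths from `git diff --name-only` output, preserving order."""
    paths = (line.strip() for line in name_only_output.splitlines())
    return [p for p in paths if p.endswith(".py")]


def changed_python_files(repo: str, base: str = "main") -> list[str]:
    """List Python files changed on HEAD since it diverged from `base`.

    `--diff-filter=d` drops deleted files, which cannot be analyzed.
    """
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_changed_python_files(result.stdout)
```

Only the returned paths would then be handed to the analyzers, which is what keeps a PR gate fast.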

### Agent Attribution

Arbiter identifies which agent authored each commit:

1. **Co-Authored-By trailer** — `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>`
2. **Author email** — maps `noreply@anthropic.com` → claude, `codex@openai.com` → codex
3. **Default** — "human" if no agent pattern matches

Configure in `agents.yml`:
```yaml
agents:
  - name: claude
    emails: [noreply@anthropic.com]
    co_author_patterns: ["Claude\\s+(Opus|Sonnet|Haiku)"]
    trust_tier: verified
    quality_threshold: 70.0
  - name: gemini
    trust_tier: probation
    quality_threshold: 80.0  # Higher bar for probationary agents
```
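
The three-step lookup above can be sketched as follows. This is a minimal illustration under stated assumptions, not Arbiter's actual code: `AgentRule`, `RULES`, and `attribute_commit` are hypothetical names, and the rules are hard-coded here instead of being loaded from `agents.yml`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AgentRule:
    """Hypothetical in-memory form of one `agents.yml` entry."""
    name: str
    emails: list[str] = field(default_factory=list)
    co_author_patterns: list[str] = field(default_factory=list)


RULES = [
    AgentRule("claude", ["noreply@anthropic.com"], [r"Claude\s+(Opus|Sonnet|Haiku)"]),
    AgentRule("codex", ["codex@openai.com"]),
]

# Matches each "Co-Authored-By: ..." trailer line in a commit message.
TRAILER_RE = re.compile(r"^Co-Authored-By:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def attribute_commit(author_email: str, message: str, rules=RULES) -> str:
    """Return the agent name for a commit, or "human" if nothing matches."""
    # 1. Co-Authored-By trailer takes priority.
    trailers = TRAILER_RE.findall(message)
    for rule in rules:
        for trailer in trailers:
            if any(re.search(p, trailer) for p in rule.co_author_patterns):
                return rule.name
    # 2. Fall back to the author email.
    email = author_email.strip().lower()
    for rule in rules:
        if email in rule.emails:
            return rule.name
    # 3. Default.
    return "human"
```

With these rules, a commit authored by `dev@example.com` whose message ends in `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>` is attributed to `claude`, because the trailer is checked before the author email.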

### Analyzers (pluggable)

| Analyzer | Tool | What It Finds |
|----------|------|--------------|
| Lint | ruff | Style violations, import errors, bugbear patterns |
| Complexity | radon | Cyclomatic complexity (grade A-F per function) |
| Security | bandit | Hardcoded secrets, shell injection, dangerous patterns |
| Dead Code | vulture | Unused functions, imports, variables |
| Duplication | AST hash | Near-duplicate function bodies |
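
To show what AST-hash duplicate detection can look like, here is a small sketch built on the standard `ast` module. It is an assumption about the general technique, not a description of Arbiter's analyzer: identifiers and constants are blanked out, and each function body is then hashed so that structurally identical bodies collide. The names `_Normalizer` and `function_fingerprints` are hypothetical.

```python
import ast
import copy
import hashlib
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Blank out identifiers and constants so only the structure remains."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=None), node)


def function_fingerprints(source: str) -> dict[str, list[str]]:
    """Map structural hash -> names of functions whose bodies share it."""
    tree = ast.parse(source)
    functions = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    groups = defaultdict(list)
    for func in functions:
        # Normalize a copy so the original tree is left untouched.
        clone = _Normalizer().visit(copy.deepcopy(func))
        body = ast.Module(body=clone.body, type_ignores=[])
        digest = hashlib.sha256(ast.dump(body).encode()).hexdigest()
        groups[digest].append(func.name)
    # Only hashes shared by two or more functions indicate duplication.
    return {h: names for h, names in groups.items() if len(names) > 1}
```

Two functions that differ only in variable names and literal values land in the same group; attribute and call names are left intact, so unrelated code with the same shape is less likely to collide.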

### Scoring

Deterministic. Same code → same score. Always.

```
Overall = Lint (35%) + Security (30%) + Complexity (35%)

Penalty points by severity:
  CRITICAL: 50 | HIGH: 20 | MEDIUM: 5 | LOW: 1

Score = 100 - (total_penalty / LOC) * normalization_factor
```

Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60)
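
The rubric above can be worked through in a few lines of Python. Treat this as a sketch under assumptions: the README does not state the value of `normalization_factor`, so `100.0` is a placeholder, and applying the penalty formula per category and clamping at zero are also assumptions. The helper names are hypothetical.

```python
PENALTY = {"CRITICAL": 50, "HIGH": 20, "MEDIUM": 5, "LOW": 1}
WEIGHTS = {"lint": 0.35, "security": 0.30, "complexity": 0.35}
NORMALIZATION_FACTOR = 100.0  # assumption: the real value is not documented here


def category_score(severities, loc, factor=NORMALIZATION_FACTOR):
    """Score = 100 - (total_penalty / LOC) * normalization_factor, floored at 0."""
    if loc <= 0:
        return 100.0  # assumption: nothing to penalize in an empty change
    total_penalty = sum(PENALTY[s] for s in severities)
    return max(0.0, 100.0 - (total_penalty / loc) * factor)


def overall_score(findings, loc):
    """Weighted sum of the lint, security, and complexity category scores."""
    return sum(
        weight * category_score(findings.get(category, []), loc)
        for category, weight in WEIGHTS.items()
    )


def grade(score):
    """Map a 0-100 score to the letter grades listed above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

Under these assumptions, 200 lines with four MEDIUM lint findings and one CRITICAL security finding give category scores of 90, 75, and 100, an overall score of about 89, and a grade of B.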

### Dashboard (v0.2)

Single HTML file with Chart.js. No build step, no React, no npm.

- **Score Card** — Big number + breakdown bars
- **Agent Leaderboard** — Who writes the best code? Color-coded by agent
- **Per-Agent Quality Timeline** — Score over time per agent (not just repo-wide)
- **Commit Feed** — Recent commits with agent, score, changes, timestamp
- **Hotspot Files** — Ranked by finding count
- **Fleet View** — Multi-repo quality grid with color-coded scores
- **Tabbed UI** — Overview, Commits, Fleet tabs

### API

```
GET /api/score                  Current repo score
GET /api/agents                 Agent leaderboard
GET /api/agents/{name}/trend    Per-agent quality over time
GET /api/trend?days=30          Quality over time
GET /api/worst?limit=20         Worst files
GET /api/commits                Recent commits with scores
GET /api/commits/{hash}         Detail for one commit
GET /api/fleet                  Fleet report (multi-repo)
GET /api/health                 System health
```
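
A client for these endpoints needs nothing beyond the standard library. The sketch below assumes the server started by `arbiter serve --port 8080` is reachable at `http://localhost:8080` and that responses are JSON; the response fields are not documented here, so the code returns the decoded payload as-is. `api_url` and `fetch` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def api_url(path, base="http://localhost:8080", **params):
    """Build a URL such as http://localhost:8080/api/trend?days=30."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base.rstrip('/')}/api/{path.lstrip('/')}{query}"


def fetch(path, **params):
    """GET an endpoint and decode the body (assumed to be JSON)."""
    with urlopen(api_url(path, **params), timeout=10) as response:
        return json.load(response)
```

For example, `fetch("trend", days=30)` requests `/api/trend?days=30`, and `fetch("agents/claude/trend")` requests the per-agent trend for `claude`.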

## CLI Commands

```bash
arbiter analyze <repo>                     # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]  # Quick score (no persist)
arbiter diff <repo> [--base main] [--json] # Score only changed files vs base branch
arbiter agents                             # Agent leaderboard
arbiter trend [--days 30]                  # Quality trend
arbiter worst [--limit 20]                 # Worst files
arbiter commits [--agent claude]           # Recent commits
arbiter audit-fleet <directory>            # Audit all repos in a directory
arbiter fleet-report                       # Fleet quality summary
arbiter triage                             # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]             # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                # API + dashboard
```

## Tests

```bash
pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v
# 78 tests, <7 seconds
```

## Requirements

- Python 3.11+
- git (for historian)
- Optional: ruff, radon, vulture, bandit (for full analysis)
- Optional: Docker (for containerized deployment)

## License

MIT — see [LICENSE](LICENSE).

---

Built by [HUMMBL LLC](https://hummbl.io) from production experience coordinating Claude, Codex, Gemini, and human engineers on a 6,000+ test codebase.
