Metadata-Version: 2.4
Name: github-repo-auditor
Version: 0.1.3
Summary: Automated GitHub portfolio auditor with 12 analysis dimensions
Author: Saag Patel
License-Expression: MIT
Project-URL: Homepage, https://github.com/saagpatel/GithubRepoAuditor
Project-URL: Repository, https://github.com/saagpatel/GithubRepoAuditor
Project-URL: Bug Tracker, https://github.com/saagpatel/GithubRepoAuditor/issues
Project-URL: Changelog, https://github.com/saagpatel/GithubRepoAuditor/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/saagpatel/GithubRepoAuditor#readme
Keywords: github,portfolio,audit,repository-analysis,developer-tools
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: radon>=6.0.0
Requires-Dist: rich>=13.0
Requires-Dist: anthropic>=0.40.0
Requires-Dist: fpdf2>=2.7.0
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: mutmut>=2.5; extra == "dev"
Requires-Dist: responses>=0.25; extra == "dev"
Requires-Dist: types-PyYAML>=6.0; extra == "dev"
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == "config"
Provides-Extra: semantic
Requires-Dist: sqlite-vec>=0.1.0; extra == "semantic"
Requires-Dist: sentence-transformers>=3.0; extra == "semantic"
Provides-Extra: serve
Requires-Dist: fastapi>=0.115; extra == "serve"
Requires-Dist: uvicorn[standard]>=0.30; extra == "serve"
Requires-Dist: jinja2>=3.1; extra == "serve"
Requires-Dist: python-multipart>=0.0.9; extra == "serve"
Provides-Extra: build
Requires-Dist: shiv>=1.0; extra == "build"
Requires-Dist: build>=1.0; extra == "build"
Requires-Dist: twine>=5.0; extra == "build"
Dynamic: license-file

# GitHub Repo Auditor

[![Python](https://img.shields.io/badge/Python-%233776ab?style=flat-square&logo=python)](#) [![License](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](#) [![Tests](https://img.shields.io/badge/tests-covered-brightgreen?style=flat-square)](#)

> Know the truth about every project you've ever started — because `git log` across 100 repos doesn't tell you which ones are worth finishing.

GitHub Repo Auditor is a portfolio audit and operator tool for developers with a lot of repositories. It clones every repo on your GitHub account, runs 12 analyzers across completeness and interest dimensions, assigns letter grades and achievement badges, preserves historical state, and generates actionable dashboards you can actually use to decide what to work on next. Built for developers who ship fast, start often, and need a system to manage the sprawl.

Today the project is best understood as a GitHub portfolio operating system:

- it tells you which repos are healthy, drifting, blocked, or safe to ignore for now
- it gives you one workbook-first weekly review flow instead of a pile of disconnected reports
- it tracks whether recommended follow-through is actually happening and whether that improvement is holding up over time
- it keeps JSON, Markdown, HTML, workbook, and control-center outputs aligned so you do not have to switch mental models between surfaces

## What This Project Is Today

This project started as a repo auditing tool and has grown into a workbook-first GitHub portfolio operating system.

Today it:

- audits repositories across documentation, testing, CI, dependencies, activity, security, structure, community profile, completeness, and interest signals
- scores repos on dual axes, classifies them into useful tiers, and surfaces quick wins
- generates aligned JSON, Markdown, HTML, workbook, review-pack, and control-center outputs from the same audit facts
- writes a report-only weekly command-center digest beside the control-center artifact so paused automation can consume one bounded summary instead of stale notes
- generates a canonical workspace-level portfolio truth snapshot for a local projects folder and derives the shared registry/report compatibility artifacts from it
- preserves historical state in SQLite so the operator loop can show change, regression, recovery, and follow-through
- keeps the workbook and `--control-center` as the main day-to-day operating surfaces

If you are new here, the simplest way to think about it is: this project tells you which repos are healthy, which ones are drifting, and what to look at next.

## Mode Map

The product now works best when you use one of four explicit modes:

- `First Run` for setup, baseline creation, the first workbook, and the first control-center read
- `Weekly Review` for the normal ongoing operator loop
- `Deep Dive` for repo-level investigation and implementation hotspots
- `Action Sync` for campaigns, writeback, GitHub Projects, and Notion mirroring

The flags stay the same underneath. The modes are the easiest way to understand when to use which workflow.

See [docs/modes.md](docs/modes.md) for the canonical mode guide.

## Recommended Default Path

If you are starting fresh, use this sequence:

```bash
audit run <github-username> --doctor
audit run <github-username> --html
audit triage <github-username> --control-center
```

Then open the workbook and read it in this order:

- `Dashboard`
- `Run Changes`
- `Review Queue`
- `Portfolio Explorer`
- `Repo Detail`
- `Executive Summary`

That is the default path the product is optimized around.

## Commands By Mode

### First Run

```bash
audit run <github-username> --doctor
audit run <github-username> --html
audit triage <github-username> --control-center
```

### Weekly Review

```bash
audit run <github-username> --html
audit triage <github-username> --control-center
audit report <github-username> --portfolio-truth
```

### Deep Dive

```bash
audit run <github-username> --repos <repo-name> --html
audit triage <github-username> --control-center
```

### Action Sync

```bash
audit report <github-username> --campaign security-review --writeback-target github
audit report <github-username> --campaign security-review --writeback-target all --github-projects
audit triage <github-username> --approval-center
```

Treat campaign/writeback, GitHub Projects, Notion sync, catalog overrides, scorecard overrides, and `--excel-mode template` as advanced paths.

## Demo and Guides

- Safe demo path: run `make demo` after a local clone to generate sample artifacts from the committed fixture without a GitHub token.
- Demo fixture: [fixtures/demo/sample-report.json](fixtures/demo/sample-report.json)
- Product modes: [docs/modes.md](docs/modes.md)
- Web UI operator guide: [docs/audit-serve.md](docs/audit-serve.md)
- CLI migration (flat → subcommand): [docs/audit-cli-migration.md](docs/audit-cli-migration.md)
- Weekly operator workflow: [docs/weekly-review.md](docs/weekly-review.md)
- Operator troubleshooting: [docs/operator-troubleshooting.md](docs/operator-troubleshooting.md)
- Workbook tour: [docs/workbook-tour.md](docs/workbook-tour.md)
- Extending analyzers: [docs/extending-analyzers.md](docs/extending-analyzers.md)
- Release gates: [docs/release-gates.md](docs/release-gates.md)
- Distribution status: [docs/distribution.md](docs/distribution.md)
- Project history: [docs/project-history.md](docs/project-history.md)

## Features

- **12 Analyzers** — README quality, test coverage, CI/CD, dependency freshness, commit patterns, bus factor, code complexity, security controls, license, build readiness, GraphQL signals, and more
- **Dual-Axis Scoring** — Completeness (does this project have what shipped software should?) and Interest (is this worth anyone's time?) scored independently on 0.0–1.0 scales
- **Letter Grades + Tier Classification** — A–F grades with Shipped / Functional / WIP / Skeleton / Abandoned tiers; 15 achievement badges ("Fully Tested", "CI Champion", "Zero Debt", etc.)
- **Quick Wins Engine** — For each repo, shows exactly which single action moves it to the next tier and how far it is from getting there
- **Multiple Dashboard Outputs** — Flagship Excel workbook with a stable `standard` mode and optional `template` mode, interactive HTML dashboard with scatter chart and tech radar, portfolio README, shields.io badges
- **Workbook-First Operator Review** — Clear reading order through `Dashboard`, `Run Changes`, `Review Queue`, `Portfolio Explorer`, `Repo Detail`, and `Executive Summary`
- **Control Center Queue** — Read-only daily triage that groups work into `Blocked`, `Needs Attention Now`, `Ready for Manual Action`, and `Safe to Defer`
- **Follow-Through Story** — Tracks whether recommendations were untouched, attempted, waiting on evidence, stale, recovering, rebuilding, re-acquired, softening, or retired so the weekly review loop stays honest
- **Repo Drilldowns + Weekly Review Packs** — One-repo briefings and weekly summaries that mirror the same action story across Markdown, HTML, and workbook
- **Notion Integration** — Pushes audit signals into your Notion operating system: completeness cards, managed campaign records, and lifecycle-aware review sync
- **History & Regression Detection** — Archives every run to SQLite, auto-diffs between runs, detects score regressions, and flags archive candidates
- **AI Narrative** — Optional Claude-powered portfolio analysis that reads the audit data and writes a human-readable summary

## Quick Start

### Prerequisites

- Python 3.11+
- A GitHub account (public repos work without a token)
- `GITHUB_TOKEN` env var or `gh` CLI authenticated (for private repos and higher rate limits)

### Installation

The package is published on PyPI and through GitHub Releases. For normal CLI use,
install it as an isolated tool:

```bash
# uv (recommended)
uv tool install github-repo-auditor

# pipx
pipx install github-repo-auditor
```

Fastest no-clone path:

```bash
curl -LO https://github.com/saagpatel/GithubRepoAuditor/releases/latest/download/audit.pyz
chmod +x audit.pyz
./audit.pyz --help
```

Install from the public GitHub source when you want the latest unreleased code:

```bash
# uv
uv tool install 'git+https://github.com/saagpatel/GithubRepoAuditor.git'

# or pipx
pipx install 'git+https://github.com/saagpatel/GithubRepoAuditor.git'

# local editable clone
git clone https://github.com/saagpatel/GithubRepoAuditor.git
cd GithubRepoAuditor
pip install -e ".[config]"
```

The same self-contained `.pyz` zipapp is listed on the
[GitHub Releases](https://github.com/saagpatel/GithubRepoAuditor/releases) page.
See [docs/distribution.md](docs/distribution.md) for the release and publishing policy.

For the local web UI, install the `[serve]` extra:

```bash
pip install "github-repo-auditor[serve]"
# or from a clone: pip install -e ".[serve]"
```

### Try the safe demo

The demo uses committed fixture data and writes only to `output/demo/`.

```bash
git clone https://github.com/saagpatel/GithubRepoAuditor.git
cd GithubRepoAuditor
pip install -e ".[config]"
make demo
```

Expected outputs include `output/demo/demo-report.json`,
`output/demo/demo-workbook.xlsx`, `output/demo/dashboard-*.html`,
`output/demo/operator-control-center-demo.json`, and
`output/demo/operator-control-center-demo.md`.

### Quick start (subcommand form)

```bash
audit run <user>                       # fetch, clone, analyze, score
audit triage <user> --control-center   # read-only operator queue
audit report <user> --portfolio-truth  # regenerate workspace truth layer
audit serve                            # open browser dashboard
```

The flat form (`audit <user> --html`) still works and prints a one-time deprecation
warning. It will not be removed until a future major version bump. See
[docs/audit-cli-migration.md](docs/audit-cli-migration.md) for the flag-family mapping.

### Daily flow

1. `audit serve` — start the local web UI at `http://127.0.0.1:8080/`
2. Browse to `/` for the portfolio dashboard; `/runs/new` to trigger a fresh audit
3. After the run completes, check `/repos/{name}` for per-repo drill-downs
4. Run `audit triage <user> --control-center` for the full operator queue in the terminal

### Common invocations

```bash
# Doctor mode — recommended first step
audit run <github-username> --doctor

# Weekly Review — generate the native workbook + HTML dashboard
audit run <github-username> --html

# Weekly Review — daily read-only triage from the latest state
audit triage <github-username> --control-center

# Portfolio Truth — regenerate the canonical workspace truth layer
audit report <github-username> --portfolio-truth

# Semantic search across the portfolio index
audit triage <github-username> --ask "Python projects with no tests"

# Weekly operator briefing (requires Anthropic API key)
audit run <github-username> --briefing

# Deep Dive — targeted repo rerun merged into the latest baseline
audit run <github-username> --repos <repo-name> --html

# Action Sync — managed campaign preview / writeback
audit report <github-username> --campaign security-review --writeback-target github
```

Normal runs perform a lightweight automatic preflight before fetching repos. By default
the run stops on blocking errors and continues on warnings. Use `--preflight-mode strict`
to fail on warnings too, or `--preflight-mode off` to skip the automatic preflight.
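
The three preflight behaviours described above reduce to a small decision rule. The sketch below is illustrative only: `PreflightMode`, `should_abort`, and the `DEFAULT` member name are hypothetical and are not taken from the project's code. Only `strict` and `off` are documented flag values.

```python
from enum import Enum


class PreflightMode(str, Enum):
    DEFAULT = "default"  # hypothetical name for the behaviour when no flag is passed
    STRICT = "strict"
    OFF = "off"


def should_abort(mode: PreflightMode, errors: int, warnings: int) -> bool:
    """Decide whether a run should stop after preflight."""
    if mode is PreflightMode.OFF:
        return False  # preflight is skipped, so nothing can block the run
    if errors > 0:
        return True  # blocking errors stop the run in every other mode
    # Warnings only stop the run in strict mode.
    return mode is PreflightMode.STRICT and warnings > 0
```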

`audit triage --control-center` is read-only. It loads the latest report + warehouse
state, groups open work into `Blocked`, `Needs Attention Now`, `Ready for Manual Action`,
and `Safe to Defer`, and writes `operator-control-center-<username>-<date>.json` plus
`.md`.

`audit triage --approval-center` is also read-only. It loads the latest approval history,
groups work into `Needs Re-Approval`, `Ready For Review`, `Approved But Manual`, and
`Blocked`, and writes `approval-center-<username>-<date>.json` plus `.md`. Local approval
capture stays separate from writeback apply.

Watch mode supports `--watch-strategy adaptive|incremental|full`. `adaptive` is the
default and uses the stored baseline contract plus the scheduled full-refresh interval to
decide whether each watch cycle should run full or incremental.

For a full description of all flags grouped by workflow, see
[docs/modes.md](docs/modes.md).

### Run tests

```bash
pytest
```

## Development

For local development, clone the repo and install with the dev, serve, semantic, and config extras:

```bash
git clone https://github.com/saagpatel/GithubRepoAuditor.git
cd GithubRepoAuditor
pip install -e ".[dev,serve,semantic,config]"
```

Common dev commands:

```bash
python3 -m pytest -q -p no:cacheprovider   # full test suite
python3 -m ruff check src/ tests/          # lint
python3 -m ruff format src/ tests/         # format
make workbook-gate                         # workbook invariant check
make release-gate                          # mutation testing gate
```

See [docs/release-gates.md](docs/release-gates.md) for the full gate checklist.

## Tech Stack

| Layer | Technology |
|-------|------------|
| Language | Python 3.11+ |
| GitHub API | REST v3 + GraphQL (raw requests) |
| Excel output | openpyxl + committed workbook template |
| PDF output | fpdf2 |
| AI narrative | Anthropic Claude API |
| Complexity analysis | Radon |
| CLI output | Rich |
| Storage | SQLite (history warehouse) |
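
To make the role of the SQLite history warehouse concrete, here is a small sketch of regression detection between two runs. The `scores` table, its columns, and the 0.05 threshold are invented for this example. The project's real schema is not described in this README.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical per-run score table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "run_id INTEGER NOT NULL, repo TEXT NOT NULL, completeness REAL NOT NULL)"
    )


def find_regressions(
    conn: sqlite3.Connection,
    previous_run: int,
    current_run: int,
    threshold: float = 0.05,
) -> list[tuple[str, float, float]]:
    """Return (repo, before, after) for repos whose score dropped past threshold."""
    rows = conn.execute(
        """
        SELECT cur.repo, prev.completeness, cur.completeness
        FROM scores AS cur
        JOIN scores AS prev ON prev.repo = cur.repo
        WHERE cur.run_id = ? AND prev.run_id = ?
          AND prev.completeness - cur.completeness > ?
        ORDER BY cur.repo
        """,
        (current_run, previous_run, threshold),
    ).fetchall()
    return [(repo, before, after) for repo, before, after in rows]
```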

## Architecture

The auditor follows a pipeline architecture: fetch repo list via GitHub API → shallow-clone each repo → run all 12 analyzers in sequence → aggregate scores → generate outputs. Analyzers are pluggable via `--analyzers-dir` for custom extensions. The scoring engine computes completeness and interest independently, applies configurable scoring profiles, and derives letter grades from the combined result. All output writers (Excel, HTML, JSON, Markdown, Notion) are isolated from the analysis layer and consume the same scored result object. Workbook ranking and trend views always use the full filtered portfolio baseline, even for targeted or incremental reruns.
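
The separation between analyzers, the scored result, and the output writers can be pictured with a toy version of the pipeline. Everything named below (`Analyzer`, `ScoredRepo`, `audit_repo`, `to_json`, `to_markdown`, and the plain average used for `overall`) is a hypothetical stand-in, not the project's actual classes or scoring formula.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# An analyzer inspects a cloned repo and returns a 0.0-1.0 score (assumed shape).
Analyzer = Callable[[Path], float]


@dataclass
class ScoredRepo:
    name: str
    dimensions: dict[str, float] = field(default_factory=dict)

    @property
    def overall(self) -> float:
        """Plain average of dimension scores, used here only for illustration."""
        if not self.dimensions:
            return 0.0
        return sum(self.dimensions.values()) / len(self.dimensions)


def audit_repo(path: Path, analyzers: dict[str, Analyzer]) -> ScoredRepo:
    """Run every analyzer in sequence and collect one scored result."""
    result = ScoredRepo(name=path.name)
    for dimension, analyze in analyzers.items():
        result.dimensions[dimension] = analyze(path)
    return result


def to_json(result: ScoredRepo) -> str:
    """Writer 1: serialise the shared result object as JSON."""
    payload = {"name": result.name, "dimensions": result.dimensions, "overall": result.overall}
    return json.dumps(payload)


def to_markdown(result: ScoredRepo) -> str:
    """Writer 2: render the same result object as a Markdown table row."""
    return f"| {result.name} | {result.overall:.2f} |"
```

Because both writers read the same `ScoredRepo`, they cannot disagree about a score, which is the property the architecture relies on to keep its output surfaces aligned.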

Partial reruns now require a compatible full-baseline report, not just any previous report. The stored baseline contract tracks the audit-affecting portfolio context used to produce the last trustworthy baseline, and targeted or incremental reruns will fail closed if that contract no longer matches the current request.

Before normal runs start, the CLI now performs a shared preflight that checks config validity, token/config readiness for requested integrations, template/workbook availability, output writability, and whether targeted or incremental paths have a usable baseline. `--doctor` runs the broader diagnostics set without auditing repos and writes a machine-readable JSON artifact to `output/diagnostics-<username>-<date>.json`.

For day-to-day operations, `--control-center` is now the clean read-only entrypoint. It reuses the latest report, review state, campaign history, governance drift, and setup health to build one shared operator queue without running a new audit or mutating any external system.

The portfolio truth layer now has its own dedicated generation path. `--portfolio-truth` scans the configured local projects workspace, produces `output/portfolio-truth-latest.json` plus dated historical truth snapshots, and regenerates the configured project-registry and portfolio-audit Markdown compatibility outputs from that same truth contract instead of treating either markdown file as canonical.

Phase 104 added a second standalone workspace mode: `--portfolio-context-recovery`. That mode freezes the active/recent weak-context cohort from the live truth snapshot, writes dry-run recovery plan artifacts into `output/`, skips dirty or temporary repos automatically, and can apply bounded minimum-context upgrades plus repo-level catalog seeds before regenerating the truth snapshot and compatibility outputs.

Watch mode now uses that same baseline contract in live execution. Each cycle records the requested watch strategy, the chosen mode, and the reason a full refresh was required or an incremental rerun remained safe.

`pyproject.toml` is the canonical dependency definition, and `requirements.txt` is kept as a synchronized compatibility mirror for environments that still prefer a flat requirements file.

## Excel Workbook

The workbook now supports two modes:

- `--excel-mode standard` — stable operational workbook path, the CLI default, and the recommended mode for automation and Mac Excel compatibility
- `--excel-mode template` — template-backed workbook path using `assets/excel/analyst-template.xlsx` for controlled template work

Both modes read from the same report + warehouse facts. Python owns the hidden `Data_*` sheets, stable table names, and workbook facts. The template-backed workbook still owns the template shell, named-range bindings, native sparkline placement, and print layout, but the standard workbook path is now the safest default for automated generation and Excel compatibility.

Template mode is also validated during preflight: the committed workbook asset must exist and pass a lightweight shell check before the run will continue.

This workbook boundary is unchanged in the current phase: the project still emits one workbook artifact, visible sheets remain filter-based, and hidden `Data_*` sheets remain the contract surface for workbook facts and downstream bindings.

The workbook’s main visible flow is now:

- `Index` for orientation
- `Dashboard` for the big-picture read
- `Run Changes` for what moved this run
- `Review Queue` for action
- `Portfolio Explorer` for comparison
- `Repo Detail` for one-repo drilldown
- `Executive Summary` for a one-page shareable readout

For workbook-facing changes, use the canonical release gate:

```bash
make workbook-gate
```

That command generates stable sample `standard` and `template` workbooks, validates the visible-sheet and hidden `Data_*` invariants, writes an authoritative `workbook-gate-result.json`, adds a human-readable gate summary, and produces a manual desktop Excel checklist. The final release step is still opening the generated `standard` workbook in desktop Excel and recording the local signoff outcome with `make workbook-signoff`.

After that manual desktop Excel check, record the outcome back into the gate artifacts:

```bash
make workbook-signoff ARGS="--reviewer <name> --outcome passed --check excel-open-no-repair=passed --check visible-tabs-present=passed --check normal-zoom-readable=passed --check chart-placement-clean=passed --check filters-work=passed"
```

## Managed Campaigns and Governance

Campaign writeback is now lifecycle-aware rather than one-shot:

- `--campaign-sync-mode reconcile` updates active managed records and closes stale ones
- `--campaign-sync-mode append-only` leaves stale managed records open and marks them stale
- `--campaign-sync-mode close-missing` aggressively closes previously managed records that no longer belong in the campaign

Managed state drift, rollback coverage, and campaign history are written into JSON, Markdown, HTML, Excel, and the warehouse snapshot. Governed security controls still remain manual and opt-in, but operator surfaces now distinguish ready, approved, applied, drifted, and rollback coverage states when governance data is present.

When writeback or governance-related actions are requested, preflight checks now validate the required GitHub and Notion prerequisites before any external mutation path starts.

## Operator Loop

The daily operator loop is now:

- Run `audit run <github-username> --doctor`
- Run `audit run <github-username>` or `audit run <github-username> --watch --watch-strategy adaptive`
- Run `audit triage <github-username> --control-center`
- Review the handoff fields: what changed, why it matters, what to do next, whether the queue is improving or worsening, what was tried for the top target, whether it is only quieting down or now counts as confirmed resolved, and whether recent confidence has actually been validating
- Open the workbook and review it in this order: `Dashboard`, `Run Changes`, `Review Queue`, `Portfolio Explorer`, `Repo Detail`, `Executive Summary`
- Clear anything in `Blocked` first
- Use the reported primary target as the single next thing to close before taking on newly ready work
- Review `Needs Attention Now` for drift and high-severity changes
- Work through `Ready for Manual Action`
- Leave `Safe to Defer` items alone unless priorities change
- Run `make workbook-gate` only when workbook-facing changes are in scope
- Run `make workbook-signoff ...` after the manual Excel-open check for workbook-facing changes
- Browse [http://127.0.0.1:8080/](http://127.0.0.1:8080/) after `audit serve` to review the dashboard
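
The lane priority in the loop above (clear `Blocked`, then `Needs Attention Now`, then `Ready for Manual Action`, and leave `Safe to Defer` alone) can be expressed as a simple ordering. The item shape used here, a dict with `repo` and `lane` keys, is an assumption for illustration and not the documented control-center schema.

```python
# Lane names as used by the control center, in the order the loop works them.
LANE_ORDER = ["Blocked", "Needs Attention Now", "Ready for Manual Action", "Safe to Defer"]


def order_queue(items: list[dict]) -> list[dict]:
    """Return actionable items sorted by lane priority, dropping deferred ones."""
    rank = {lane: index for index, lane in enumerate(LANE_ORDER)}
    actionable = [item for item in items if item["lane"] != "Safe to Defer"]
    # sorted() is stable, so items keep their original order within a lane.
    return sorted(actionable, key=lambda item: rank[item["lane"]])
```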

Scheduled automation stays artifact-first. The weekly workflow now:

- runs the audit
- generates a control-center artifact plus a scheduled handoff summary
- uploads `output/`
- opens or updates one canonical GitHub issue only when blocked or urgent operator findings cross a meaningful threshold
- closes that same issue cleanly when later runs return to a quiet state

The handoff now also calls out:

- whether the queue is getting better, worse, or staying stuck
- what was tried most recently, and whether that intervention actually helped
- whether recovery is only quiet for now or confirmed resolved
- whether recent high-confidence guidance has been validating or turning noisy
- what trust policy now applies to the live recommendation (`act-now`, `act-with-review`, `verify-first`, or `monitor`)
- whether a soft exception or recent policy-flip drift should make the operator treat that recommendation more cautiously
- whether recent soft caution is still earning trust or has become cautious enough to recover toward a stronger policy

In newer follow-through phases, that same weekly story also carries whether a recommendation is escalating, recovering, rebuilding, re-acquiring confidence, or aging back down. The important product principle is still the same: workbook, HTML, Markdown, and review-pack surfaces should tell the same story in different formats.

## Troubleshooting

The fastest path for setup issues is:

```bash
audit run <github-username> --doctor
```

Common fixes:

- Missing GitHub token: set `GITHUB_TOKEN` or pass `--token` for private-repo access, GitHub writeback, metadata apply flows, and other authenticated actions.
- Missing or broken Notion config: create or fix `config/notion-config.json` before using `--notion-sync`, `--notion-registry`, or Notion writeback.
- Starting from scratch: copy `config/examples/audit-config.example.yaml` to `audit-config.yaml` and `config/examples/notion-config.example.json` to `config/notion-config.json`.
- Missing Excel template: restore `assets/excel/analyst-template.xlsx` or use `--excel-mode standard`.
- Missing baseline report: run a full audit before using `--repos`, `--incremental`, or other baseline-dependent workflows.
- Config/profile errors: fix `audit-config.yaml` syntax or choose an existing scoring profile under `config/scoring-profiles/`.

There is also a longer operator guide in [docs/operator-troubleshooting.md](docs/operator-troubleshooting.md).

## License

MIT
