Metadata-Version: 2.4
Name: aollivierre-sca
Version: 0.2.7
Summary: Source-control automation -- audit, secret-scan, and remediate Git workspaces
Author: aollivierre
License: MIT
Project-URL: Homepage, https://github.com/aollivierre/source-control-automation
Project-URL: Repository, https://github.com/aollivierre/source-control-automation
Project-URL: Issues, https://github.com/aollivierre/source-control-automation/issues
Keywords: git,source-control,audit,secret-scanning,remediation,workspace
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Version Control :: Git
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# source-control-automation v3 (Python rewrite)

[![v3 tests](https://github.com/aollivierre/source-control-automation/actions/workflows/v3-tests.yml/badge.svg)](https://github.com/aollivierre/source-control-automation/actions/workflows/v3-tests.yml)
![coverage 58%](https://img.shields.io/badge/coverage-58%25-yellow)
[![release](https://img.shields.io/github/v/release/aollivierre/source-control-automation)](https://github.com/aollivierre/source-control-automation/releases)
![safe to run](https://img.shields.io/badge/safe%20to%20run-read--only%20by%20default-3fb950)

> **[LOCKED] Safe to run.** Bare `sca` is **read-only**. It walks your code root, writes JSON and an HTML report to its own output dir, and opens the HTML in your browser -- that's it. The pipeline produces a *plan* of what needs fixing but never executes it on its own. After the report opens, you get a `y/N` prompt to apply the plan; type `n` (or just press Enter) to skip and review first. Destructive operations always require an explicit `--apply` flag.

Cross-platform Python rewrite of v1's PowerShell framework, with a smaller surface and a few capabilities v1 was missing.

## Quick start

```bash
cd ~/code        # or wherever your repos live
sca              # produces a report, opens it, asks before fixing anything
```

That's the whole onboarding. The same `sca` command works on Linux and Windows; macOS should work too but isn't tested. On Windows specifically, the auto-open uses `explorer.exe <path>` so the browser launches in your *interactive user session* even when `sca` was started from an elevated PowerShell.

## Why v3 exists

v1 (the rest of this repo) is a thorough PowerShell framework -- five numbered orchestration scripts, a Pester suite, ~25 specialized fix scripts at the root, and a gitignore template library. Real value, real coverage. But:

- **Windows / PowerShell only** -- won't run on a Linux dev box or a CI runner that defaults to bash.
- **Surface area** -- 128 files for what's logically a 5-stage workflow. The `Fix-RemainingIssues.ps1`-style scripts are one-off remediations that crept into the tree.
- **Doesn't notice "wrapper repo wrapping nested repos"** -- the pattern where a parent `.git` tracks paths that have their own `.git` directories. We hit this on the original `C:\code\.git` and had to extract via `git subtree split` by hand.
- **No secret scanning during audit** -- v1 looks for "sensitive file extensions" (`.pfx`, `.env`) but not for *content* patterns such as an `Obfuscated*` config field paired with a decode function in the same repo, leaked PATs in committed scripts, or base64+XOR-encoded secrets. We found three leaked PATs and a leaked Azure cert by hand during one audit pass -- the tool should detect those.
- **HEAD-only branch view** -- only inspects the current branch; can't tell you that you have 8 unpushed feature branches or that `main` is 63 commits behind your active branch.
- **The tool has its own leaked PAT** -- `Fix-RemoteUrls.ps1` line 3 hardcodes a `github_pat_*`. A tool that *standardizes source control* should not be the place this happens.

v3 is the smaller, opinionated rewrite that keeps v1's good ideas and adds the missing pieces.

## Architecture

```
v3/
+-- README.md              (this file)
+-- pyproject.toml         (sca package metadata)
+-- sca/
|   +-- __init__.py
|   +-- cli.py             (entry: `sca audit | branches | scan | classify | extract | gitignore | remediate | render`)
|   +-- audit.py           (walk tree -> classify repos/orphans/loose files -> JSON)
|   +-- branches.py        (per-repo branch audit: unpushed, main-behind, diverged)
|   +-- secrets.py         (NEW: pattern-scan for leaked PATs, XOR-obfuscated configs, etc.)
|   +-- render.py          (JSON -> single-file HTML report)
|   +-- classify.py        (port v1's 5-state model; repo strategy decision)
|   +-- remediate.py       (port v1's state-based fixes; backup-before-modify)
|   +-- extract.py         (NEW: detect + fix wrapper-repo-wrapping-nested-repos pattern)
|   +-- templates.py       (NEW: stack detection -> curated gitignore)
+-- templates/
|   +-- gitignore/         (port v1's library: dotnet, python, node, powershell, etc.)
+-- tests/
    +-- ...                (pytest port of v1's Pester suite)
```

## What carries over from v1

| v1 idea | v3 home |
|---|---|
| 5-state classification (NoSC / LocalGitOnly / Incomplete / PartialSync / Compliant) | `sca/classify.py` |
| Dedicated-vs-consolidated repo strategy decision (file count, .sln/.csproj presence) | `sca/classify.py` |
| `.gitignore` template library by project type | `templates/gitignore/` |
| Backup-before-modify (zip the project before destructive ops) | `sca/remediate.py` |
| Per-state remediation workflows (init / push / commit / sync) | `sca/remediate.py` |
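The backup-before-modify idea in the table can be sketched as follows. This is an illustration under assumptions, not the code in `sca/remediate.py`: the helper name `backup_project`, the timestamped archive name, and the choice of `shutil.make_archive` are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_project(project: Path, backup_dir: Path) -> Path:
    """Zip `project` into `backup_dir` and return the archive path.

    `backup_dir` should live outside `project`, otherwise the archive
    would try to include itself.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = backup_dir / f"{project.name}-{stamp}"
    # root_dir/base_dir keep the project folder name as the top-level
    # entry inside the zip instead of an absolute path.
    archive = shutil.make_archive(
        str(base), "zip", root_dir=project.parent, base_dir=project.name
    )
    return Path(archive)
```

A destructive step would call `backup_project(...)` first and proceed only if it returns without raising.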

## What carries over from v2 (the audit scripts written 2026-04-30)

| v2 script | v3 home |
|---|---|
| `audit-code.py` (walk + classify into JSON) | `sca/audit.py` |
| `branch-audit.py` (per-branch flags) | `sca/branches.py` |
| `render-audit.py` (JSON -> HTML) | `sca/render.py` |

Plus: cross-platform support (`--root` / `CODE_ROOT` env var) so the same code runs on Linux and Windows.

## What's NEW in v3 (not in v1 or v2)

1. **Secret scanning during audit** -- `sca/secrets.py` scans every committed file for:
   - Hardcoded `github_pat_...`, `ghp_...`, `sk-...`, AWS access keys
   - `Obfuscated*` field names paired with a `Get-DecryptedValue` (or similar) decode function in the same repo (XOR/base64 antipattern)
   - Embedded PFX/PEM blobs
   - SharePoint URLs, tenant/client/list UUIDs that look like real values
   - Any `password\s*[:=]` patterns inside config files
2. **Wrapper-repo detection + extraction** -- `sca/extract.py` detects the "parent `.git` wraps child repos that have their own `.git`" pattern and offers a clean extraction (the `git subtree split` workflow we did manually).
3. **Visibility-before-push gate** -- every `git push` is preceded by a quick check: is the remote repo public? If yes, run `secrets.py` before the push. Stops the 68-minute-public-exposure problem we hit with ResetIntuneEnrollment.
4. **Branch-level audit** -- new relative to v1, which was HEAD-only; v2 already had it. Surfaces unpushed branches and `main`-behind situations as first-class report items.
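To make item 1 concrete, here is a minimal sketch of line-based pattern scanning. The regexes are assumptions made for illustration (in particular the token lengths) and are not the rules shipped in `sca/secrets.py`; `PATTERNS`, `Finding`, and `scan_text` are hypothetical names. Only the `password\s*[:=]` pattern is taken directly from the list above.

```python
import re
from typing import Iterator, NamedTuple

# Assumed shapes for a few token families; real rules may be stricter.
PATTERNS = {
    "github-fine-grained-pat": re.compile(r"github_pat_[A-Za-z0-9_]{20,}"),
    "github-classic-pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password-assignment": re.compile(r"password\s*[:=]", re.IGNORECASE),
}


class Finding(NamedTuple):
    rule: str
    line_no: int
    excerpt: str


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) match, in line order."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Truncate the match so the report does not re-leak the secret.
                yield Finding(rule, line_no, match.group(0)[:12] + "...")
```

Truncating the excerpt is a deliberate choice: the audit report is itself a file on disk and should not become a second copy of the credential.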

## Out of scope for v3

- Continuous monitoring / dashboard -- v1's Mode 4. The audit is one-shot. If you want recurring runs, schedule via cron / Task Scheduler.
- Cross-platform support beyond Windows + Linux. macOS should work but isn't tested.
- Non-GitHub forges (GitLab, Bitbucket, Gitea). v1 had aspirations here; v3 is GitHub-only by design.

## Status / progress (this branch)

| Module | Status | Notes |
|---|---|---|
| `sca/audit.py` | [DONE] ported from v2 | walk + classify dirs -> JSON; cross-platform (`--root`) |
| `sca/branches.py` | [DONE] ported from v2 | per-repo branch state; unpushed / behind / diverged |
| `sca/render.py` | [DONE] ported from v2 | JSON -> single-file HTML report |
| `sca/secrets.py` | [DONE] NEW | live token regex, XOR-obfuscation pair detector, real-looking GUIDs, SharePoint URLs |
| `sca/classify.py` | [DONE] NEW | 5-state model + 3 extra states v1 didn't have (Empty, LooseFile, WrapperRepo) + dedicated/consolidated decision |
| `sca/extract.py` | [DONE] NEW | wrapper-repo detection + repair (subtree-split / archive-and-delete) |
| `sca/templates.py` + `templates/gitignore/` | [DONE] NEW | stack detection (python/node/dotnet/powershell) -> curated gitignore |
| `sca/remediate.py` | [DONE] NEW | per-state plan + executor; backup-before-modify; visibility-before-push gate |
| `sca/cli.py` | [DONE] NEW | `sca audit \| branches \| scan \| classify \| extract \| gitignore \| remediate \| render` |
| `tests/` | [DONE] NEW | 35 pytest tests, all green |
| `pyproject.toml` | [DONE] | `pip install -e v3` |

Smoke runs on `C:\code` (real workspace):
- audit + classify: 24 entries -> 20 FullyCompliant, 2 WrapperRepo, 1 IncompleteSourceControl, 1 LooseFile
- secrets scan on this very repo: caught **16 hardcoded GitHub PATs** in v1's root remediation scripts (subsequently rotated and redacted)
- extract: identified `UnicodeReplacementTool/` as the one wrapper-repo situation in the workspace
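The wrapper-repo situation flagged by the extract run can be approximated with a plain directory walk. This sketch is an assumption about the approach, not the logic of `extract.detect`: it only looks at directory layout (a parent `.git` directory with child `.git` directories below it) and does not check what the parent repo actually tracks. `find_nested_repos` is a hypothetical name.

```python
import os
from pathlib import Path


def find_nested_repos(parent: Path) -> list[Path]:
    """Return child directories with their own .git, if `parent` is a repo too."""
    if not (parent / ".git").is_dir():
        return []
    nested: list[Path] = []
    for dirpath, dirnames, _ in os.walk(parent):
        here = Path(dirpath)
        if ".git" in dirnames:
            # Never descend into git's own object store.
            dirnames.remove(".git")
            if here != parent:
                nested.append(here)
                # Stop here: repos nested inside the child are its problem.
                dirnames.clear()
    return sorted(nested)
```

Submodules and linked worktrees use a `.git` *file* rather than a directory, so this sketch ignores them; a fuller detector would need to decide how to treat those.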

## Open work

- The visibility-before-push gate in `remediate.py` calls `gh api` to determine repo visibility -- that's a network dependency. A `--offline` mode that errs on the side of "treat as public" would be safer for CI use.
- `sca extract --archive-and-delete` works but the subtree-split path needs an integration test against a real wrapper repo (currently only unit-tested via `extract.detect`).
- `sca render` still writes to `C:/code/temp/audit-report.html` by default -- that path needs to be parameterized or written next to the input JSON.
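For the first open item, one possible shape of a fail-closed visibility check is sketched below. The `offline` parameter mirrors the proposed `--offline` mode, which does not exist yet; `is_public`, `gh_repo_json`, and the injectable `fetch` callable are hypothetical names. The sketch assumes `gh api repos/<owner>/<repo>` returns the repository JSON with a boolean `private` field.

```python
import json
import subprocess
from typing import Callable


def gh_repo_json(slug: str) -> str:
    """Fetch repo metadata through the GitHub CLI (needs network and auth)."""
    result = subprocess.run(
        ["gh", "api", f"repos/{slug}"],
        capture_output=True, text=True, check=True, timeout=15,
    )
    return result.stdout


def is_public(
    slug: str,
    offline: bool = False,
    fetch: Callable[[str], str] = gh_repo_json,
) -> bool:
    """Return True unless the repo is positively known to be private."""
    if offline:
        # No network allowed: assume the worst so the secret scan still runs.
        return True
    try:
        return not json.loads(fetch(slug))["private"]
    except (OSError, subprocess.SubprocessError, ValueError, KeyError):
        # gh missing, call failed or timed out, or unexpected payload.
        return True
```

Every failure path returns `True`, so a broken network or a missing `gh` binary results in an extra scan rather than a skipped one.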

## How to use it

```bash
pip install -e v3                       # editable install puts `sca` on PATH

sca audit --root ~/code                 # walks tree, prints JSON
sca audit --root ~/code | sca classify --summary
sca scan ~/code/some-repo               # secret scan one repo
sca extract --root ~/code               # find wrapper-repo situations
sca gitignore ~/code/some-dir --write   # write a stack-aware .gitignore
sca audit --root ~/code | sca remediate --plan       # dry-run plan
sca audit --root ~/code | sca remediate --apply      # actually run it
```

Set `CODE_ROOT` to skip `--root` everywhere.
