Metadata-Version: 2.4
Name: picosentry
Version: 0.16.0
Summary: PicoSentry — deterministic supply-chain scanner for npm/pnpm, safe for ML pipelines
Author-email: Henrik Kirk <kirk@kirkforge.dev>
Maintainer: KirkForge
License-Expression: BUSL-1.1
Project-URL: Homepage, https://github.com/KirkForge/PicoSentry
Project-URL: Documentation, https://github.com/KirkForge/PicoSentry/blob/master/README.md
Project-URL: Repository, https://github.com/KirkForge/PicoSentry
Project-URL: Issues, https://github.com/KirkForge/PicoSentry/issues
Project-URL: Changelog, https://github.com/KirkForge/PicoSentry/blob/master/CHANGELOG.md
Project-URL: Release Notes, https://github.com/KirkForge/PicoSentry/releases
Project-URL: Source Code, https://github.com/KirkForge/PicoSentry
Project-URL: Bug Tracker, https://github.com/KirkForge/PicoSentry/issues
Keywords: supply-chain,security,npm,pnpm,scanner,deterministic,ml-safe,picosentry,sbom,cyclonedx,sarif,sca,software-composition-analysis,devsecops,ci-cd
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: System :: Software Distribution
Classifier: Environment :: Console
Classifier: Framework :: Pytest
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE-SUMMARY.md
License-File: COMMERCIAL-LICENSE.md
Requires-Dist: pyyaml>=6.0
Provides-Extra: pnpm
Requires-Dist: pyyaml>=6.0; extra == "pnpm"
Provides-Extra: yaml
Requires-Dist: pyyaml>=6.0; extra == "yaml"
Provides-Extra: sigstore
Requires-Dist: sigstore>=3.0; extra == "sigstore"
Requires-Dist: types-PyYAML; extra == "sigstore"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.0; extra == "dev"
Requires-Dist: pyyaml>=6.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: sigstore>=3.0; extra == "dev"
Requires-Dist: types-PyYAML; extra == "dev"
Provides-Extra: all
Requires-Dist: pyyaml>=6.0; extra == "all"
Dynamic: license-file

# PicoSentry 🦞

**Deterministic, offline supply-chain scanner for npm/pnpm — safe for ML pipelines.**

[![CI](https://github.com/KirkForge/PicoSentry/actions/workflows/ci.yml/badge.svg)](https://github.com/KirkForge/PicoSentry/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/picosentry)](https://pypi.org/project/picosentry/)
[![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)](https://www.python.org/)
[![Tests](https://img.shields.io/badge/tests-1390%20passing-brightgreen)](https://github.com/KirkForge/PicoSentry)
[![Rules](https://img.shields.io/badge/rules-21-informational)](src/picosentry/docs/rules/)
[![Deterministic](https://img.shields.io/badge/deterministic-sha256%20verified-brightgreen)](SCAAT.md)
[![SLSA L3](https://img.shields.io/badge/SLSA-L3-blueviolet)](SLSA.md)
[![License: BUSL-1.1](https://img.shields.io/badge/license-BUSL--1.1-blue)](LICENSE)

<img src="docs/banner.png" alt="PicoSentry" width="100%">

Same inputs + same corpus version = same findings and scan fingerprint. Every time.

No HTTP at scan time. No probabilistic heuristics. No narrative in findings.

> **Note on determinism:** Default JSON output includes audit timestamps and timing data
> (useful for forensics). For byte-identical output across runs, use `--deterministic-output`
> or `--verify-determinism`, which omit timestamps and timing for reproducible CI artifacts.
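
The difference between the default output and the byte-stable mode can be sketched as follows. This is a minimal illustration, not PicoSentry's implementation: the field names `timestamp` and `duration_ms` and the helper `fingerprint` are hypothetical stand-ins for whatever volatile data the real output carries.

```python
import hashlib
import json

# Hypothetical names for volatile fields; the real output schema may differ.
VOLATILE_KEYS = {"timestamp", "duration_ms"}


def strip_volatile(value):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def fingerprint(scan: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(strip_volatile(scan), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two runs with identical findings but different timing data and key order.
run_a = {"findings": [{"rule": "L2-POST-001", "severity": "HIGH"}],
         "timestamp": "2025-01-01T00:00:00Z", "duration_ms": 12}
run_b = {"duration_ms": 31, "timestamp": "2025-01-02T09:30:00Z",
         "findings": [{"severity": "HIGH", "rule": "L2-POST-001"}]}
assert fingerprint(run_a) == fingerprint(run_b)
```

Hashing the raw dictionaries instead would produce different digests, which is why timestamps and timing have to be omitted before comparing CI artifacts.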

## Quick Start

```bash
# Install
pip install picosentry

# Or from source
pip install -e .

# Or use Docker
docker build -t picosentry .
docker run --rm -v $(pwd):/scan picosentry scan /scan

# Scan a project
picosentry scan ./my-project

# CI-friendly health check (exit 1 on HIGH+CRITICAL only)
picosentry check ./my-project --fail-on high

# JSON output (sorted keys; add --deterministic-output for byte-identical output)
picosentry scan ./my-project --format json

# Monorepo workspace scan
picosentry workspace . --format table

# CycloneDX SBOM
picosentry scan ./my-project --format cyclonedx

# Verify determinism — run twice, assert SHA-256 match (exit 0=match, 4=violation)
picosentry scan ./my-project --verify-determinism

# Manage custom IoC corpus packs
picosentry corpus export ./my-iocs.json
picosentry corpus import ./community-pack.json
picosentry corpus list

# Manage custom IoCs
picosentry ioc register ./suspicious-pkg.json
picosentry ioc list
```

## CLI Reference

```
picosentry scan <target> [OPTIONS]     Scan a project directory
picosentry check <target> [OPTIONS]    CI-optimized health check (exit-code only)
picosentry workspace <root> [OPTIONS]  Scan entire monorepo (discovers all npm/pnpm projects)
picosentry corpus export <output>      Export custom IoCs as a shareable pack
picosentry corpus import <path>        Import a corpus pack into your IoC registry
picosentry corpus validate <path>      Validate a corpus pack without importing
picosentry corpus list                 List available corpus packs
picosentry ioc register <path>         Register a custom IoC indicator
picosentry ioc list                    List user-registered custom IoCs
picosentry ioc remove <id>             Remove a custom IoC by ID
picosentry rules [--json]              List available detector rules
picosentry version                     Show version, corpus version, rule count
picosentry diff <a.json> <b.json>      Compare two scan files for determinism
picosentry init [target] [--force]     Generate .picosentry.yml template
picosentry update [--top N]            Download latest npm corpus (requires network)

Scan Options:
  --format, -f        json, sarif, table, ml-context, github, cyclonedx (default: table)
  --output, -o        Write output to file instead of stdout
  --rules, -r         Run only specific rules (e.g., L2-POST-001 L2-TYPO-001)
  --corpus, -c        Path to corpus directory (default: built-in)
  --no-color          Disable colored output (table format only)
  --token-budget      Token budget for ml-context format (default: 4096)
  --exit-code         Exit with code 1 if findings found
  --fail-on           Exit 1 only if findings at or above severity (implies --exit-code)
  --quiet, -q         Summary only, no detailed findings
  --summary           One-line summary for CI notifications
  --baseline, -b      Path to baseline JSON or ignore file
  --baseline-update   Write updated baseline after filtering
  --verbose, -v       Show per-rule timing and scan details on stderr
  --timeout           Scan timeout in seconds (0 = no timeout, exit code 3 on timeout)
  --log-format        Log output format: text (default) or json for SIEM integration
  --fail-on-rule-error  Exit code 4 if any detector rule raises an exception (fail-closed)
  --verify-determinism Run scan twice and verify SHA-256 determinism
  --deterministic-output  Omit timestamps and timing for byte-stable JSON output
  --sarif-file        Path for SARIF output file for --format github (default: sarif.json)
```
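
The `--fail-on` behaviour described above (exit 1 only for findings at or above a severity) amounts to a threshold comparison. The sketch below assumes the conventional ordering INFO < LOW < MEDIUM < HIGH < CRITICAL; the function name `exit_code_for` is hypothetical and not part of the CLI.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def exit_code_for(severities: list[str], fail_on: str) -> int:
    """Return 1 if any finding is at or above the threshold, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on.upper())
    hit = any(SEVERITY_ORDER.index(s.upper()) >= threshold for s in severities)
    return 1 if hit else 0


print(exit_code_for(["MEDIUM", "LOW"], "high"))      # below threshold
print(exit_code_for(["LOW", "CRITICAL"], "high"))    # at or above threshold
```

An unknown severity name raises `ValueError` here, which fails closed rather than silently passing a scan.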

## Claw Pinch Branding

Human-facing table output uses lobster-themed severity labels:

| Standard Severity | PicoSentry Label |
|-------------------|------------------|
| CRITICAL / HIGH   | HARD PINCH 🦞    |
| MEDIUM            | SOFT PINCH       |
| LOW / INFO        | NUDGE            |

Clean scan: **"No pinches. All clear. 🦞"**

Machine formats (JSON, SARIF, CycloneDX, ml-context) use standard severity labels for CI/CD compatibility.
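
The label mapping in the table can be written as a plain lookup. This is only an illustration of the table above; the names `PINCH_LABELS` and `table_label` are hypothetical and may not match the formatter's code.

```python
# Severity-to-label lookup taken directly from the branding table.
PINCH_LABELS = {
    "CRITICAL": "HARD PINCH 🦞",
    "HIGH": "HARD PINCH 🦞",
    "MEDIUM": "SOFT PINCH",
    "LOW": "NUDGE",
    "INFO": "NUDGE",
}

CLEAN_MESSAGE = "No pinches. All clear. 🦞"


def table_label(severity: str) -> str:
    """Map a standard severity to its human-facing table label."""
    return PINCH_LABELS[severity.upper()]


def table_summary(severities: list[str]) -> str:
    """Clean-scan message, or the distinct labels present in sorted order."""
    if not severities:
        return CLEAN_MESSAGE
    return ", ".join(sorted({table_label(s) for s in severities}))
```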

## Design Principles

1. **Deterministic by construction**: `sha256(scan_a) == sha256(scan_b)` on identical inputs + corpus version
2. **Offline at scan time**: No HTTP calls during scanning. Corpus is local and versioned.
3. **Pure functions**: Rules are `(target_path, corpus_dir) → List[Finding]`. No global state, no randomness.
4. **No narrative in findings**: Output is structured data. The consumer formats.
5. **ML-safe**: `--format ml-context` produces compact, token-budgeted output designed for LLM tool results.
6. **SBOM-first**: `--format cyclonedx` produces a full CycloneDX 1.5 SBOM with component inventory, purl, and hash verification.
7. **CI/CD native**: Sigstore-signed releases, GitHub Actions, pre-commit hooks, workspace scanning, `--fail-on-rule-error`.
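
Principle 3 can be illustrated with a toy rule that follows the `(target_path, corpus_dir) → List[Finding]` shape. Everything here is an assumption made for illustration: the `Finding` fields, the function name `install_script_rule`, and the detection logic are not taken from PicoSentry's source.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True, order=True)
class Finding:
    # Hypothetical fields; the real Finding model may carry more.
    rule_id: str
    severity: str
    path: str
    detail: str


INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_script_rule(target_path: Path, corpus_dir: Path) -> List[Finding]:
    """Toy rule: flag npm install lifecycle hooks declared in package.json.

    corpus_dir is unused in this sketch; it is kept only to match the signature.
    """
    manifest = target_path / "package.json"
    if not manifest.is_file():
        return []
    scripts = json.loads(manifest.read_text(encoding="utf-8")).get("scripts", {})
    findings = [
        Finding("L2-POST-001", "HIGH", "package.json", f"{hook}: {scripts[hook]}")
        for hook in INSTALL_HOOKS
        if hook in scripts
    ]
    # Sorting gives a stable order that does not depend on JSON key order.
    return sorted(findings)
```

Because the function reads only its arguments and returns frozen, sorted values, calling it twice on the same directory yields equal lists.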

## Configuration File

PicoSentry reads `.picosentry.yml` from the target directory (or `.picosentry.yaml` / `picosentry.config.yml`). Config file values are defaults; CLI flags override them.

```yaml
version: 1

# Output format: json, sarif, table, ml-context, github, cyclonedx
format: json

# Disable colored output
no_color: true

# Exit with code 1 if findings found
exit_code: true

# Only fail CI on HIGH or above
fail_on: high

# Suppress known findings from previous scan
baseline: baseline.json

# Severity overrides — downgrade/upgrade rule severity
severity_overrides:
  L2-PROV-001: INFO        # Downgrade provenance to info
  L2-FORK-001: LOW         # Downgrade fork drift to low

# Token budget for ml-context format
token_budget: 2048
```
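
The precedence rule (config values are defaults, CLI flags override them) and the `severity_overrides` mapping can be sketched like this. The function names are hypothetical, and representing an unset CLI flag as `None` is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def merge_settings(config: Dict[str, Any], cli_flags: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Config values act as defaults; CLI flags that were actually set win."""
    merged = dict(config)
    merged.update({key: value for key, value in cli_flags.items() if value is not None})
    return merged


def apply_severity_override(rule_id: str, severity: str, overrides: Dict[str, str]) -> str:
    """Return the overridden severity for a rule, or the original if none is set."""
    return overrides.get(rule_id, severity)


config = {"format": "json", "fail_on": "high", "token_budget": 2048}
cli_flags = {"format": "sarif", "fail_on": None}  # only --format was passed
settings = merge_settings(config, cli_flags)
```

With these inputs `settings["format"]` comes from the CLI while `fail_on` and `token_budget` keep their config values.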

## Workspace Scanning

Scan entire monorepos with one command. Supports pnpm workspaces, Nx, Turborepo, and Lerna.

```bash
picosentry workspace . --format json
picosentry workspace . --fail-on high --quiet
```
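
Project discovery in a monorepo can be approximated by walking the tree for `package.json` files. This is a simplified sketch under stated assumptions: the real `workspace` command may instead read workspace manifests such as `pnpm-workspace.yaml`, and the names `discover_projects` and `SKIP_DIRS` are hypothetical.

```python
import os
from pathlib import Path
from typing import List

# Directories assumed not to contain first-party projects.
SKIP_DIRS = {"node_modules", ".git"}


def discover_projects(root: Path) -> List[Path]:
    """Return sorted directories under root that contain a package.json."""
    projects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        if "package.json" in filenames:
            projects.append(Path(dirpath))
    return sorted(projects)
```

Sorting both the traversal and the result keeps the project order independent of filesystem enumeration order, which matters for reproducible output.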

## Custom IoC Registry

Register your own indicators of compromise. Registered indicators never leave your machine.

```bash
# Register a custom IoC
picosentry ioc register ./suspicious-package.json

# Export for sharing across teams
picosentry corpus export ./team-iocs.json --name "acme-corp-iocs"

# Import community packs
picosentry corpus import ./community-threats.json

# Validate a pack before importing
picosentry corpus validate ./pack.json
```

## Sigstore Verification

Every release is signed via Sigstore with OIDC identity from GitHub Actions.

```bash
# Verify a release artifact
./scripts/verify_release.sh v0.15.0

# Or manually
python -m sigstore verify identity \
  --cert-identity "https://github.com/KirkForge/PicoSentry/.github/workflows/release.yml@refs/tags/v0.15.0" \
  --cert-oidc-issuer "https://token.actions.githubusercontent.com" \
  picosentry-0.15.0-py3-none-any.whl
```

## Pre-commit Hooks

Drop into any npm/pnpm project's `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: https://github.com/KirkForge/PicoSentry
    rev: v0.15.0
    hooks:
      - id: picosentry-scan       # Full 21-rule scan
      - id: picosentry-check      # Fast CI check (HIGH+CRITICAL only)
      - id: picosentry-workspace  # Monorepo scan
```

## Supply Chain Attack Coverage (21 Rules)

| Rule ID         | Attack Vector                         | Severity        |
|-----------------|---------------------------------------|-----------------|
| L2-POST-001     | Post-install scripts                  | HIGH            |
| L2-OBFS-001     | eval / Function obfuscation           | HIGH            |
| L2-OBFS-002     | Hex-encoded payloads                  | MEDIUM          |
| L2-OBFS-003     | base64+eval obfuscation               | MEDIUM          |
| L2-OBFS-004     | Unicode escape obfuscation            | LOW             |
| L2-DEPC-001     | Dependency confusion                  | HIGH            |
| L2-TYPO-001     | Typosquatting                         | HIGH            |
| L2-MANI-001     | Manifest tampering                    | HIGH            |
| L2-MANI-002     | Optional deps with scripts            | MEDIUM          |
| L2-FORK-001     | Fork drift                            | HIGH            |
| L2-CRED-001     | Credential / secret leak              | HIGH            |
| L2-LOCK-001     | Lockfile drift                        | MEDIUM          |
| L2-BUND-001     | Bundled shadow dependencies           | HIGH            |
| L2-PROV-001     | Provenance / integrity                | MEDIUM          |
| L2-MAINT-001    | Maintainer change / takeover          | HIGH            |
| L2-PNPM-001     | pnpm dangerous config                 | HIGH            |
| L2-LICENSE-001  | License issues                        | MEDIUM          |
| L2-ENGIN-001    | Engine issues                         | LOW             |
| L2-SIDELOAD-001 | Protocol sideloading                  | MEDIUM          |
| L2-IOC-001      | Custom IoC detection                  | INFO–CRITICAL   |
| L2-ADV-001      | Advisory vulnerability (OSV/GHSA/npm) | MEDIUM–CRITICAL |

See [SCAAT.md](SCAAT.md) for the full attack-vector-to-rule mapping with confidence levels.
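
As an illustration of one attack vector from the table, typosquat detection is commonly built on edit distance against a list of popular package names. The sketch below is a generic Levenshtein check; whether L2-TYPO-001 uses this exact technique or threshold is an assumption, and `nearest_popular` is a hypothetical name.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def nearest_popular(name: str, popular: Iterable[str], max_distance: int = 1) -> Optional[str]:
    """Return the closest popular package within max_distance, or None.

    An exact match returns None: the package is the popular one, not a squat.
    """
    candidates = sorted(set(popular))
    if name in candidates:
        return None
    # Ties on distance are broken alphabetically so the result is stable.
    best = min(candidates, key=lambda p: (edit_distance(name, p), p), default=None)
    if best is not None and edit_distance(name, best) <= max_distance:
        return best
    return None
```

Sorting the candidates and breaking ties by name keeps the result independent of the order in which the corpus is loaded.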

## Deterministic Guard Stack

PicoSentry enforces determinism at four layers:

```
┌─────────────────────────────────────────┐
│  Layer 4: CI Gate                       │
│  --verify-determinism (CLI)             │
│  Runs scan twice, asserts SHA-256 match │
├─────────────────────────────────────────┤
│  Layer 3: Diff                          │
│  picosentry diff a.json b.json          │
│  Compare two saved scans field-by-field │
├─────────────────────────────────────────┤
│  Layer 2: Guard (runtime)               │
│  Validates invariants after each scan   │
├─────────────────────────────────────────┤
│  Layer 1: Models (structural)           │
│  Frozen dataclasses, sorted keys        │
└─────────────────────────────────────────┘
```
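
Layers 3 and 4 can be sketched in a few lines: run a scan callable twice and compare digests, then list the fields that differ when they do not match. The names `verify_determinism` and `diff_fields` are hypothetical, and the scan result is modelled as a plain dictionary rather than PicoSentry's real models.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

_MISSING = object()  # marks a key that is absent on one side


def _digest(scan: Dict[str, Any]) -> str:
    return hashlib.sha256(json.dumps(scan, sort_keys=True).encode("utf-8")).hexdigest()


def verify_determinism(run_scan: Callable[[], Dict[str, Any]]) -> bool:
    """Layer 4 sketch: run the scan twice and compare SHA-256 digests."""
    return _digest(run_scan()) == _digest(run_scan())


def diff_fields(a: Any, b: Any, prefix: str = "") -> List[str]:
    """Layer 3 sketch: dotted paths whose values differ between two scans."""
    if isinstance(a, dict) and isinstance(b, dict):
        paths: List[str] = []
        for key in sorted(set(a) | set(b)):
            paths.extend(diff_fields(a.get(key, _MISSING), b.get(key, _MISSING), f"{prefix}{key}."))
        return paths
    return [] if a == b else [prefix.rstrip(".") or "<root>"]
```

When the digest check fails, the field list narrows down which part of the output is unstable, for example a timing value that leaked into the result.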

## Architecture

```
picosentry/
├── __init__.py          # Public API + __version__
├── __main__.py          # python -m picosentry
├── cli.py               # CLI entry point (scan, check, workspace, corpus, rules, diff, init, update)
├── engine.py            # ScanEngine orchestrator (per-rule timing, corpus resolution)
├── guards.py            # Deterministic guard stack (enforcement + verification)
├── models.py            # Finding, ScanResult, ScanStats, BaselineResult (frozen dataclasses)
├── config.py            # .picosentry.yml loader + merge
├── logging.py           # Structured JSON logging for SIEM
├── workspace.py         # Multi-project/monorepo workspace scanning
├── ioc_registry.py      # Custom IoC registration + management
├── corpus_share.py      # Corpus pack export/import/validate (marketplace)
├── corpus/
│   ├── npm_top_packages.json  # 327 top npm packages (typosquat targets)
│   └── ioc/                    # IoC metadata (event-stream, Shai-Hulud, left-pad, etc.)
├── rules/
│   ├── post_install.py   # L2-POST-001
│   ├── obfuscation.py    # L2-OBFS-001..004
│   ├── dep_confusion.py  # L2-DEPC-001
│   ├── typosquat.py      # L2-TYPO-001
│   ├── manifest.py       # L2-MANI-001/002
│   ├── fork_drift.py     # L2-FORK-001
│   ├── credential_read.py  # L2-CRED-001
│   ├── pnpm_lock_parser.py # pnpm-lock.yaml v6+ parser
│   ├── lockfile_drift.py   # L2-LOCK-001
│   ├── bundled_shadow.py   # L2-BUND-001
│   ├── provenance.py       # L2-PROV-001
│   ├── maintainer_change.py # L2-MAINT-001
│   ├── pnpm_config.py      # L2-PNPM-001
│   ├── license.py           # L2-LICENSE-001
│   ├── engine.py            # L2-ENGIN-001
│   └── sideloading.py       # L2-SIDELOAD-001
├── formatters/
│   ├── json_fmt.py       # Deterministic JSON (sorted keys)
│   ├── sarif.py          # SARIF 2.1.0
│   ├── table.py          # Human-readable with claw pinch branding
│   ├── ml_context.py     # Token-budgeted for LLM tool results
│   ├── github.py         # SARIF file + markdown summary for GitHub Actions
│   └── cyclonedx.py      # CycloneDX 1.5 SBOM
└── tests/
    ├── test_scanner.py          # Core scanner + determinism
    ├── test_guards.py           # Deterministic guard stack
    ├── test_cli.py              # CLI integration
    ├── test_config.py           # Config file parsing
    ├── test_config_integration.py # Config + scan integration
    ├── test_docs.py             # Rule documentation completeness
    ├── test_init_and_sarif.py    # Init command + SARIF format
    ├── test_license.py           # License compliance
    ├── test_pnpm_lock_parser.py  # pnpm lockfile parser
    ├── test_sideloading.py       # Protocol sideloading
    ├── test_benchmark.py         # Performance benchmarks
    └── fixtures/                 # Test projects (IoC regression suite)
```

## License & Attestations

- **License**: Business Source License 1.1 (BUSL-1.1), see [LICENSE](LICENSE); for commercial use that requires a license, see [COMMERCIAL-LICENSE.md](COMMERCIAL-LICENSE.md)
- **SCAAT**: [SCAAT.md](SCAAT.md) — Supply Chain Attacks and Threats mapping
- **SLSA**: [SLSA.md](SLSA.md) — SLSA Build L3 roadmap
- **Security**: [SECURITY.md](SECURITY.md) — vulnerability reporting
- **Citation**: [CITATION.cff](CITATION.cff) — academic citation
