Metadata-Version: 2.4
Name: autosecscan
Version: 0.1.0
Summary: LLM-assisted security scanner: network + code vulnerability scanning with AI triage, fix suggestions, and structured reports.
Author: AutoSecScan contributors
License: MIT
Project-URL: Homepage, https://github.com/jhammant/AutoSecScan
Project-URL: Repository, https://github.com/jhammant/AutoSecScan
Project-URL: Changelog, https://github.com/jhammant/AutoSecScan/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/jhammant/AutoSecScan/issues
Keywords: security,vulnerability,scanner,sast,dast,llm,nmap,nuclei,semgrep,pentest
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: Jinja2>=3.1
Requires-Dist: httpx>=0.27
Provides-Extra: pdf
Requires-Dist: weasyprint>=61.0; extra == "pdf"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# 🛡️ AutoSecScan

**LLM-assisted security scanning & reporting** — free and open source.

AutoSecScan wraps best-in-class open-source scanners (nmap, nuclei, semgrep,
trivy, gitleaks, osv-scanner) and uses an LLM to **triage the results**: it flags
false positives, re-ranks severity by real-world impact, explains each finding in
plain English, and suggests a concrete fix. It runs from the command line,
produces **JSON / HTML / PDF** reports, and can run **daily via cron** or **after
a git commit/push**. Think of it as a self-hosted, hackable take on Aikido —
covering both your **infrastructure (an IP/host)** and your **code (a repo)**.

The LLM can be **fully local** (Ollama or **LM Studio**), routed through **Claude
Code Router**, called against the **Anthropic API**, spun up on demand via
**aiondemand**, or any **OpenAI-compatible** endpoint — selectable by **size tier**
(small / medium / big / massive) and **flavor** (standard / *uncensored*).

It runs in two modes: a deterministic **fixed pipeline**, or an **agentic mode**
(`--agent`) where the LLM *manages the scan itself* — looking at what's been found
and deciding the next action until it's satisfied.

---

## ⚠️ Authorization — read this first

> **Active scanning of systems you do not own or lack written permission to test
> is illegal** (US CFAA, UK Computer Misuse Act, and equivalents worldwide).

AutoSecScan is built to make misuse hard:

- **Hard allowlist.** Nothing outside `authorization.authorized_hosts` /
  `authorized_repos` is *ever* scanned. Unlisted targets are refused before a
  single packet is sent.
- **Per-run attestation.** Every run requires you to certify authorization —
  interactively, or with `--i-have-permission` for automation.
- **No "scan anything" switch.** There is intentionally no global override.

You are solely responsible for how you use this tool. Only scan what you own or
have explicit, written permission to test.
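
The hard allowlist above amounts to a check that runs before any scanner is invoked. A minimal sketch, assuming allowlist entries are exact hostnames, single IPs, or CIDR ranges (the entry formats AutoSecScan actually accepts are not specified here, and `is_authorized` is a hypothetical name, not the project's API):

```python
import ipaddress


def is_authorized(target: str, authorized_hosts: list[str]) -> bool:
    """Return True only if `target` matches an allowlist entry.

    Assumption: entries are hostnames, IP literals, or CIDR ranges.
    """
    target = target.strip().lower()
    try:
        target_ip = ipaddress.ip_address(target)
    except ValueError:
        target_ip = None  # not an IP literal, so compare as a hostname

    for entry in authorized_hosts:
        entry = entry.strip().lower()
        if target_ip is not None:
            try:
                # strict=False lets "10.0.0.5/24" style entries parse too
                if target_ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # entry is a hostname; it cannot match an IP
        elif entry == target:
            return True  # hostnames must match exactly, no subdomain wildcard
    return False  # default deny: anything unlisted is refused
```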

---

## Features

| | |
|---|---|
| **Network / host scanning** | `nmap` (ports/services/versions), `nuclei` (CVE & misconfig templates), `nikto` (web server) |
| **Code / dependency scanning** | `semgrep` (SAST), `trivy` (deps + IaC + secrets), `gitleaks` (committed secrets), `osv-scanner` (known-vuln deps) |
| **AI triage** | false-positive detection, severity re-ranking, plain-English explanations, concrete fix suggestions, executive summary |
| **Pluggable LLM** | Ollama · LM Studio · Claude Code Router · Anthropic · aiondemand · any OpenAI-compatible endpoint |
| **Model choice** | size tiers `small\|medium\|big\|massive` × flavors `standard\|uncensored` (abliterated/heretic) |
| **Reports** | structured **JSON**, self-contained **HTML**, and **PDF** |
| **Automation** | daily **cron**, **git hooks** (post-commit / pre-push), CI-friendly `--fail-on` exit codes |
| **Graceful degradation** | missing scanners are skipped with install hints; missing LLM still produces a report |

## Install

### Docker (recommended — every scanner + PDF baked in)

```bash
docker build -t autosecscan .
docker run --rm autosecscan doctor
# scan a repo, reports land on the host:
docker run --rm -v "$PWD:/work" -v "$PWD/reports:/reports" \
  autosecscan scan --repo /work --no-net --i-have-permission --format sarif,pdf --out /reports
```

The image bundles nmap, nuclei, semgrep, trivy, gitleaks, osv-scanner and the
PDF libraries — nothing to install on the host. A published image will be at
`ghcr.io/<org>/autosecscan` (see `.github/workflows/release.yml`).

### From source

```bash
git clone <your-repo-url> AutoSecScan && cd AutoSecScan
python3 -m venv .venv && . .venv/bin/activate
pip install -e .          # add '.[pdf]' for PDF output (needs pango/cairo)
```

Install whichever scanners you want (all optional — run `autosecscan doctor` to
see what's detected):

```bash
# macOS (Homebrew)
brew install nmap nuclei nikto trivy gitleaks osv-scanner pango
pip install semgrep
```

## Quick start

```bash
autosecscan init                       # write a starter config you can edit
autosecscan authorize add-host 127.0.0.1
autosecscan authorize add-repo ~/dev/myapp

autosecscan doctor                     # what scanners + which LLM are available?

# Scan a host and a repo, AI-triaged, full reports:
autosecscan scan \
  --target 127.0.0.1 \
  --repo ~/dev/myapp \
  --tier medium --flavor standard \
  --format json,html,pdf --out ./reports \
  --i-have-permission
```

Reports land in `./reports/scan-<timestamp>.{json,html,pdf}`.

## Agentic mode — the LLM manages the scan

By default AutoSecScan runs a fixed pipeline. With `--agent`, the LLM instead
drives the scan step by step: it starts with discovery, looks at the findings,
and chooses the next scanner/target — e.g. *nmap finds a web service → the LLM
runs nuclei against it → sees the result → digs further* — stopping when it
decides the assessment is complete.

```bash
autosecscan scan --target 127.0.0.1 --agent --tier big --flavor uncensored \
  --i-have-permission --max-steps 12
```

You'll see the LLM's reasoning stream as it works (`🧠 step N: ...`).

**This does not weaken the safety model.** The agent can only:
- run **known, installed** scanners (never arbitrary commands),
- against targets that **re-pass the authorization allowlist** every step
  (a target the LLM invents but that isn't authorized is refused, not scanned),
- with **sanitized** arguments passed as argv (no shell), bounded by `--max-steps`
  and a per-step timeout.
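
Those constraints can be pictured as a bounded loop in which every action the model proposes is validated before anything runs. This is a sketch under stated assumptions, not the project's implementation: `run_agent`, `KNOWN_SCANNERS`, and the three callables are hypothetical names, and the per-step timeout is left out for brevity.

```python
from typing import Callable, Optional

KNOWN_SCANNERS = {"nmap", "nuclei", "semgrep"}  # hypothetical subset


def run_agent(
    choose_action: Callable[[list], Optional[dict]],  # stands in for the LLM
    is_authorized: Callable[[str], bool],             # the allowlist gate
    run_scanner: Callable[[str, str], list],          # runs one known scanner
    max_steps: int = 12,
) -> list:
    findings: list = []
    for _ in range(max_steps):  # hard upper bound, like --max-steps
        action = choose_action(findings)
        if action is None:  # the model decided the assessment is complete
            break
        scanner, target = action.get("scanner"), action.get("target")
        if scanner not in KNOWN_SCANNERS:
            # never an arbitrary command: record the refusal and move on
            findings.append({"refused": f"unknown scanner {scanner!r}"})
            continue
        if not is_authorized(target):
            # the target is re-checked on every step, even if the LLM invented it
            findings.append({"refused": f"unauthorized target {target!r}"})
            continue
        findings.extend(run_scanner(scanner, target))
    return findings
```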

A capable model helps here — this is a good use for a big uncensored/abliterated
model (safety-tuned models tend to refuse offensive-security reasoning).

### Can the LLM add tools itself?

Yes, within bounds, at three levels of increasing risk.

**Level 1** (`--allow-install`) lets the agent install a known scanner. If it
decides it needs one that isn't installed (say it finds a web app and wants
`nuclei`), it can install it mid-scan and then use it:

```bash
autosecscan scan --target 127.0.0.1 --agent --allow-install --i-have-permission
#   🧠 step 2: I found a web service; I need nuclei to check it for CVEs.
#   ⬇  step 2: installing nuclei  (brew install nuclei)
#   ▶  step 3: nuclei → http://127.0.0.1:8080
```

The safety boundary is deliberate: the LLM can only install from a **curated
recipe map** (`nuclei`, `semgrep`, `trivy`, `gitleaks`, `osv` → fixed
pip/brew/go commands) — it **cannot invent an install command or run an arbitrary
one**. Off by default.

**Level 2 — run allowlisted tools that have no dedicated parser** (`--allow-tools`).
The agent may invoke any tool you allowlist (`sslscan`, `testssl`, `httpx`,
`whatweb`, `wpscan`, `sqlmap`, `nikto`); its stdout is captured as raw evidence
that the LLM interprets in later steps and during triage:

```bash
autosecscan scan --target 127.0.0.1 --agent \
  --allow-tools sslscan,httpx,nikto --i-have-permission
```

Still bounded: only tools **you** allowlist, only against **authorized** targets,
argv-only with sanitized args, and a **fixed invocation template per tool** (never
a free-form command line).
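
One way to read "fixed invocation template per tool" is that the only variable part of each command is the target, which is validated and then dropped into a predefined argv list. A minimal sketch, assuming that reading; `TOOL_TEMPLATES`, `build_argv`, and the two example templates are illustrative, not AutoSecScan's actual templates:

```python
import re

# Hypothetical templates: "{target}" is the only part the agent can influence.
TOOL_TEMPLATES = {
    "sslscan": ["sslscan", "{target}"],
    "nikto": ["nikto", "-h", "{target}"],
}

# Must start with an alphanumeric character, so a value can never be parsed
# as an option ("-oX ..."); no spaces or shell metacharacters are allowed.
_SAFE_TARGET = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/-]*$")


def build_argv(tool: str, target: str, allowed_tools: set) -> list:
    """Return an argv list suitable for subprocess.run(argv) without a shell."""
    if tool not in allowed_tools or tool not in TOOL_TEMPLATES:
        raise PermissionError(f"tool {tool!r} is not allowlisted")
    if not _SAFE_TARGET.fullmatch(target):
        raise ValueError(f"unsafe target {target!r}")
    return [part.replace("{target}", target) for part in TOOL_TEMPLATES[tool]]
```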

**Level 3 — the LLM writes its OWN probe** (`--allow-codegen`, ⚠ dangerous). The
agent can author a self-contained Python probe and run it — useful when no
existing tool checks the exact thing it wants to verify:

```bash
autosecscan scan --target hammant.io --agent --allow-codegen --i-have-permission
#   🛠  step 3: LLM wrote tool 'header_check'; running sandboxed (docker)
#      saved script: reports/generated_tools/header_check.py
```

Because this executes model-written code, the guardrails are strict:
- **Off by default.** Requires `--allow-codegen`.
- **Sandboxed in Docker** (default): the script is mounted read-only, the
  container has **no access to your filesystem**, and CPU/memory/pids are capped,
  so a misbehaving probe is confined to the container. (`--codegen-sandbox subprocess`
  runs on the host UNSANDBOXED; only use it in a throwaway VM.)
- **Python-only**, standard-library-oriented; a **denylist pre-screen** refuses
  obviously destructive/exfil code; **every generated script is saved** to
  `reports/generated_tools/` for you to review; targets are re-authorized.
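
A denylist pre-screen of this kind can be done statically, by parsing the generated script and refusing it if it imports or calls anything on a blocked list. The sketch below assumes that approach; the names `prescreen`, `DENIED_MODULES`, and `DENIED_CALLS` and the contents of both sets are hypothetical. A static check like this is easy to evade, so it complements the sandbox rather than replacing it.

```python
import ast

# Hypothetical denylists; the project's real lists are not documented here.
DENIED_MODULES = {"subprocess", "shutil", "ctypes"}
DENIED_CALLS = {"eval", "exec", "system", "rmtree", "remove", "unlink"}


def prescreen(source: str) -> list:
    """Return the reasons to refuse `source`; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    reasons = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            names = []
        reasons += [f"import of {n}" for n in names if n in DENIED_MODULES]

        if isinstance(node, ast.Call):
            func = node.func
            # plain name (eval) or attribute access (os.system)
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called in DENIED_CALLS:
                reasons.append(f"call to {called}")
    return reasons
```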

Adding a brand-new *parsed* scanner is a small code change (subclass `Scanner`,
parse the tool's JSON, add an install recipe) — see `autosecscan/scanners/`.
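
To give a feel for the size of that change, here is a self-contained sketch of a parsed scanner. Everything in it is an assumption for illustration: the `Scanner` base class and `Finding` record below are stand-ins written for this example (the real interface in `autosecscan/scanners/` may differ), and `exampletool` is a made-up tool that prints a JSON array.

```python
import json
from dataclasses import dataclass


@dataclass
class Finding:  # hypothetical normalized finding record
    scanner: str
    title: str
    severity: str
    location: str


class Scanner:  # stand-in for the project's base class
    name = ""

    def build_argv(self, target: str) -> list:
        raise NotImplementedError

    def parse(self, stdout: str) -> list:
        raise NotImplementedError


class ExampleToolScanner(Scanner):
    name = "exampletool"  # made-up tool, not a real scanner

    def build_argv(self, target: str) -> list:
        return ["exampletool", "--json", target]  # argv list, never a shell string

    def parse(self, stdout: str) -> list:
        # Map each raw JSON item onto the normalized Finding shape.
        return [
            Finding(
                scanner=self.name,
                title=item["title"],
                severity=item.get("severity", "info").lower(),
                location=item.get("file", ""),
            )
            for item in json.loads(stdout or "[]")
        ]
```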

## Choosing the LLM

Pick per-run without touching config:

```bash
autosecscan models                     # list the whole provider × flavor × tier catalog
autosecscan scan ... --provider ollama   --tier big
autosecscan scan ... --provider lmstudio --tier big --flavor uncensored   # LM Studio, OpenAI-compatible API
autosecscan scan ... --provider lmstudio-native --tier big --flavor uncensored  # LM Studio native /api/v1/chat (reasoning models)
autosecscan scan ... --provider ccr      --tier medium     # Claude Code Router
autosecscan scan ... --provider anthropic --tier big
autosecscan scan ... --provider ondemand --tier massive --flavor uncensored  # remote GLM abliterated
autosecscan scan ... --model qwen2.5-coder:32b             # exact model override
```

### Recommended models

Triage and, especially, agentic mode work better with a capable model. Rough guidance:

| Use | Minimum | Recommended |
|-----|---------|-------------|
| **Triage** (explain / fix / FP-filter) | 7B (`--tier medium`, the default) | 14B–32B (`--tier big`) |
| **Agentic pen-testing** (`--agent`) | 14B | 32B+ / a strong reasoning model (`--tier big/massive`) |

Avoid ≤3B models for `--agent` — they misuse the action protocol and fumble JSON
(AutoSecScan warns and recovers, but quality suffers). The default (`--tier medium`,
a 7B coder model) is fine for triage. For a nightly pentest, pull a bigger model
and run `--tier big` (or point a provider at a hosted/abliterated 32B+).

**LM Studio** is auto-detected on `http://127.0.0.1:1234`, via two providers:
`lmstudio` (OpenAI-compatible `/v1/chat/completions`) and `lmstudio-native`
(LM Studio's native `/api/v1/chat`, which supports **reasoning models** and the
`{system_prompt, input} → {output:[…]}` shape). The default config maps their
tiers to abliterated/heretic builds found on your machine — run
`autosecscan models` to see them, or set the `LMSTUDIO_*` env vars.

**Uncensored / abliterated models** (`--flavor uncensored`) matter here:
safety-tuned models frequently refuse to reason about exploits, payloads, or
specific vulnerability detail, which is exactly the analysis a security tool
needs. Pull one for Ollama, e.g. `ollama pull huihui_ai/qwen2.5-abliterate:7b`,
or point the tier at whatever you have.

**aiondemand (remote spin-up, e.g. a big GLM abliterated).** Set the provider's
lifecycle commands so a remote GPU endpoint is brought up only for the triage
window (or the agentic run) and torn down afterward. The `uncensored/massive`
tier is pre-wired to a GLM abliterated model — point `AIONDEMAND_GLM` at the
exact id your deployment serves:

```yaml
llm:
  provider: ondemand
  tier: massive
  flavor: uncensored          # -> ${AIONDEMAND_GLM:-huihui-ai/Huihui-GLM-5.2-abliterated-GGUF}
  providers:
    ondemand:
      base_url: ${AIONDEMAND_BASE_URL}/v1
      api_key: ${AIONDEMAND_API_KEY}
      spin_up_cmd: "aiondemand up --model {model} --wait"   # {model} is substituted
      spin_down_cmd: "aiondemand down"
      health_path: /models
      health_retries: 30
      health_interval: 5
```
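
The lifecycle those keys describe (spin up, poll a health endpoint up to `health_retries` times every `health_interval` seconds, then always tear down) can be sketched as a context manager. This is an illustration under assumptions, not the project's code: `on_demand_endpoint` and `wait_until_healthy` are hypothetical names, and the spin-up, spin-down, and probe steps are passed in as plain callables so the sketch stays independent of any HTTP client.

```python
import time
from contextlib import contextmanager


def wait_until_healthy(probe, retries=30, interval=5.0, sleep=time.sleep) -> bool:
    """Call `probe()` until it returns True or `retries` attempts are used up."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)  # no pointless wait after the final attempt
    return False


@contextmanager
def on_demand_endpoint(spin_up, spin_down, probe,
                       retries=30, interval=5.0, sleep=time.sleep):
    spin_up()  # e.g. run the configured spin_up_cmd
    try:
        if not wait_until_healthy(probe, retries, interval, sleep):
            raise TimeoutError("endpoint never became healthy")
        yield  # triage (or the agentic run) happens inside the with-block
    finally:
        spin_down()  # always tear down, even if triage or the health check failed
```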

The `uncensored/massive` tier defaults to
[`huihui-ai/Huihui-GLM-5.2-abliterated-GGUF`](https://huggingface.co/huihui-ai/Huihui-GLM-5.2-abliterated-GGUF).

```bash
export AIONDEMAND_BASE_URL=https://my-endpoint.example.com
export AIONDEMAND_GLM="huihui-ai/Huihui-GLM-5.2-abliterated-GGUF"   # or your served alias
autosecscan scan --target 10.0.0.5 --agent --provider ondemand --tier massive --flavor uncensored --i-have-permission
```

## Automation

**Daily cron** (default 02:00) — built for **continuous pen-testing**:

```bash
autosecscan install-cron --target hammant.io --agent \
  --schedule "0 2 * * *" \
  --notify "https://hooks.slack.com/services/XXX"      # or a Discord webhook / shell command
autosecscan list-cron
autosecscan remove-cron
```

Between runs AutoSecScan remembers what it already found (per target set, in
`~/.autosecscan/state/`), so a daily cron **only alerts on what's NEW** rather
than re-paging you about the same issues. The report and CLI summary both mark
new findings, and `--notify` fires only when there's a new finding at/above
`--notify-min` (default `high`). Notifications never break a scan if they fail.
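
The "only alert on what's new" behaviour can be approximated by fingerprinting each finding and comparing against the fingerprints stored from earlier runs. A minimal sketch, assuming findings are dicts with `scanner`, `rule`, and `location` keys and that state is a JSON list of hashes; the real state format under `~/.autosecscan/state/` is not documented here, and `fingerprint` and `new_findings` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(finding: dict) -> str:
    """Stable identity for a finding, independent of run time or ordering."""
    key = "|".join(str(finding.get(k, "")) for k in ("scanner", "rule", "location"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_findings(findings: list, state_file: Path) -> list:
    """Return findings not seen in previous runs, then persist the updated set."""
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    fresh = [f for f in findings if fingerprint(f) not in seen]
    seen.update(fingerprint(f) for f in findings)
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(sorted(seen)))
    return fresh
```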

**Git hook** — scan code on every push (blocking if High+ found):

```bash
autosecscan install-hook ~/dev/myapp --hook pre-push --blocking --fail-on high
autosecscan remove-hook ~/dev/myapp --hook pre-push
```

**CI gate** — non-zero exit when a real finding meets a threshold:

```bash
autosecscan scan --repo . --no-net --i-have-permission --fail-on high
```
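
The gate logic behind `--fail-on` is a severity threshold applied to findings that triage did not mark as false positives. The sketch below assumes a five-level severity scale and a `false_positive` flag on each finding; both are assumptions for illustration (only `high` is named in this README), and `exit_code` is a hypothetical helper whose result a CLI would pass to `sys.exit`.

```python
# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def exit_code(findings: list, fail_on: str) -> int:
    """Return 1 if any real finding is at or above `fail_on`, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    for finding in findings:
        if finding.get("false_positive"):
            continue  # triage flagged it as noise, so it cannot fail the build
        if SEVERITY_RANK.get(finding.get("severity", "info"), 0) >= threshold:
            return 1
    return 0
```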

## Privacy & data egress

Triage sends findings to an LLM — and findings can contain secrets (from
gitleaks/trivy) or source code. AutoSecScan handles this by provider type:

- **Local models** (Ollama, LM Studio): nothing leaves your machine.
- **Hosted models** (Anthropic, remote aiondemand, OpenAI): secrets are
  **redacted** before findings are sent (`llm.redact_secrets`, default on).
- **`llm.local_only: true`**: refuse hosted providers entirely — if the selected
  model is hosted, AI triage is skipped so no scan data leaves the box.
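
The three rules above reduce to one decision made just before a finding is sent for triage. A minimal sketch under assumptions: the provider names in `HOSTED_PROVIDERS`, the two regex patterns, and the function names are illustrative, and real redaction would need far broader secret coverage than this.

```python
import re
from typing import Optional

HOSTED_PROVIDERS = {"anthropic", "ondemand", "openai"}  # assumed provider ids

_AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")
_KEY_VALUE = re.compile(r"(?i)\b(password|secret|token|api[_-]?key)(\s*[:=]\s*)\S+")


def redact(text: str) -> str:
    """Mask obvious secrets while keeping enough context for triage."""
    text = _AWS_KEY.sub("[REDACTED]", text)
    return _KEY_VALUE.sub(r"\1\2[REDACTED]", text)  # keep the key name, drop the value


def prepare_for_llm(text: str, provider: str,
                    redact_secrets: bool = True,
                    local_only: bool = False) -> Optional[str]:
    """Return the text to send, or None when triage must be skipped."""
    hosted = provider in HOSTED_PROVIDERS
    if hosted and local_only:
        return None  # no scan data leaves the box
    if hosted and redact_secrets:
        return redact(text)
    return text  # local model: nothing leaves the machine anyway
```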

Reports contain sensitive findings, so they are written with mode `0600` (owner
read/write only). See
[SECURITY.md](./SECURITY.md) for the full responsible-use and data-handling policy.

## Configuration

Config is layered (later wins): built-in defaults → `~/.config/autosecscan/config.yaml`
→ `./config.yaml` → `--config PATH`. Strings support `${ENV}` and `${ENV:-default}`
expansion so secrets/endpoints stay out of the file. See
[`config.example.yaml`](./config.example.yaml) for every option.
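
Layering and `${ENV}` expansion can be sketched as a recursive merge plus a regex substitution. This is an illustration, not the project's loader: `deep_merge` and `expand_env` are hypothetical names, the sketch follows shell semantics for `${ENV:-default}` (the default applies when the variable is unset or empty), and expanding an unset variable with no default to an empty string is an assumption.

```python
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value: str, env=None) -> str:
    """Expand ${NAME} and ${NAME:-default} references inside a string."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # set and non-empty
            return current
        return default if default is not None else ""

    return _ENV_REF.sub(_substitute, value)


def deep_merge(base: dict, override: dict) -> dict:
    """Merge `override` onto `base`; later layers win, nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```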

## How it works

```
targets ──▶ authorization gate ──▶ scanners (network + code) ──▶ normalized findings
                                                                      │
                                              LLM triage ◀────────────┘
                                       (FP detection, re-rank, explain, fix, summary)
                                                                      │
                                                    JSON · HTML · PDF report
```

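In the fixed pipeline, scanners run first (no LLM needed) and the LLM is brought
up only for triage, so a remote/on-demand endpoint is billed for minutes, not the
whole scan. In agentic mode the LLM stays up for the whole run, because it
chooses each step.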

## Development

```bash
pip install -e '.[dev]'
pytest
```

## License

MIT. See [LICENSE](./LICENSE). AI-assisted findings should be verified before you
act on them; this tool assists judgment, it does not replace it.
