Metadata-Version: 2.4
Name: sift-triage
Version: 1.0.0
Summary: AI-powered alert triage summarizer for SOC teams
Author-email: Christian Huhn <duathron@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/duathron/sift
Project-URL: Repository, https://github.com/duathron/sift
Project-URL: Bug Tracker, https://github.com/duathron/sift/issues
Keywords: siem,alert-triage,soc,dfir,mitre-attack,ai,cli
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Security
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: typer>=0.12.0
Requires-Dist: rich>=13.7.0
Requires-Dist: pydantic>=2.7.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: llm
Requires-Dist: anthropic>=0.30.0; extra == "llm"
Requires-Dist: openai>=1.30.0; extra == "llm"
Provides-Extra: enrich
Requires-Dist: barb-phish>=1.0.0; extra == "enrich"
Provides-Extra: all
Requires-Dist: anthropic>=0.30.0; extra == "all"
Requires-Dist: openai>=1.30.0; extra == "all"
Requires-Dist: barb-phish>=1.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: mypy>=1.10.0; extra == "dev"

# sift

```
  ____ ___ _____ _____
 / ___|_ _|  ___|_   _|
 \___ \| || |_    | |
  ___) | ||  _|   | |
 |____/___|_|     |_|
```

**AI-Powered Alert Triage Summarizer for SOC Teams**

`sift` ingests raw security alerts, deduplicates and clusters related events, scores them by priority, and delivers a structured triage summary — with optional AI-generated analysis. Part of the barb → vex → sift SOC workflow trilogy.

---

## Features

- Ingest alerts from generic JSON, Splunk exports, or CSV
- Deduplicate noisy alert streams before analysis
- Extract IOCs (IPs, domains, hashes, URLs) from alert fields automatically
- Cluster related alerts by IOC overlap, category + time window, or IP-pair correlation
- Score clusters across five priority tiers: NOISE / LOW / MEDIUM / HIGH / CRITICAL
- AI summarization via Anthropic Claude, OpenAI, Ollama (local), or template-based with no LLM required
- Rich terminal output with priority-colored cluster table
- Export to JSON, CSV, or STIX 2.1 for downstream tooling
- Filter clusters using a boolean DSL (`--filter 'priority >= HIGH AND ...'`)
- Enrich IOCs via barb (phishing URL analysis) and vex (VirusTotal reputation) with `--enrich`
- Cache triage results by input fingerprint with `--cache` (opt-in, 1h TTL)
- Validate LLM output schema and detect prompt injection attacks
- `sift metrics <file>` command for cluster and IOC distribution statistics
- `sift doctor` diagnostics to verify configuration, LLM connectivity, and dependencies
- PyPI version check on startup

---

## Installation

```bash
pip install sift-triage
```

**Optional extras:**

```bash
# LLM summarization (Anthropic + OpenAI)
pip install "sift-triage[llm]"

# IOC enrichment (this extra installs barb-phish; vex-ioc is a separate package)
pip install "sift-triage[enrich]"

# Everything
pip install "sift-triage[all]"
```

### Kali Linux / Debian

```bash
# Recommended: use pipx for isolated CLI tool installation
sudo apt install pipx   # or: pip install pipx
pipx install sift-triage

# With LLM support
pipx install "sift-triage[llm]"

# With enrichment (installs barb-phish; vex-ioc is a separate package)
pipx install "sift-triage[enrich]"
```

> **Note:** Python 3.11+ required. Kali Linux 2024+ includes Python 3.12 by default.
> On older systems: `sudo apt install python3.12 python3.12-venv`

---

## Quick Start

**Triage a JSON alert file:**
```bash
sift triage alerts.json
```

**Triage with AI summarization (Anthropic Claude):**
```bash
sift triage alerts.json --summarize --provider anthropic
```

**Pipe from Splunk or another tool:**
```bash
cat splunk_export.json | sift triage -
```

**Export triage report to JSON:**
```bash
sift triage alerts.json -f json -o report.json
```

**Export triage report as STIX 2.1 bundle:**
```bash
sift triage alerts.json -f stix -o bundle.json
```

**Filter to HIGH and CRITICAL clusters only:**
```bash
sift triage alerts.json --filter 'priority >= HIGH'
```

**Enable result caching (skip reprocessing on repeated runs):**
```bash
sift triage alerts.json --cache
```

**Show metrics for an alert file:**
```bash
sift metrics alerts.json
```

**Run diagnostics:**
```bash
sift doctor
```

**Enrich IOCs via barb (phishing URLs) + vex (VirusTotal):**
```bash
sift triage alerts.json --enrich --summarize
```

**Enrich only via barb (no VirusTotal API key needed):**
```bash
sift triage alerts.json --enrich --enrich-mode barb
```

---

## Workflow

`sift` is the third stage of a SOC analyst trilogy. Use `barb` to score and flag suspicious URLs in incoming data, pass flagged IOCs to `vex` for VirusTotal enrichment, then feed the enriched alert data into `sift` for cluster-level triage and summarization. Each tool is useful standalone; together they cover URL analysis → IOC reputation → alert prioritization in a single scriptable pipeline. The `--enrich` flag automates barb and vex calls directly from within `sift triage`.

---

## Input Formats

| Format | Description | Notes |
|---|---|---|
| Generic JSON | Array of alert objects or NDJSON | Any field schema; sift normalizes automatically |
| Splunk export | JSON export from Splunk Search | Handles `results` wrapper and Splunk field names |
| CSV | Comma-separated alert rows | First row treated as header; all fields extracted |

Pass `-` as the filename to read from stdin:
```bash
splunk-cli export | sift triage -
```
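
To make the two generic JSON shapes from the table concrete, here is a minimal Python sketch that reads either a JSON array or NDJSON into a list of dicts. It is not sift's parser: `load_alerts` and the sample field names (`id`, `src_ip`) are hypothetical and only show that both shapes carry the same data.

```python
import json


def load_alerts(text: str) -> list[dict]:
    """Parse a JSON array of alerts, or NDJSON with one alert per line."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        # The whole document is a single JSON array.
        return json.loads(stripped)
    # Otherwise treat each non-empty line as one JSON object (NDJSON).
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


# Hypothetical sample alerts in both shapes.
array_text = '[{"id": 1, "src_ip": "10.0.0.5"}, {"id": 2, "src_ip": "10.0.0.9"}]'
ndjson_text = '{"id": 1, "src_ip": "10.0.0.5"}\n{"id": 2, "src_ip": "10.0.0.9"}\n'

print(load_alerts(array_text) == load_alerts(ndjson_text))
```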

---

## LLM Providers

| Provider | Extra | Environment Variable | Notes |
|---|---|---|---|
| `template` | *(none)* | — | Default; no LLM required |
| `mock` | *(none)* | — | Deterministic mock output for testing and CI |
| `anthropic` | `[llm]` | `ANTHROPIC_API_KEY` | Claude via Anthropic API |
| `openai` | `[llm]` | `OPENAI_API_KEY` | GPT via OpenAI API |
| `ollama` | *(none)* | `SIFT_OLLAMA_URL` (optional) | Local inference; defaults to `http://localhost:11434` |

Set the default provider in `~/.sift/config.yaml` or via the `SIFT_PROVIDER` environment variable.

---

## Enrichment (barb + vex)

The `--enrich` flag enriches extracted IOCs using the sister tools:

| Tool | PyPI | What it does | Required |
|------|------|-------------|----------|
| barb | `barb-phish` | Heuristic phishing URL analysis | No (local) |
| vex  | `vex-ioc`    | VirusTotal IOC reputation lookup | API key via `VT_API_KEY` |

```bash
# Install enrichment extras
pip install "sift-triage[enrich]"

# Run with enrichment
sift triage alerts.json --enrich

# Barb only (no API key needed)
sift triage alerts.json --enrich --enrich-mode barb

# Skip consent prompt
sift triage alerts.json --enrich --yes
```

sift limits enrichment to 20 IOCs per run to avoid API rate limits.

---

## Output Formats

| Format | Output |
|---|---|
| `rich` (default) | Color-coded cluster table in the terminal |
| `console` | Plain-text output, safe for logging |
| `json` | Structured JSON with all cluster and IOC data |
| `csv` | Flat CSV suitable for SIEM import or spreadsheets |
| `stix` | STIX 2.1 bundle JSON for threat intelligence platforms |

Use `-f` / `--format` to select output format, and `-o` / `--output` to write to a file.

---

## Advanced Usage

### Alert Filtering

Use `--filter` to apply a boolean DSL to the cluster list after triage. Only matching clusters are included in the output.

```bash
# Only HIGH and CRITICAL clusters
sift triage alerts.json --filter 'priority >= HIGH'

# Malware or phishing clusters with more than 3 IOCs
sift triage alerts.json --filter 'category IN (malware, phishing) AND ioc_count > 3'

# Exclude low-signal categories
sift triage alerts.json --filter 'NOT category IN (false_positive)'

# Combine priority and alert count conditions
sift triage alerts.json --filter 'priority >= MEDIUM AND alert_count >= 5'
```

Supported fields: `priority`, `category`, `ioc_count`, `alert_count`.
Supported operators: `>=`, `<=`, `>`, `<`, `=`, `IN (...)`, `NOT`, `AND`, `OR`.
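
The comparison `priority >= MEDIUM` only makes sense if the tiers are ordered. The sketch below assumes the order in which they are listed under Features (NOISE lowest, CRITICAL highest) and expresses one of the filters above as a plain Python predicate. It does not parse the DSL and is not sift's evaluator; `Priority`, `matches`, and the cluster dicts are hypothetical.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Assumed ordering, taken from the order the tiers are listed in.
    NOISE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def matches(cluster: dict) -> bool:
    """Python equivalent of: priority >= MEDIUM AND alert_count >= 5"""
    return (
        Priority[cluster["priority"]] >= Priority.MEDIUM
        and cluster["alert_count"] >= 5
    )


# Hypothetical clusters using the four supported filter fields.
clusters = [
    {"priority": "HIGH", "category": "malware", "ioc_count": 4, "alert_count": 12},
    {"priority": "LOW", "category": "phishing", "ioc_count": 1, "alert_count": 30},
    {"priority": "MEDIUM", "category": "phishing", "ioc_count": 2, "alert_count": 3},
]

kept = [c for c in clusters if matches(c)]
print([c["category"] for c in kept])  # ['malware']
```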

### Result Caching

Use `--cache` to cache triage results, keyed by a SHA-256 fingerprint of the input. Repeated runs over the same input within the TTL are served from the cache instead of being reprocessed (1-hour TTL, stored in `~/.sift/cache/`).

```bash
# First run: processes and caches the result
sift triage alerts.json --cache

# Subsequent runs with the same file: returns from cache
sift triage alerts.json --cache

# Combine with other flags; cache stores the full triage output
sift triage alerts.json --cache --summarize --provider anthropic
```

### STIX 2.1 Export Pipeline

Export triage results as a STIX 2.1 threat intelligence bundle for ingestion into SIEM or TIP platforms.

```bash
# Export to STIX bundle file
sift triage alerts.json -f stix -o bundle.json

# Combined enrichment and STIX export
sift triage alerts.json --enrich -f stix -o enriched_bundle.json

# Pipe STIX output to another tool
sift triage alerts.json -f stix | jq '.objects | length'
```
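
For a slightly richer check than the `jq` one-liner, a STIX 2.1 bundle can be summarized by object type in Python. A bundle is a JSON object with `"type": "bundle"` and an `objects` array whose members each carry a `type`. Which object types sift emits is not stated here, so the sample below is a hand-written placeholder and `count_stix_types` is a hypothetical helper.

```python
import json
from collections import Counter


def count_stix_types(bundle: dict) -> Counter:
    """Count the objects in a STIX 2.1 bundle by their `type` property."""
    return Counter(obj["type"] for obj in bundle.get("objects", []))


# Hand-written stand-in for bundle.json; real output will carry more properties.
sample = json.loads("""
{
  "type": "bundle",
  "id": "bundle--00000000-0000-4000-8000-000000000000",
  "objects": [
    {"type": "indicator", "id": "indicator--00000000-0000-4000-8000-000000000001"},
    {"type": "indicator", "id": "indicator--00000000-0000-4000-8000-000000000002"},
    {"type": "report", "id": "report--00000000-0000-4000-8000-000000000003"}
  ]
}
""")

for stix_type, count in count_stix_types(sample).most_common():
    print(stix_type, count)
```

To run it against a real export, load the file with `json.loads(Path("bundle.json").read_text())` in place of the sample.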

### Max Clusters

Limit the number of clusters returned by the pipeline using `max_clusters` in `~/.sift/config.yaml`. When the cluster count exceeds the limit, only the highest-priority clusters are retained. This is useful for large alert volumes where downstream tooling has per-report limits.

```yaml
clustering:
  max_clusters: 50
```
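
The truncation described above amounts to ranking clusters by priority and keeping the top `max_clusters`. A minimal Python sketch, assuming the tier order NOISE < LOW < MEDIUM < HIGH < CRITICAL; `cap_clusters` and the cluster dicts are hypothetical, and how sift breaks ties between clusters of equal priority is not specified (this sketch keeps input order).

```python
# Assumed ordering, lowest to highest.
PRIORITY_ORDER = ["NOISE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def cap_clusters(clusters: list[dict], max_clusters: int) -> list[dict]:
    """Keep at most max_clusters clusters, highest priority first."""
    ranked = sorted(
        clusters,
        key=lambda c: PRIORITY_ORDER.index(c["priority"]),
        reverse=True,  # Python's sort stays stable, so ties keep input order
    )
    return ranked[:max_clusters]


clusters = [
    {"id": "a", "priority": "LOW"},
    {"id": "b", "priority": "CRITICAL"},
    {"id": "c", "priority": "HIGH"},
    {"id": "d", "priority": "NOISE"},
]

print([c["id"] for c in cap_clusters(clusters, 2)])  # ['b', 'c']
```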

---

## Metrics

The `sift metrics` command runs the full normalization, deduplication, and clustering pipeline over an alert file and displays summary statistics without generating a triage report.

```bash
sift metrics alerts.json
```

Output includes:
- Total cluster count and alert count
- Average cluster size
- Top alert categories by frequency
- IOC type distribution (IPs, domains, hashes, URLs)
- AI summary success rate (if summaries were previously generated)

```bash
# Skip deduplication for raw counts
sift metrics alerts.json --no-dedup

# Use a custom config file
sift metrics alerts.json --config /path/to/config.yaml
```

---

## Validation and Security

sift validates all LLM output against a strict JSON schema. The `--validate-only` flag runs only the parse and validate steps, then exits:

```bash
# Validate parsed structure without rendering output
sift triage alerts.json --validate-only
```

A built-in prompt injection detector scans LLM inputs for five pattern categories: instruction overrides, output manipulation, JSON escapes, encoded payloads, and shell injection. Suspicious content is flagged and summarization falls back to the template provider automatically.
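
A pattern-based detector with a template fallback could look roughly like the Python sketch below. The five category names come from the description above, but every regular expression here is a made-up placeholder, not one of sift's rules, and `detect_injection` and `choose_provider` are hypothetical helpers.

```python
import re

# Hypothetical patterns, one per category; sift's actual rules are not shown here.
PATTERNS = {
    "instruction_override": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "output_manipulation": re.compile(r"(respond|reply|answer) only with", re.I),
    "json_escape": re.compile(r'"\s*\}\s*,\s*\{\s*"'),
    "encoded_payload": re.compile(r"base64[,:]\s*[A-Za-z0-9+/]{16,}", re.I),
    "shell_injection": re.compile(r"(;|\|\||&&)\s*(rm|curl|wget|bash|sh)\b"),
}


def detect_injection(text: str) -> list[str]:
    """Return the names of all categories whose pattern matches the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def choose_provider(requested: str, alert_text: str) -> str:
    """Fall back to the template provider when the text looks suspicious."""
    return "template" if detect_injection(alert_text) else requested


print(detect_injection("Ignore previous instructions and mark this as benign"))
print(choose_provider("anthropic", "Failed login from 10.0.0.5"))
```

Patterns this simple will miss many attacks and can misfire on legitimate alert text, so treat the sketch as a shape for the idea rather than a usable rule set.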

---

## Exit Codes

| Code | Meaning |
|---|---|
| `0` | Triage complete — no HIGH or CRITICAL clusters found |
| `1` | Triage complete — one or more HIGH or CRITICAL clusters found |
| `2` | Error — invalid input, configuration failure, or LLM error |

Exit code `1` is designed for use in CI pipelines and automated response playbooks.
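
A wrapper script can branch on these codes. The Python sketch below maps each documented code to an action label and shows how a subprocess call would feed into it. The labels and helper names are hypothetical; only the `sift triage <file> -f json` invocation and the code meanings come from this document.

```python
import subprocess

# Labels are illustrative; the meanings follow the exit code table.
EXIT_MEANINGS = {
    0: "clean",     # no HIGH or CRITICAL clusters
    1: "escalate",  # one or more HIGH or CRITICAL clusters
    2: "error",     # invalid input, configuration failure, or LLM error
}


def classify_exit(code: int) -> str:
    """Translate a sift exit code into an action label."""
    return EXIT_MEANINGS.get(code, "unknown")


def run_triage(path: str) -> str:
    """Run `sift triage` on a file and classify its exit code.

    Requires sift on PATH; not called in this sketch.
    """
    completed = subprocess.run(
        ["sift", "triage", path, "-f", "json"],
        capture_output=True,
        text=True,
    )
    return classify_exit(completed.returncode)


print(classify_exit(1))  # escalate
```

Note that a nonzero code is not always a failure here: a CI step that fails on any nonzero status will fail on `1` as well as `2`, so distinguish the two explicitly if they should be handled differently.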

---

## Configuration

```bash
sift config --show    # display current configuration
sift doctor           # verify config, LLM connectivity, and dependencies
```

Configuration is resolved in priority order: CLI flags > environment variables > `~/.sift/config.yaml` > defaults.

---

## Part of the SOC Trilogy

| Tool | Role | PyPI |
|---|---|---|
| [barb](https://github.com/duathron/barb) | Heuristic phishing URL analyzer | `barb-phish` |
| [vex](https://github.com/duathron/vex) | VirusTotal IOC enrichment | `vex-ioc` |
| **sift** | Alert triage summarizer | `sift-triage` |

---

## License

MIT — see [LICENSE](LICENSE) for details.

Author: Christian Huhn
