Metadata-Version: 2.4
Name: phishsage
Version: 2.1.0
Summary: Lightweight email triage and phishing-analysis toolkit. Extracts headers, attachments, and links, applies heuristic checks, and produces structured insights.
Author-email: Adams <208283706+0xlam@users.noreply.github.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/0xlam/PhishSage
Project-URL: Repository, https://github.com/0xlam/PhishSage
Project-URL: Issues, https://github.com/0xlam/PhishSage/issues
Keywords: phishing,email-security,SOC,SIEM,threat-intelligence,email-analysis,cybersecurity,phish-detection,incident-response
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Security
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mail-parser<5,>=4.0
Requires-Dist: beautifulsoup4<5,>=4.12
Requires-Dist: tldextract<6,>=5.0
Requires-Dist: python-dateutil<3,>=2.9
Requires-Dist: python-dotenv<2,>=1.0
Requires-Dist: python-whois<1,>=0.9
Requires-Dist: email_validator<3,>=2.0
Requires-Dist: idna<4,>=3.6
Requires-Dist: dnspython<3,>=2.6
Requires-Dist: aiohttp<4,>=3.9
Requires-Dist: aiodns<5,>=3.0
Requires-Dist: yarl<2,>=1.9
Provides-Extra: attachments
Requires-Dist: yara-python<5,>=4.5; extra == "attachments"
Requires-Dist: python-magic<1,>=0.4; extra == "attachments"
Requires-Dist: vt-py<1,>=0.18; extra == "attachments"
Provides-Extra: links
Requires-Dist: cryptography<45,>=43.0; extra == "links"
Requires-Dist: vt-py<1,>=0.18; extra == "links"
Provides-Extra: all
Requires-Dist: phishsage[attachments,links]; extra == "all"
Dynamic: license-file

# PhishSage

PhishSage is a lightweight phishing-analysis toolkit that parses raw emails, inspects headers, analyzes links and domains with multi-layer heuristics, and outputs structured JSON findings for fast, automated investigation.


[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)]()
[![Python](https://img.shields.io/badge/Python-3.9%2B-blue.svg)]()
[![Status: Active](https://img.shields.io/badge/Project%20Status-Active-brightgreen.svg)]()


## 1. Core functionality

PhishSage is intentionally minimal and concentrates on these essential capabilities:

* **Header analysis**

  * Extracts normalized sender-related headers (From, Reply-To, Return-Path, Message-ID)
  * Parses SPF, DKIM, and DMARC results from Authentication-Results
  * Performs alignment checks across From, Reply-To, and Return-Path
  * Validates Message-ID domain consistency
  * Detects use of free email providers in Reply-To and Return-Path headers
  * Checks timestamp sanity by comparing the Date header with the first Received hop
  * Looks up WHOIS domain age and flags newly registered or soon-to-expire domains
  * Validates MX records for sender-related domains
  * Queries Spamhaus DBL for sender-related domains
  * Aggregates all findings into structured JSON with merged alerts
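
The timestamp sanity check can be illustrated with a minimal sketch using only the standard library. This is not PhishSage's own implementation: the function names are hypothetical, and it assumes that "first Received hop" means the earliest relay, which is the last `Received` header in the message because each relay prepends its own.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_received_drift_minutes(raw_email: str) -> float:
    """Return the absolute gap, in minutes, between Date and the earliest Received hop."""
    msg = message_from_string(raw_email)
    date_header = msg["Date"]
    received = msg.get_all("Received") or []
    if date_header is None or not received:
        raise ValueError("message lacks a Date or Received header")

    # Assumption: relays prepend Received headers, so the last one is the earliest hop.
    earliest_hop = received[-1]
    # The hop timestamp follows the final ';' in a Received header.
    hop_time = parsedate_to_datetime(earliest_hop.rsplit(";", 1)[-1].strip())
    sent_time = parsedate_to_datetime(date_header)
    return abs((hop_time - sent_time).total_seconds()) / 60.0


def exceeds_drift(raw_email: str, max_drift_minutes: float) -> bool:
    """Flag the message when the gap is larger than the allowed drift."""
    return date_received_drift_minutes(raw_email) > max_drift_minutes
```

A caller would pass the configured drift limit as `max_drift_minutes`; both headers need an explicit UTC offset for the subtraction to work.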


* **Attachment processing**

  * Lists attachments with MIME type and size
  * Extracts attachments safely, without overwriting existing files
  * Computes hashes (MD5, SHA1, SHA256)
  * Optionally looks up attachments on VirusTotal by SHA256
  * Scans attachments with YARA rules (single files, multiple files, or directories; directories are searched recursively for valid .yar/.yara files)
  * In verbose mode, shows matched strings with offsets and hex data
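
As a rough illustration of the listing and hashing steps, the sketch below walks the attachments of a raw message with the standard library and computes the three digests. It is not PhishSage's code (which depends on `mail-parser`); the function name and the keys of the returned dictionaries are assumptions made for the example.

```python
import hashlib
from email import message_from_bytes, policy


def hash_attachments(raw_email: bytes) -> list[dict]:
    """Return filename, size, and MD5/SHA1/SHA256 digests for each attachment."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    results = []
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (for example base64).
        data = part.get_payload(decode=True) or b""
        results.append(
            {
                "filename": part.get_filename(),
                "mime_type": part.get_content_type(),
                "size": len(data),
                "md5": hashlib.md5(data).hexdigest(),
                "sha1": hashlib.sha1(data).hexdigest(),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
        )
    return results
```

The SHA256 value is the one a VirusTotal lookup would typically be keyed on.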


* **Link / URL analysis**

  * Extracts URLs from email bodies or headers
  * Detects URLs using raw IP addresses instead of domains
  * Flags suspicious or uncommon top-level domains (TLDs)
  * Identifies excessive or nested subdomains, ignoring trivial ones (e.g., "www")
  * Recognizes shortened URLs (bit.ly, tinyurl.com, etc.)
  * Calculates Shannon entropy for domain and subdomain to spot obfuscation
  * Performs SSL/TLS certificate inspection (issuer, validity, domain match, expiration)
  * Looks up domain age via WHOIS and flags newly registered or expiring domains
  * VirusTotal URL lookup for threat intelligence
  * Optional redirect-chain tracing to uncover hidden destinations
  * Checks for numeric-only registrable domains
  * Detects URLs using commonly abused web platforms and services
  * Flags URLs with excessively deep paths
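
For reference, Shannon entropy over the characters of a label can be computed as in the sketch below. The function name is hypothetical, and the cutoff PhishSage applies to the score is not shown here.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = -sum(p * log2(p)) over the frequency p of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Labels made of a few repeated characters score low, while random-looking labels such as generated subdomains tend to score higher.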


## 2. Installation

### Base Install

Installs core functionality: header analysis and basic email parsing.
```bash
# From PyPI
pip install phishsage

# From GitHub
git clone https://github.com/0xlam/PhishSage.git
cd PhishSage
python3 -m venv venv

# Linux / macOS
source venv/bin/activate

# Windows (PowerShell)
venv\Scripts\Activate.ps1

pip install -e .
```

---

### Optional Extras

Install only what you need:
```bash
# Attachment analysis (YARA scanning, MIME detection)
pip install "phishsage[attachments]"

# Link / URL analysis
pip install "phishsage[links]"

# Everything
pip install "phishsage[all]"
```

---

### VirusTotal API Key

Required when using `--vt-scan`, which is available in the `attachments` and `links` modes.
```bash
# Linux / macOS
export VIRUSTOTAL_API_KEY="your_virustotal_api_key"

# Windows (PowerShell); setx persists the value for new sessions only, so open a new terminal afterwards
setx VIRUSTOTAL_API_KEY "your_virustotal_api_key"
```


## 3. CLI Usage

PhishSage provides a command-line interface with three main modes: `headers`, `attachments`, and `links`. Every mode prints human-readable output by default and accepts `--json` for full structured output.


### Main Help

```bash
phishsage -h
```

**Output:**

```
usage: phishsage [-h] {headers,attachments,links} ...

PhishSage

positional arguments:
  {headers,attachments,links}
    headers             Analyze email headers for anomalies or indicators
    attachments         Analyze or extract attachments
    links               Analyze links in email content

options:
  -h, --help            show this help message and exit
```

---

### Header Analysis

```bash
phishsage headers -h
```

**Options:**

```
usage: phishsage headers [-h] -f FILE [--heuristics]
                         [--enrich [{mx,spamhaus,domain_age,all} ...]]
                         [--json]

options:
  -h, --help            show this help message and exit
  -f, --file FILE       Email file to analyze (.eml)
  --heuristics          Analyze headers for suspicious patterns and anomalies
  --enrich [{mx,spamhaus,domain_age,all} ...]
                        Add threat-intel enrichment to header analysis (mx,
                        spamhaus, domain_age). Requires --heuristics.
  --json                Output full details in JSON format
```

---

### Attachment Processing

```bash
phishsage attachments -h
```

**Options:**

```
usage: phishsage attachments [-h] -f FILE [--list] [--extract DIR] [--hash]
                             [--vt-scan] [--yara PATH [PATH ...]]
                             [--yara-verbose] [--json]

options:
  -h, --help            show this help message and exit
  -f, --file FILE       Email file to analyze (.eml)
  --list                List attachments only
  --extract DIR         Extract attachments to specified directory
  --hash                Compute hashes (MD5, SHA1, SHA256) for each attachment
  --vt-scan             Check attachments against VirusTotal by SHA256
  --yara PATH [PATH ...]
                        Scan attachments with YARA rules. Paths can be files
                        or directories; directories are scanned recursively
                        for .yar/.yara files.
  --yara-verbose        Show detailed string matches and offsets when YARA
                        rules hit
  --json                Output full details in JSON format
```

---

### Link / URL Analysis

```bash
phishsage links -h
```

**Options:**

```
usage: phishsage links [-h] -f FILE [--extract] [--vt-scan]
                       [--check-redirects] [--heuristics]
                       [--enrich [{all,domain_age,certificate,virustotal,redirects} ...]]
                       [--json]

options:
  -h, --help            show this help message and exit
  -f, --file FILE       Email file to analyze (.eml)
  --extract             Extract URLs from the email body
  --vt-scan             Query VirusTotal for URL reputation
  --check-redirects     Follow HTTP redirects and show chain
  --heuristics          Run phishing detection heuristics (use --enrich to add
                        extra data)
  --enrich [{all,domain_age,certificate,virustotal,redirects} ...]
                        Add extra analysis to heuristics (requires
                        --heuristics)
  --json                Output full details in JSON format
```

---

## 4. Configuration

PhishSage stores configuration values in the project config (`config.toml`) or environment variables. The main items you may safely adjust are:

  * `VIRUSTOTAL_API_KEY` — API key for VirusTotal scans.
  * `MAX_REDIRECTS` — Maximum number of redirects to follow when checking redirect chains.
  * `THRESHOLD_YOUNG`, `THRESHOLD_EXPIRING` — Domain age/expiry thresholds (in days). Domains younger than `THRESHOLD_YOUNG` or expiring within `THRESHOLD_EXPIRING` days are flagged as potentially suspicious.
  * `ABUSABLE_PLATFORM_DOMAINS`, `SUSPICIOUS_TLDS`, `SHORTENERS` — Heuristic lists used in URL/link analysis.
  * `SUBDOMAIN_THRESHOLD`, `TRIVIAL_SUBDOMAINS` — Used for subdomain heuristics to identify excessive or meaningful subdomains.
  * `FREE_EMAIL_DOMAINS` — Free email providers that may indicate disposable or less-trusted addresses.
  * `DATE_RECEIVED_DRIFT_MINUTES` — Maximum allowed difference between the `Date` header and the first `Received` hop in email headers.

 *Note: Only modify thresholds or heuristic lists if you understand the potential impact on false positives and overall detection accuracy.*
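
To make the effect of the domain age thresholds concrete, here is a minimal sketch of the rule described above. The threshold values, function name, and flag names are hypothetical and chosen only for illustration; the real defaults live in `config.toml`.

```python
from datetime import datetime, timezone

# Hypothetical values for illustration; not PhishSage's shipped defaults.
THRESHOLD_YOUNG = 30      # days since registration
THRESHOLD_EXPIRING = 30   # days until expiry


def domain_age_flags(created: datetime, expires: datetime, now: datetime) -> list[str]:
    """Return the flags raised by the young/expiring domain thresholds."""
    flags = []
    if (now - created).days < THRESHOLD_YOUNG:
        flags.append("young_domain")
    # Domains that have already expired also satisfy this condition.
    if (expires - now).days <= THRESHOLD_EXPIRING:
        flags.append("expiring_domain")
    return flags
```

Raising `THRESHOLD_YOUNG` flags more legitimate but recently registered domains, which is the false-positive trade-off the note above refers to.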


---

## 5. Scope & Limitations

  * **Focused functionality:** PhishSage is not a full mail forensic suite. It prioritizes heuristics, quick triage, and enrichment over deep forensic analysis.
  * **Network-dependent checks:** WHOIS, VirusTotal, MX, Spamhaus DBL, and SSL/TLS certificate checks rely on external services; results may vary or fail due to connectivity issues or API limits.
  * **Attachment processing:** Currently limited to listing, extraction, hashing, YARA scanning, and optional VirusTotal lookups. Full heuristic attachment analysis will be introduced in a future release.
  * **Output formats:** Human-readable pretty output is the default. Use `--json` to obtain detailed structured data in any mode.
  * **Intended use:** Designed for investigative support and enrichment. Not intended for automated blocking or enforcement in production email systems.
  * **Evolving coverage:** Current checks under each section are limited; additional heuristics and enhanced analyses will be added in future releases.


---

## 6. Contributing

Contributions to PhishSage are welcome! You can help improve the project by:

* Adding or refining heuristic checks for headers, attachments, and links.
* Expanding the lists in `config.toml`.
* Improving parsing, normalization, or output handling.
* Reporting bugs or suggesting enhancements.

Before submitting changes, please ensure they are well-tested and maintain the code’s clarity, security, and reliability. Contributions that enhance detection coverage, reduce false positives, or improve usability are particularly appreciated.
