Metadata-Version: 2.4
Name: huntertrace
Version: 1.2.0
Summary: Advanced phishing actor attribution using Bayesian inference and graph analysis
Home-page: https://github.com/akshaydotweb/huntertrace
Author: Akshay V
Author-email: HUNTERTRACE Contributors <akshayvmudaliar@gmail.com>
Maintainer-email: HUNTERTRACE Contributors <akshayvmudaliar@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/akshaydotweb/huntertrace
Project-URL: Documentation, https://github.com/akshaydotweb/huntertrace/blob/main/docs
Project-URL: Repository, https://github.com/akshaydotweb/huntertrace
Project-URL: Issues, https://github.com/akshaydotweb/huntertrace/issues
Project-URL: Changelog, https://github.com/akshaydotweb/huntertrace/blob/main/CHANGELOG.md
Keywords: phishing,attribution,cybersecurity,forensics,email-analysis,threat-intelligence
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Security
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: networkx>=2.6
Requires-Dist: numpy>=1.20.0
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.12; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.9; extra == "dev"
Requires-Dist: mypy>=0.900; extra == "dev"
Requires-Dist: build>=0.7; extra == "dev"
Requires-Dist: twine>=3.4; extra == "dev"
Provides-Extra: graph
Requires-Dist: python-louvain>=0.15; extra == "graph"
Provides-Extra: whois
Requires-Dist: python-whois>=0.7.3; extra == "whois"
Provides-Extra: all
Requires-Dist: python-louvain>=0.15; extra == "all"
Requires-Dist: python-whois>=0.7.3; extra == "all"
Requires-Dist: matplotlib>=3.3.0; extra == "all"
Requires-Dist: tqdm>=4.60.0; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: platform
Dynamic: requires-python

# HUNTERTRACE

<p align="center">
  <img src="assets/hunterTraceLogo.png" alt="HunterTrace Logo" width="400">
</p>

> Advanced phishing actor attribution using multi-signal Bayesian inference and infrastructure graph analysis

[![PyPI version](https://badge.fury.io/py/huntertrace.svg)](https://badge.fury.io/py/huntertrace)
[![Python Versions](https://img.shields.io/pypi/pyversions/huntertrace.svg)](https://pypi.org/project/huntertrace/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

## Overview

HUNTERTRACE is an open-source phishing attribution engine that infers the
**geographic origin of phishing actors** through multi-signal Bayesian inference,
combining 8+ orthogonal signals to work around VPN and proxy obfuscation. Evaluated
on 53 labeled emails, it achieves **52.8% country-level** and **56.6%
region-level** top-1 accuracy, well ahead of IP geolocation alone and roughly level
with timezone-only analysis at the country level. Larger-scale validation is ongoing.

Traditional email forensics relies on IP geolocation alone, which reaches only ~31% accuracy and is easily masked by a VPN or proxy. HUNTERTRACE instead fuses **8+ orthogonal signals** through Bayesian inference:

| Signal | Source | VPN-Resistant |
|--------|--------|:---:|
| Webmail IP leaks | X-Originating-IP, X-Sender-IP headers | Yes |
| Timezone offset | Date header / Received chain | Yes |
| Language fingerprint | Content-Type charset, Subject encoding | Yes |
| Infrastructure reuse | Graph centrality across campaigns | Yes |
| Hop chain forgery | Received header consistency | Partial |
| VPN exit node mapping | ASN + hosting provider classification | N/A |
| SPF/DKIM/DMARC | Authentication results | Partial |
| Webmail provider | Header fingerprinting (Gmail/Yahoo/Outlook) | Yes |
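
The first two signals in the table come straight from message headers, so they can be approximated with the Python standard library. The sketch below is an illustration, not HUNTERTRACE's implementation: `header_signals` is a hypothetical helper, and it assumes leaked client addresses appear in `X-Originating-IP` / `X-Sender-IP` (sometimes wrapped in square brackets) and that the `Date` header carries the sender's local UTC offset.

```python
import ipaddress
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple

# Headers assumed (for this sketch) to carry a webmail client's real IP.
LEAK_HEADERS = ("X-Originating-IP", "X-Sender-IP")


def header_signals(raw: str) -> Tuple[List[str], Optional[float]]:
    """Hypothetical helper (not the huntertrace API).

    Returns (publicly routable leaked client IPs, Date-header UTC offset in hours).
    """
    msg = message_from_string(raw)

    leaks: List[str] = []
    for name in LEAK_HEADERS:
        for value in msg.get_all(name, []):
            candidate = value.strip().strip("[]")  # "[81.2.69.160]" -> "81.2.69.160"
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a bare IPv4/IPv6 address; ignore it
            # Private, loopback and documentation ranges say nothing about location.
            if ip.is_global and str(ip) not in leaks:
                leaks.append(str(ip))

    offset_hours: Optional[float] = None
    date_value = msg.get("Date")
    if date_value:
        try:
            sent = parsedate_to_datetime(date_value)
        except (TypeError, ValueError):  # older Pythons raise TypeError on bad dates
            sent = None
        # A "-0000" zone yields a naive datetime, i.e. no usable offset.
        if sent is not None and sent.utcoffset() is not None:
            offset_hours = sent.utcoffset().total_seconds() / 3600
    return leaks, offset_hours
```

A timezone offset only narrows the sender to a band of longitudes, and it is only as trustworthy as the sender's clock and mail client settings, which is why it works best fused with other signals.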

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    HUNTERTRACE PIPELINE                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Stage 1: Header Extraction (RFC 2822 parsing)              │
│      ↓                                                      │
│  Webmail IP Leak Detection (X-Originating-IP extraction)    │
│      ↓                                                      │
│  Stage 2: IP Classification (VPN/Tor/Proxy/Residential)     │
│      ↓                                                      │
│  Stage 3A: Enrichment (WHOIS, ASN, hosting provider)        │
│      ↓                                                      │
│  VPN Backtrack Analysis (12 bypass techniques)              │
│      ↓                                                      │
│  Real IP Extraction (strips proxy layers)                   │
│      ↓                                                      │
│  Stage 3B: Threat Intelligence                              │
│  Stage 3C: Correlation Analysis                             │
│      ↓                                                      │
│  Stage 4: Geolocation (city-level, IPv4 + IPv6)             │
│      ↓                                                      │
│  Stage 5: Attribution Analysis (evidence packaging)         │
│      ↓                                                      │
│  Bayesian Multi-Signal Fusion (ACI confidence scoring)      │
│      ↓                                                      │
│  Sender Classification (hop forgery + timezone analysis)    │
│      ↓                                                      │
│  Output: JSON report + text summary + attack graph HTML     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
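
The "Real IP Extraction" step depends on walking the `Received` chain. As a rough sketch only (the VPN backtrack analysis and its 12 bypass techniques are not reproduced here), the hypothetical helpers below collect bracketed IP literals from each `Received` header, reverse the order so the hop closest to the sender comes first, and return the first publicly routable address. They assume the common `[a.b.c.d]` MTA convention and handle IPv4 only.

```python
import ipaddress
import re
from email import message_from_string
from typing import List, Optional

# Bracketed IPv4 literals such as "[81.2.69.160]"; IPv6 is out of scope for this sketch.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hop_ips(raw: str) -> List[str]:
    """Hypothetical helper: IPs from Received headers, sender-side hop first."""
    msg = message_from_string(raw)
    hops: List[str] = []
    # Each relay prepends its own Received header, so the last header is the earliest hop.
    for header in reversed(msg.get_all("Received", [])):
        for candidate in BRACKETED_IPV4.findall(header):
            try:
                hops.append(str(ipaddress.IPv4Address(candidate)))
            except ipaddress.AddressValueError:
                continue  # e.g. "[999.1.1.1]" matches the regex but is not a valid address
    return hops


def earliest_public_ip(raw: str) -> Optional[str]:
    """Hypothetical helper: first globally routable hop, skipping private/loopback relays."""
    for ip in received_hop_ips(raw):
        if ipaddress.IPv4Address(ip).is_global:
            return ip
    return None
```

Headers added before the first relay you trust can be forged by the sender, so a result like this is a candidate to be cross-checked by the hop chain forgery signal rather than a final answer.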

## Quick Start

### Installation

```bash
pip install huntertrace
```

### Python API

```python
from huntertrace import HunterTrace

# Run the full 7-stage pipeline
pipeline = HunterTrace(verbose=True)
result = pipeline.run("phishing.eml")

# Generate text report
report = result.generate_report()
print(report.generate_text_report())

# Access Bayesian attribution
bayes = result.bayesian_attribution
if bayes:
    print(f"Region: {bayes.primary_region}")
    print(f"Confidence: {bayes.aci_adjusted_prob:.1%}")
    print(f"Tier: {bayes.tier} — {bayes.tier_label}")
```

### Command Line

```bash
# Single email analysis
huntertrace analyze phishing.eml --verbose

# Batch processing
huntertrace batch emails/ -o results/

# Campaign correlation (cross-email actor linking)
huntertrace campaign emails/ -o campaign_report/
```

## Performance

Evaluated on a labeled corpus of 53 phishing emails with known ground-truth origins:

| Method | Level | Top-1 Accuracy | Notes |
|--------|-------|----------------|-------|
| IP Geolocation Only | Country | ~31% | Industry baseline |
| Timezone Only | Country | ~52% | VPN-resistant, coarse |
| **HUNTERTRACE (Bayesian)** | Country | **52.8%** | Multi-signal fusion |
| **HUNTERTRACE (+ Graph)** | Region | **56.6%** | Multi-signal fusion plus graph analysis |

**95% Confidence Interval (country level)**: 39.7% – 65.6% (n=53)  
**Webmail IP Leak Rate**: 37.7% of analyzed emails  
**Coverage**: 100% (every email received a prediction)

> ⚠️ **Note**: Performance numbers are based on an initial corpus of 53 labeled
> emails. Larger-scale validation is in progress. Region-level accuracy (56.6%)
> is more reliable than country-level given current corpus size.
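
The reported interval matches the Wilson score interval for 28 correct predictions out of 53 (28/53 ≈ 52.8%), so it appears to describe country-level accuracy. The snippet below recomputes it; the choice of the Wilson method is inferred from the numbers rather than stated by the authors, and `wilson_interval` is a helper written for this example.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(28, 53)
print(f"{28 / 53:.1%} accuracy, 95% CI {low:.1%} to {high:.1%}")
# prints: 52.8% accuracy, 95% CI 39.7% to 65.6%
```

The width of roughly ±13 percentage points is a direct consequence of n=53, which is why the note above treats the current numbers as preliminary.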

## ✨ Key Features

- 🎯 Multi-Signal Attribution (8+ signals)
- 🔓 VPN Bypass (webmail leaks, timezone)
- 🕸️ Graph Analysis (infrastructure reuse)
- 📊 Bayesian Fusion (probabilistic)
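
To illustrate the Bayesian fusion idea (this is not HUNTERTRACE's ACI-adjusted scoring, whose details are not given here), the sketch below combines per-signal likelihoods over candidate regions with a naive-Bayes product and normalizes the result into a posterior. The `fuse` function, region names, and likelihood values are all hypothetical. The product is only valid if the signals are conditionally independent; correlated signals would be over-counted.

```python
from typing import Dict, List


# Hypothetical helper for illustration; not the huntertrace API.
def fuse(
    prior: Dict[str, float],
    likelihoods: List[Dict[str, float]],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Naive-Bayes fusion: posterior(r) is proportional to prior(r) * product of P(signal | r)."""
    posterior = dict(prior)
    for signal in likelihoods:
        for region in posterior:
            # Regions a signal does not mention get a small floor likelihood instead of
            # zero, so a single signal cannot rule a region out entirely.
            posterior[region] *= signal.get(region, floor)
    total = sum(posterior.values())
    if total <= 0:
        raise ValueError("every region has zero probability; check priors and likelihoods")
    return {region: p / total for region, p in posterior.items()}


# Made-up numbers: a uniform prior, then a timezone signal and a charset signal.
prior = {"Region A": 1 / 3, "Region B": 1 / 3, "Region C": 1 / 3}
timezone = {"Region A": 0.8, "Region B": 0.1, "Region C": 0.1}
charset = {"Region A": 0.5, "Region B": 0.25, "Region C": 0.25}
posterior = fuse(prior, [timezone, charset])
print({region: round(p, 3) for region, p in posterior.items()})
# prints: {'Region A': 0.889, 'Region B': 0.056, 'Region C': 0.056}
```

Two moderately informative signals that agree push the posterior well above either signal alone, which is the intuition behind fusing VPN-resistant signals rather than trusting any single one.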

## 🚀 Running from Source

```bash
git clone https://github.com/akshaydotweb/HunterTrace.git
cd HunterTrace
pip install -r requirements.txt

# Analyze email
python hunterTrace.py analyze phishing.eml
```

## 📖 Documentation

- [Technical Summary](docs/HUNTERTRACE_Technical_Summary.md)
- [Installation Guide](docs/INSTALLATION.md)
- [API Documentation](docs/API.md)

## 🔬 Evaluation

**Dataset**: 53 labeled phishing emails  
**Methodology**: Ground-truth origins assigned by manual OSINT research

- Top-1 Country Accuracy: 52.8%
- Top-1 Region Accuracy: 56.6%
- 95% Confidence Interval (country level): 39.7% – 65.6%
- Webmail Leak Rate: 37.7%
- Macro F1: 0.37

See [evaluation/](evaluation/) for full results.
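
For readers reproducing these metrics, here is a minimal sketch of top-1 accuracy and macro F1 over country labels. It assumes one predicted label per email (consistent with 100% coverage) and that macro F1 averages per-class F1 over every label that appears in either the ground truth or the predictions; the scripts in `evaluation/` may define the class set differently. Both function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def top1_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of emails whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare countries count as much as common ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); every label here has a nonzero denominator.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)
```

Macro F1 can sit well below top-1 accuracy when a few frequent countries are predicted correctly while rarer ones are missed, which is one way a 52.8% accuracy and a 0.37 macro F1 can coexist.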

## 🎓 Citation

```bibtex
@software{huntertrace2026,
  author = {Akshay V},
  title = {HUNTERTRACE: Multi-Signal Phishing Attribution},
  year = {2026},
  url = {https://github.com/akshaydotweb/HunterTrace}
}
```

## 📄 License

MIT License - See [LICENSE](LICENSE)

---

**Black Hat Arsenal 2026 Submission**
