Metadata-Version: 2.4
Name: surfacemap
Version: 2.0.1
Summary: LLM-driven attack surface discovery — find every asset from just a company name
Project-URL: Homepage, https://github.com/BreachLine/surfacemap
Project-URL: Repository, https://github.com/BreachLine/surfacemap
Project-URL: Issues, https://github.com/BreachLine/surfacemap/issues
Author: Yash Korat
License-Expression: MIT
License-File: LICENSE
Keywords: attack-surface,discovery,osint,recon,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Requires-Python: >=3.11
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: rich>=13.7.0
Requires-Dist: typer[all]>=0.9.0
Provides-Extra: all
Requires-Dist: anthropic>=0.40.0; extra == 'all'
Requires-Dist: fastapi>=0.115.0; extra == 'all'
Requires-Dist: google-genai>=1.0.0; extra == 'all'
Requires-Dist: slack-sdk>=3.30.0; extra == 'all'
Requires-Dist: uvicorn>=0.30.0; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi>=0.115.0; extra == 'api'
Requires-Dist: uvicorn>=0.30.0; extra == 'api'
Provides-Extra: llm
Requires-Dist: anthropic>=0.40.0; extra == 'llm'
Requires-Dist: google-genai>=1.0.0; extra == 'llm'
Provides-Extra: notifications
Requires-Dist: slack-sdk>=3.30.0; extra == 'notifications'
Description-Content-Type: text/markdown

# SurfaceMap

**LLM-driven attack surface discovery. Find every external asset from just a company name.**

SurfaceMap combines passive OSINT, DNS enumeration, HTTP probing, port scanning, cloud bucket enumeration, and LLM-assisted analysis to map an organization's external attack surface.

---

## Quick Start

```bash
# Install
pip install "surfacemap[all]"

# Set your LLM API key
export GEMINI_API_KEY="your-key-here"

# Discover everything about a company
surfacemap discover "Acme Corp" --domain acme.com --tree --json

# Or just scan a domain
surfacemap discover example.com --mindmap
```

## Installation

```bash
# Core (CLI + discovery)
pip install surfacemap

# With API server (quote extras so shells like zsh don't expand the brackets)
pip install "surfacemap[api]"

# With LLM intelligence
pip install "surfacemap[llm]"

# With Slack notifications
pip install "surfacemap[notifications]"

# Everything
pip install "surfacemap[all]"
```

### Install from Source

```bash
git clone https://github.com/BreachLine/surfacemap.git
cd surfacemap
pip install -e ".[all]"
```

### External Tools (Optional)

SurfaceMap works without these, but they enhance discovery:

| Tool | Purpose | Install |
|------|---------|---------|
| `dig` | DNS record enumeration | Preinstalled on macOS; on Linux `apt install dnsutils` / `dnf install bind-utils` |
| `nmap` | Port scanning | `brew install nmap` / `apt install nmap` |
| `subfinder` | Passive subdomain enum | `go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest` |
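
How SurfaceMap itself detects these tools isn't documented here. As a minimal sketch, assuming a tool is only usable when its binary is on `PATH`, you can check what your environment provides before a scan with the standard library's `shutil.which`. The helper name `detect_optional_tools` is hypothetical.

```python
import shutil

# The optional external tools from the table above.
OPTIONAL_TOOLS = ("dig", "nmap", "subfinder")


def detect_optional_tools(tools=OPTIONAL_TOOLS):
    """Hypothetical helper: map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in detect_optional_tools().items():
        print(f"{tool:10} {path or 'not found'}")
```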

## CLI Usage

```bash
# Full discovery with tree output
surfacemap discover "Google" --domain google.com --tree

# Export to JSON and CSV
surfacemap discover example.com --json --csv --output ./results

# Generate interactive HTML mindmap
surfacemap discover "Acme Corp" -d acme.com --mindmap

# Passive recon only (no active probing)
surfacemap discover example.com --passive-only --tree

# Enable enrichment (requires VirusTotal/Shodan/GitHub keys)
surfacemap discover example.com --enrich --json

# Skip LLM analysis phase
surfacemap discover example.com --no-analysis --tree
```

### Discover Options

| Flag | Short | Description |
|------|-------|-------------|
| `--domain` | `-d` | Primary domain (if target is a company name) |
| `--output` | `-o` | Output directory for results |
| `--tree` | `-t` | Display results as a Rich tree in the terminal |
| `--mindmap` | `-m` | Generate interactive D3.js HTML mindmap |
| `--json` | `-j` | Export results to JSON |
| `--csv` | | Export results to CSV |
| `--enrich` | `-e` | Enable enrichment modules (VirusTotal, Shodan, GitHub) |
| `--passive-only` | | Skip active probing (passive recon only) |
| `--no-analysis` | | Skip LLM analysis phase (risk scoring, attack paths) |

### Other Commands

```bash
# Show version and check for updates
surfacemap version

# Update to latest version
surfacemap update

# View all configuration settings
surfacemap config

# Change a configuration setting
surfacemap set-config SURFACEMAP_LLM_MODEL gemini-2.0-flash

# Set an API key
surfacemap set-key GEMINI_API_KEY your-key-here

# Show configured API keys
surfacemap show-keys
```

## API Server

```bash
# Install with API support
pip install "surfacemap[api]"

# Start the API server
uvicorn surfacemap.api.server:app --host 0.0.0.0 --port 8000

# Start a scan
curl -X POST "http://localhost:8000/discover?target=example.com"

# Get scan results (replace {scan_id} with the ID returned by /discover)
curl "http://localhost:8000/scans/{scan_id}"

# Health check
curl "http://localhost:8000/health"
```

### API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/discover` | Start a new discovery scan |
| `GET` | `/scans/{id}` | Get scan results by ID |
| `GET` | `/scans` | List recent scans |
| `GET` | `/health` | Health check |
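
The same calls can be scripted from Python. This is a sketch using only the standard library; the JSON shape returned by `/discover` and `/scans/{id}` isn't documented here, so it decodes the body and leaves it to you to locate the scan ID field. The helper names (`discover_url`, `scan_url`, `request_json`, `start_scan`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # host/port from the uvicorn command above


def discover_url(target: str, base_url: str = BASE_URL) -> str:
    """URL for POST /discover, with the target URL-encoded as a query parameter."""
    return f"{base_url}/discover?{urllib.parse.urlencode({'target': target})}"


def scan_url(scan_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /scans/{id}, percent-encoding the ID."""
    return f"{base_url}/scans/{urllib.parse.quote(str(scan_id), safe='')}"


def request_json(url: str, method: str = "GET", timeout: float = 30.0):
    """Send a request and decode the body, assuming the server responds with JSON."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_scan(target: str):
    """Start a scan; inspect the returned JSON to find the field holding the scan ID."""
    return request_json(discover_url(target), method="POST")
```

With the server running, `start_scan("example.com")` returns the decoded response, and `request_json(scan_url(...))` fetches results for the ID it contains. Building the query with `urlencode` keeps targets containing spaces valid: `"Acme Corp"` becomes `target=Acme+Corp`.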

## Discovery Pipeline

SurfaceMap runs discovery in four phases, numbered 0 to 3:

| Phase | Description | Modules |
|-------|-------------|---------|
| **Phase 0: LLM Brainstorm** | AI identifies subsidiaries, domains, infrastructure, tech stack | LLM Brain |
| **Phase 1: Passive Recon** | DNS, subdomains, WHOIS, cert transparency, OSINT sources | DNS Records, Subdomain Enum (subfinder + crt.sh + brute force + LLM), WHOIS/RDAP, ASN Discovery, Reverse DNS, Zone Transfer, Email Security (SPF/DKIM/DMARC), Cert Transparency, Wayback Machine, URLScan, HackerTarget, RapidDNS, CommonCrawl, AnubisDB, CertSpotter, SubdomainCenter, AlienVault OTX |
| **Phase 2: Active Probing** | HTTP probing, port scanning, vulnerability checks | HTTP Probe + Tech/CDN/WAF Detection, Port Scan (nmap), SSL/TLS Analysis, Sensitive Path Fuzzing (60+ paths), JS Analysis, CORS Check, Cookie Security, Cloud Bucket Enum (S3/Azure/GCS), Subdomain Takeover (30 providers), Shodan InternetDB, Reverse IP, IP Geolocation |
| **Phase 3: LLM Analysis** | Risk scoring, attack paths, executive summary | False Positive Filtering, Risk Scoring (A-F grade), Attack Path Analysis, Executive Summary, Google Dork Generation |
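
The engine's internals aren't described beyond the table above. One plausible shape, offered as a sketch and not as SurfaceMap's actual `engine.py`, is: phases run strictly in order, modules inside a phase run concurrently against the assets found so far, and a failing module is reported without aborting its phase. `run_pipeline`, the module signature, and the toy modules are all hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical module signature: an async callable that receives the assets
# known so far and returns newly discovered ones.
Module = Callable[[frozenset[str]], Awaitable[set[str]]]


async def run_pipeline(phases: list[tuple[str, list[Module]]], seed: set[str]) -> set[str]:
    """Run phases in order; modules inside one phase run concurrently."""
    assets = set(seed)
    for phase_name, modules in phases:
        # Every module in this phase sees the same snapshot of earlier results.
        snapshot = frozenset(assets)
        results = await asyncio.gather(
            *(module(snapshot) for module in modules), return_exceptions=True
        )
        for module, result in zip(modules, results):
            if isinstance(result, BaseException):
                # A failing module is reported but does not abort the phase.
                print(f"[{phase_name}] {module.__name__} failed: {result!r}")
                continue
            assets |= result
    return assets


# Toy modules standing in for real discovery sources.
async def llm_brainstorm(known: frozenset[str]) -> set[str]:
    return {"acmecorp.io"}  # pretend the LLM suggested a subsidiary domain


async def passive_subdomains(known: frozenset[str]) -> set[str]:
    return {f"www.{domain}" for domain in known}


async def flaky_source(known: frozenset[str]) -> set[str]:
    raise RuntimeError("rate limited")


PHASES = [
    ("Phase 0: LLM Brainstorm", [llm_brainstorm]),
    ("Phase 1: Passive Recon", [passive_subdomains, flaky_source]),
]

if __name__ == "__main__":
    print(sorted(asyncio.run(run_pipeline(PHASES, {"acme.com"}))))
```

Running the phases sequentially lets later phases build on earlier output (Phase 1 sees the domain Phase 0 suggested), while `return_exceptions=True` keeps one unreachable OSINT source from discarding the rest of the phase.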

### Asset Types

| Type | Description |
|------|-------------|
| `domain` | Root domains |
| `subdomain` | Discovered subdomains |
| `ip` | IP addresses |
| `ip_range` | IP CIDR ranges |
| `port` | Open ports |
| `service` | Running services with version info |
| `asn` | Autonomous system numbers |
| `cloud_bucket` | S3, Azure Blob, GCS buckets |
| `email_server` | MX record mail servers |
| `email` | Email addresses |
| `nameserver` | NS record nameservers |
| `cdn` | Content delivery networks |
| `waf` | Web application firewalls |
| `certificate` | TLS/SSL certificates |
| `dns_issue` | DNS misconfigurations (SPF/DKIM/DMARC) |
| `github_repo` | GitHub repositories |
| `social_media` | Social media profiles |
| `url` | Discovered URLs |
| `technology` | Detected technologies |
| `subsidiary` | Subsidiaries and acquisitions |
| `whois_record` | WHOIS registration data |
| `sensitive_file` | Exposed sensitive files |
| `api_endpoint` | Discovered API endpoints |
| `secret_leak` | Leaked secrets in JS/HTML |
| `cors_misconfiguration` | CORS misconfigurations |
| `cookie_issue` | Cookie security issues |

## Configuration

All settings are configured via environment variables or a `.env` file. Use `surfacemap config` to view all current settings.
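
The exact loading rules aren't spelled out here. As a sketch, assuming a simple `KEY=VALUE` `.env` format and that real environment variables override values from the file (an assumed precedence), resolving a setting could look like this. `load_dotenv` and `setting` are hypothetical helpers, not SurfaceMap's `config.py`.

```python
import os
from pathlib import Path


def load_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines; no variable interpolation or multi-line values."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def setting(name, default, dotenv=None, cast=str):
    """Resolve one setting: process environment, then .env, then the default (assumed order)."""
    raw = os.environ.get(name)
    if raw is None and dotenv:
        raw = dotenv.get(name)
    return default if raw is None else cast(raw)


if __name__ == "__main__":
    dotenv = load_dotenv()
    print(setting("SURFACEMAP_LLM_PROVIDER", "gemini", dotenv))
    print(setting("SURFACEMAP_HTTP_TIMEOUT", 15, dotenv, cast=int))
```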

### API Keys

| Variable | Description |
|----------|-------------|
| `GEMINI_API_KEY` | Google Gemini API key (recommended) |
| `ANTHROPIC_API_KEY` | Anthropic Claude API key |
| `OPENAI_API_KEY` | OpenAI API key |
| `VIRUSTOTAL_API_KEY` | VirusTotal enrichment (optional) |
| `SHODAN_API_KEY` | Shodan enrichment (optional) |
| `GITHUB_TOKEN` | GitHub dorking (optional) |
| `HUNTER_API_KEY` | Hunter.io email harvesting (optional) |

### LLM Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `SURFACEMAP_LLM_PROVIDER` | `gemini` | LLM provider (`gemini`, `anthropic`, or `openai`) |
| `SURFACEMAP_LLM_MODEL` | `gemini-2.5-flash` | LLM model name |
| `SURFACEMAP_LLM_MAX_TOKENS` | `16384` | Max tokens per LLM response |
| `SURFACEMAP_LLM_TEMPERATURE` | `0.3` | LLM temperature |
| `SURFACEMAP_LLM_TIMEOUT` | `120` | LLM request timeout (seconds) |
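
A missing key for the selected provider is an easy misconfiguration to catch up front. The provider names, key variables, and defaults below come from the tables above; the validation function `resolve_llm_settings` is a hypothetical sketch, not how SurfaceMap checks its settings.

```python
import os

# Provider names and key variables are taken from the tables above;
# the validation logic itself is a hypothetical sketch.
PROVIDER_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def resolve_llm_settings(env=None):
    """Read the LLM settings with the documented defaults and fail fast on a missing key."""
    env = os.environ if env is None else env
    provider = env.get("SURFACEMAP_LLM_PROVIDER", "gemini").strip().lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unknown provider {provider!r}, expected one of {sorted(PROVIDER_KEY_VARS)}")
    key_var = PROVIDER_KEY_VARS[provider]
    if not env.get(key_var):
        raise RuntimeError(f"{key_var} is not set but SURFACEMAP_LLM_PROVIDER is {provider!r}")
    return {
        "provider": provider,
        # The documented default is a Gemini model; set it explicitly for other providers.
        "model": env.get("SURFACEMAP_LLM_MODEL", "gemini-2.5-flash"),
        "max_tokens": int(env.get("SURFACEMAP_LLM_MAX_TOKENS", "16384")),
        "temperature": float(env.get("SURFACEMAP_LLM_TEMPERATURE", "0.3")),
        "timeout_s": float(env.get("SURFACEMAP_LLM_TIMEOUT", "120")),
    }
```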

### Timeouts

| Variable | Default | Description |
|----------|---------|-------------|
| `SURFACEMAP_HTTP_TIMEOUT` | `15` | HTTP probe timeout (seconds) |
| `SURFACEMAP_DNS_TIMEOUT` | `10` | DNS lookup timeout (seconds) |
| `SURFACEMAP_OSINT_TIMEOUT` | `60` | OSINT API timeout (seconds) |
| `SURFACEMAP_SCAN_TIMEOUT` | `300` | nmap scan timeout (seconds) |
| `SURFACEMAP_SSL_TIMEOUT` | `10` | SSL/TLS analysis timeout (seconds) |

### Concurrency & Limits

| Variable | Default | Description |
|----------|---------|-------------|
| `SURFACEMAP_MAX_PROBES` | `50` | Concurrent HTTP probes |
| `SURFACEMAP_MAX_DNS` | `200` | Concurrent DNS lookups |
| `SURFACEMAP_MAX_SUBDOMAINS` | `500` | Maximum subdomains to enumerate |
| `SURFACEMAP_MAX_EXTRA_DOMAINS` | `20` | Maximum subsidiary domains to scan |
| `SURFACEMAP_NMAP_ARGS` | `-sV -T4 --top-ports 100` | nmap arguments |

### Output & Notifications

| Variable | Default | Description |
|----------|---------|-------------|
| `SURFACEMAP_OUTPUT_DIR` | `./output` | Default output directory |
| `SURFACEMAP_DB_PATH` | `./surfacemap.db` | SQLite database path |
| `SURFACEMAP_SLACK_WEBHOOK` | (unset) | Slack incoming webhook URL |
| `SURFACEMAP_SLACK_TOKEN` | (unset) | Slack bot token |
| `SURFACEMAP_SLACK_CHANNEL` | `#security` | Slack channel |
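
To see what a Block Kit notification involves, here is a sketch that builds a Slack message summarizing asset counts and posts it as JSON to an incoming webhook URL such as the one in `SURFACEMAP_SLACK_WEBHOOK`. The message layout and the helper names `build_slack_payload` and `post_to_webhook` are hypothetical and are not the output of SurfaceMap's `slack.py`.

```python
import json
import urllib.request


def build_slack_payload(target, counts):
    """Build a Block Kit message summarizing asset counts per type."""
    lines = "\n".join(f"• *{asset_type}*: {n}" for asset_type, n in sorted(counts.items()))
    return {
        "text": f"SurfaceMap scan finished for {target}",  # plain-text fallback for notifications
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": f"SurfaceMap: {target}"}},
            {"type": "section", "text": {"type": "mrkdwn", "text": lines or "_No assets found_"}},
        ],
    }


def post_to_webhook(webhook_url, payload, timeout=10):
    """POST the payload as JSON to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```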

## Output Formats

- **Terminal Tree** — Rich tree display with color-coded statuses
- **JSON** — Full scan data with metadata
- **CSV** — Flat export for spreadsheet analysis
- **HTML Mindmap** — Interactive D3.js force-directed graph with dark theme, zoom, drag, and tooltips
- **Mermaid** — Mermaid.js mindmap diagram for embedding in docs
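
The Mermaid format is plain text, so an export reduces to emitting indented lines under a `mindmap` header. This sketch groups `(asset_type, value)` pairs under a root node; it illustrates the format and is not the exporter in `mindmap.py`. The function name `to_mermaid_mindmap` is hypothetical.

```python
from collections import defaultdict


def to_mermaid_mindmap(root, assets):
    """Render (asset_type, value) pairs as a Mermaid mindmap grouped by type."""
    def clean(text):
        # Brackets and parentheses define node shapes in Mermaid, so drop them from labels.
        return "".join(ch for ch in str(text) if ch not in "()[]{}")

    grouped = defaultdict(list)
    for asset_type, value in assets:
        grouped[asset_type].append(value)

    lines = ["mindmap", f"  root(({clean(root)}))"]
    for asset_type in sorted(grouped):
        lines.append(f"    {clean(asset_type)}")
        for value in sorted(set(grouped[asset_type])):
            lines.append(f"      {clean(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid_mindmap("example.com", [
        ("subdomain", "www.example.com"),
        ("ip", "203.0.113.10"),
        ("subdomain", "api.example.com"),
    ]))
```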

## Architecture

```text
surfacemap/
  core/
    config.py        — Environment-based configuration
    models.py        — Asset, ScanResult, enums
    llm.py           — Multi-provider LLM integration (Gemini/Claude/OpenAI)
  discovery/
    base.py          — DiscoveryModule ABC
    engine.py        — 4-phase orchestration engine
    dns.py           — DNS, subdomain, takeover, cloud modules
    http.py          — HTTP probe, port scan modules
    web.py           — Wayback, URLScan, RapidDNS, CommonCrawl, AnubisDB, CertSpotter, Shodan InternetDB
    osint.py         — WHOIS, ASN, reverse DNS, SSL analysis, email security
    active.py        — Sensitive paths, JS analysis, CORS, cookie security
    enrichment.py    — VirusTotal, Shodan, GitHub dorks, email harvesting
  analysis/
    risk.py          — Risk scoring engine
    narrative.py     — Attack path and executive summary generation
  cli/
    main.py          — Typer CLI application
  output/
    mindmap.py       — D3.js HTML and Mermaid export
  api/
    server.py        — FastAPI REST API
  notifications/
    slack.py         — Slack Block Kit notifications
  storage/
    db.py            — SQLite persistence with aiosqlite
```
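
`base.py` defines a `DiscoveryModule` abstract base class, but its interface isn't documented here. As a sketch under assumptions, a module contract might declare its phase and whether it sends traffic to the target, and implement one async `run` method. Every name below (`Asset` fields, `phase`, `active`, `run`, `runnable`) is hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical minimal asset record (the real models.py may differ)."""
    type: str
    value: str
    source: str
    metadata: dict = field(default_factory=dict)


class DiscoveryModule(ABC):
    """Hypothetical module contract: each module declares its phase and implements run()."""

    name: str = "base"
    phase: int = 1        # e.g. 1 = passive recon, 2 = active probing
    active: bool = False  # an engine could skip active modules under --passive-only

    @abstractmethod
    async def run(self, target: str) -> list[Asset]:
        """Discover assets for `target`."""


class StaticUrlModule(DiscoveryModule):
    """Toy module that returns a fixed URL instead of querying a real source."""

    name = "static-url"

    async def run(self, target: str) -> list[Asset]:
        return [Asset("url", f"https://{target}/login", self.name)]


def runnable(modules, passive_only):
    """Filter modules the way a --passive-only run might."""
    return [m for m in modules if not (passive_only and m.active)]


if __name__ == "__main__":
    for asset in asyncio.run(StaticUrlModule().run("example.com")):
        print(asset)
```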

## License

MIT License. Copyright (c) 2026 Yash Korat.

---

Built by [BreachLine Labs](https://github.com/BreachLine)
