Metadata-Version: 2.4
Name: wordforge
Version: 0.3.0
Summary: Wordlists forged for your target, not for everyone's. Hyper-contextual wordlist generation for offensive security.
Project-URL: Homepage, https://github.com/Ataraxia-ia-labs/WordForge
Project-URL: Repository, https://github.com/Ataraxia-ia-labs/WordForge
Project-URL: Issues, https://github.com/Ataraxia-ia-labs/WordForge/issues
Project-URL: Changelog, https://github.com/Ataraxia-ia-labs/WordForge/blob/main/CHANGELOG.md
Author: Ataraxia-ia-labs
License: AGPL-3.0-or-later
License-File: LICENSE
Keywords: bug-bounty,fuzzing,osint,pentest,red-team,security,wordlist
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Requires-Python: >=3.12
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: anthropic>=0.39.0
Requires-Dist: anyio>=4.6.0
Requires-Dist: diskcache>=5.6.3
Requires-Dist: dnspython>=2.7.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.27.2
Requires-Dist: jinja2>=3.1.4
Requires-Dist: openai>=1.54.0
Requires-Dist: pydantic-settings>=2.6.0
Requires-Dist: pydantic>=2.9.0
Requires-Dist: python-multipart>=0.0.12
Requires-Dist: rich>=13.9.0
Requires-Dist: selectolax>=0.3.27
Requires-Dist: spacy>=3.8.0
Requires-Dist: sqlmodel>=0.0.22
Requires-Dist: structlog>=24.4.0
Requires-Dist: tldextract>=5.1.3
Requires-Dist: trafilatura>=1.12.2
Requires-Dist: typer>=0.13.0
Requires-Dist: uvicorn[standard]>=0.32.0
Provides-Extra: dev
Requires-Dist: detect-secrets>=1.5.0; extra == 'dev'
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.33.0; extra == 'dev'
Requires-Dist: pytest>=8.3.3; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Requires-Dist: types-python-dateutil>=2.9.0.20241003; extra == 'dev'
Description-Content-Type: text/markdown

<h1 align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="images/logo-dark-bg.png">
    <source media="(prefers-color-scheme: light)" srcset="images/logo-512.png">
    <img alt="WordForge" src="images/logo-512.png" width="200">
  </picture>
</h1>

<h3 align="center">WordForge</h3>
<p align="center"><em>Wordlists forged for your target, not for everyone's.</em></p>

<p align="center">
  <a href="https://github.com/Ataraxia-ia-labs/WordForge/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/Ataraxia-ia-labs/WordForge/actions/workflows/ci.yml/badge.svg"></a>
  <a href="LICENSE"><img alt="License: AGPL-3.0" src="https://img.shields.io/badge/license-AGPL--3.0-blue.svg"></a>
  <img alt="Python 3.12+" src="https://img.shields.io/badge/python-3.12+-blue.svg">
  <img alt="Coverage" src="https://img.shields.io/badge/coverage-74%25-green.svg">
  <img alt="Type-checked" src="https://img.shields.io/badge/mypy-strict-blue.svg">
  <a href="https://github.com/Ataraxia-ia-labs/WordForge/stargazers"><img alt="Stars" src="https://img.shields.io/github/stars/Ataraxia-ia-labs/WordForge?style=social"></a>
</p>

---

## Why WordForge?

Generic wordlists like `rockyou`, `SecLists/discovery`, or `common.txt` are
noisy against specific targets. Throwing 200,000 generic passwords at an
enterprise login is inefficient and loud.

**WordForge generates hyper-contextual wordlists from passive OSINT** on your
target: its website, GitHub orgs, public docs, employee profiles, and
historical content. The resulting wordlists are far more relevant than
generic ones: usernames realistic for the company, directory paths matching
the actual stack, parameters matching internal naming, and subdomains rooted
in real codenames.

Designed as a companion to [SubSift](https://github.com/Ataraxia-ia-labs/SubSift):
same stack, same philosophy, pipe-friendly.

## Features

- **Passive OSINT collection** — website crawler, GitHub org metadata,
  Wayback Machine, DNS, response headers — all async, rate-limited, polite.
- **NER + pattern extraction** — spaCy identifies people, organizations,
  products, locations; heuristics surface internal jargon and codenames.
- **Multi-provider LLM** — Ollama (local, free), Anthropic (premium), or
  OpenAI. Switch at runtime from the UI or CLI.
- **HashCat-style mutations** — capitalization, leet, year suffixes, role
  combinations, configurable rules.
- **Categorized output** — usernames, passwords, paths, subdomains,
  parameters, emails, company name variants.
- **Pipe-friendly** — `wordforge generate ... | subsift scan`.
- **Live dashboard** — the web UI streams per-stage progress over SSE
  (collect → extract → generate → mutate), then renders a tabbed result
  card with copy / download / pipe-to-SubSift actions.
- **Operator seeds** — `--seed employees.txt` folds LinkedIn-derived names
  straight into the LLM context and the offline fallback.
- **Self-diagnostic** — `wordforge doctor` reports DNS, HTTPS, LLM provider,
  spaCy model, cache, and DB readiness in one shot.
- **One-command Docker** — `docker compose up` and you're scanning.
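
To make the mutation feature concrete, here is a minimal sketch of case, leet, and year-suffix expansion. It is an illustration only, not WordForge's rule engine: the `mutate` function, the `LEET` map, and the year range are all hypothetical.

```python
from itertools import product

# Hypothetical leet map, chosen for illustration only.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})


def mutate(word: str, years: range = range(2023, 2026)) -> list[str]:
    """Return case, leet, and year-suffix variants of one base word."""
    bases = [word.lower(), word.capitalize(), word.upper()]
    bases += [b.translate(LEET) for b in bases]
    suffixes = [""] + [str(y) for y in years]
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for base, suffix in product(bases, suffixes):
        seen.setdefault(base + suffix)
    return list(seen)


if __name__ == "__main__":
    for candidate in mutate("acme"):
        print(candidate)
```

Because every base is combined with every suffix, output grows multiplicatively with each added rule, which is why a configurable rule set matters.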

## LLM providers

WordForge supports three LLM providers. Choose the trade-off that fits.

| Provider | Privacy | Cost | Quality | Setup |
|----------|:-------:|:----:|:-------:|-------|
| **Ollama** (default) | 🟢 Local-only | 🟢 Free | 🟡 Good | Run `ollama serve`; default model `llama3.1:8b` |
| **Anthropic Claude** | 🔴 API call | 🟡 Pay-per-token | 🟢 Excellent | Set `ANTHROPIC_API_KEY`; default model `claude-sonnet-4-5` |
| **OpenAI** | 🔴 API call | 🟡 Pay-per-token | 🟢 Excellent | Set `OPENAI_API_KEY`; default model `gpt-4o-mini` |

Set `WORDFORGE_LLM_PROVIDER=ollama|anthropic|openai` in `.env`, switch on the
fly with `--provider` on the CLI, or click the provider icon in the web UI.

## Quickstart

Install as a tool from PyPI (easiest — nothing to clone):

```bash
uv tool install wordforge      # or: pipx install wordforge
wordforge models download      # one-time: fetch the spaCy NER model
wordforge doctor
wordforge generate example.com
```

Or run the whole stack with Docker:

```bash
git clone https://github.com/Ataraxia-ia-labs/WordForge.git
cd WordForge
cp .env.example .env       # edit if you want non-default provider
docker compose up --build  # web UI on http://localhost:8001
```

Or from source with uv (the spaCy NER model installs automatically via `uv sync`):

```bash
uv sync --extra dev
uv run wordforge doctor
uv run wordforge generate example.com --provider ollama
```

> New here? The **[full Usage Guide](docs/USAGE.md)** walks through install,
> provider setup, your first wordlist, and feeding the output into ffuf /
> hydra / hashcat / SubSift.

## Usage

### CLI

```bash
# First time? Check your setup is healthy.
wordforge doctor

# Generate all categories for a target
wordforge generate example.com

# Pipe subdomains directly to SubSift
wordforge generate example.com --format subdomains | \
  subsift scan --wordlist - example.com

# Seed with employees you've already collected (LinkedIn export, etc.)
wordforge generate example.com --seed employees.txt

# Choose provider per run
wordforge generate example.com --provider anthropic

# Export to a ZIP bundle
wordforge generate example.com --format zip --output bundle.zip

# Apply a hashcat rule file to the password candidates
wordforge generate example.com --rules best64.rule

# Run a whole list of targets (one per line); failures don't abort the batch
wordforge generate-batch targets.txt

# Compare two runs (or two snapshots of the same target over time)
wordforge diff out/example.com.old out/example.com --show

# Browse past generations
wordforge list
```
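
Conceptually, comparing two runs comes down to a set difference over wordlist entries. The sketch below shows that idea only. It makes no claim about how `wordforge diff` is implemented or how run directories are laid out, and `diff_wordlists` is a hypothetical name.

```python
from collections.abc import Iterable


def diff_wordlists(old: Iterable[str], new: Iterable[str]) -> dict[str, list[str]]:
    """Report entries added to or removed from a wordlist between two runs."""
    # Strip newlines and skip blank lines so file iterators can be passed directly.
    old_set = {line.strip() for line in old if line.strip()}
    new_set = {line.strip() for line in new if line.strip()}
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


if __name__ == "__main__":
    print(diff_wordlists(["admin\n", "dev\n"], ["dev\n", "staging\n"]))
```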

### Web UI

Open <http://localhost:8001>, enter a target, pick a provider from the
selector, and click **Forge**. Results stream in real time; download them per
category or grab the ZIP bundle.

## Integration with SubSift

```bash
wordforge generate target.com --format subdomains | \
  subsift scan --wordlist - target.com
```

WordForge detects pipes automatically: when stdout is not a TTY, the banner
and logs are suppressed and only data goes to stdout.

## Configuration

See [.env.example](.env.example) for the complete list. Key variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `WORDFORGE_PORT` | `8001` | Web UI / API port |
| `WORDFORGE_LLM_PROVIDER` | `ollama` | `ollama`, `anthropic`, `openai` |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `llama3.1:8b` | Ollama model |
| `ANTHROPIC_MODEL` | `claude-sonnet-4-5` | Anthropic model |
| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model |
| `WORDFORGE_RATE_LIMIT_PER_HOST` | `1.0` | Requests/sec per hostname |
| `WORDFORGE_CRAWL_MAX_DEPTH` | `2` | Crawler depth |
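
To illustrate what a per-host limit such as `WORDFORGE_RATE_LIMIT_PER_HOST=1.0` means in practice, here is a minimal sketch of an asyncio limiter that spaces requests per hostname. `HostRateLimiter` is a hypothetical class written for this example; WordForge's collectors may throttle differently.

```python
import asyncio
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Space out requests so each hostname sees at most `rate` requests/sec."""

    def __init__(self, rate: float = 1.0) -> None:
        self.interval = 1.0 / rate
        self._next_slot: dict[str, float] = {}

    def reserve(self, url: str, now: float) -> float:
        """Book the next free slot for the URL's host; return seconds to wait."""
        host = urlsplit(url).hostname or ""
        slot = max(now, self._next_slot.get(host, now))
        self._next_slot[host] = slot + self.interval
        return slot - now

    async def wait(self, url: str) -> None:
        """Sleep until this URL's host is allowed another request."""
        await asyncio.sleep(self.reserve(url, time.monotonic()))


if __name__ == "__main__":
    limiter = HostRateLimiter(rate=1.0)
    asyncio.run(limiter.wait("https://example.com/robots.txt"))
```

Keeping `reserve` synchronous and passing the clock in makes the scheduling logic testable without real sleeps. Different hosts never block each other.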

## Architecture

```mermaid
flowchart LR
    A[Target] --> B[Collectors]
    B -->|Website| C[Extractors]
    B -->|GitHub| C
    B -->|Wayback| C
    B -->|DNS| C
    C -->|NER + Patterns| D[LLM Provider]
    D -->|Ollama / Claude / OpenAI| E[Generators]
    E --> F[Mutators]
    F --> G[Exporters]
    G -->|txt / json / zip| H[Wordlists]
```
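
The flow in the diagram can be read as a chain of stages, each consuming the previous stage's output. The sketch below shows that shape with toy stages. `run_pipeline`, the `Stage` alias, and the lambda stages are hypothetical stand-ins, not WordForge's actual modules, and the real pipeline is async.

```python
from collections.abc import Callable
from typing import Any

# A stage is a (name, function) pair; this alias is illustrative only.
Stage = tuple[str, Callable[[Any], Any]]


def run_pipeline(
    target: str,
    stages: list[Stage],
    on_progress: Callable[[str], Any] = print,
) -> Any:
    """Feed the target through each stage in order, reporting stage names."""
    data: Any = target
    for name, stage in stages:
        on_progress(name)  # e.g. push a per-stage progress event to a UI
        data = stage(data)
    return data


if __name__ == "__main__":
    toy_stages: list[Stage] = [
        ("collect", lambda target: [f"About {target}", "Project Falcon"]),
        ("extract", lambda docs: [w.lower() for d in docs for w in d.split()]),
        ("mutate", lambda words: words + [w + "1" for w in words]),
    ]
    print(run_pipeline("example.com", toy_stages))
```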

## Roadmap

**Shipped in v0.1.0**
- [x] Async pipeline with 4 collectors (Website BFS+robots, DNS, Wayback CDX, GitHub REST)
- [x] LLM-driven generators with cached prompts (Ollama / Anthropic / OpenAI)
- [x] HashCat-style rule engine + case/leet/year/suffix mutators
- [x] HTMX dashboard with provider selector + recent-runs panel
- [x] Runtime provider switching from the dashboard
- [x] Pipe-friendly integration with SubSift
- [x] `wordforge doctor` self-diagnostic
- [x] `--seed` flag for operator-supplied seed lists (e.g. LinkedIn names)
- [x] SQLite-backed history (`wordforge list`)

**Shipped in v0.2.0**
- [x] SSE-streamed live progress in the dashboard (per-stage updates)

**Shipped in v0.3.0**
- [x] PyPI distribution (`uv tool install wordforge` / `pipx`) + automated tag releases
- [x] Multi-target batch mode (`wordforge generate-batch targets.txt`)
- [x] Run-diff: compare two run outputs (`wordforge diff a/ b/`)
- [x] Hashcat ruleset import (`generate --rules best64.rule`)

**Planned**
- [ ] Optional API auth (HMAC-signed bearer for `/api/generate`)
- [ ] Prometheus `/metrics` endpoint
- [ ] Burp Suite extension (separate repo)
- [ ] Plugin API for custom collectors

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Issues and PRs welcome.

## Disclaimer

WordForge is **for authorized security testing only**. Read [DISCLAIMER.md](DISCLAIMER.md)
before use. Unauthorized scanning may violate computer fraud laws.

## License

[AGPL-3.0-or-later](LICENSE). If you run a modified version as a network
service, you must offer its users the modified source under the same license.

## Acknowledgements

Built on the shoulders of: [FastAPI](https://fastapi.tiangolo.com),
[Typer](https://typer.tiangolo.com), [httpx](https://www.python-httpx.org),
[trafilatura](https://trafilatura.readthedocs.io), [spaCy](https://spacy.io),
[Ollama](https://ollama.com), and the broader
[ProjectDiscovery](https://projectdiscovery.io) ecosystem that inspires the
pipe-friendly philosophy.
