Metadata-Version: 2.4
Name: wfh-wordlist
Version: 2.6.2
Summary: WordList For Hacking — Unified wordlist generation toolkit for pentest and red team operations
Author-email: André Henrique <contact@safelabs.com.br>
Maintainer-email: André Henrique <contact@safelabs.com.br>
License-Expression: MIT
Project-URL: Homepage, https://github.com/mrhenrike/WordListsForHacking
Project-URL: Documentation, https://github.com/mrhenrike/WordListsForHacking/wiki
Project-URL: Repository, https://github.com/mrhenrike/WordListsForHacking
Project-URL: Issues, https://github.com/mrhenrike/WordListsForHacking/issues
Project-URL: Changelog, https://github.com/mrhenrike/WordListsForHacking/releases
Keywords: wordlist,password,pentest,red-team,security,brute-force,dictionary,hacking,cybersecurity,osint,credential,offensive-security,ics,scada,plc,hmi,iot,dns-fuzzing,password-dna,default-credentials,web-scraping,pcfg,markov,keyboard-walk,hashcat-rules,prince-attack,benchmark
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: colorama>=0.4.6
Requires-Dist: tqdm>=4.66.0
Requires-Dist: requests>=2.31.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: chardet>=5.2.0
Requires-Dist: unidecode>=1.3.6
Provides-Extra: ocr
Requires-Dist: easyocr>=1.7.0; extra == "ocr"
Requires-Dist: Pillow>=10.0.0; extra == "ocr"
Provides-Extra: docs
Requires-Dist: openpyxl>=3.1.0; extra == "docs"
Requires-Dist: pdfplumber>=0.10.0; extra == "docs"
Requires-Dist: python-docx>=1.1.0; extra == "docs"
Requires-Dist: striprtf>=0.0.26; extra == "docs"
Provides-Extra: scrape
Requires-Dist: pypdf>=4.0.0; extra == "scrape"
Provides-Extra: full
Requires-Dist: wfh-wordlist[docs,ocr,scrape]; extra == "full"
Dynamic: license-file

# WordListsForHacking (WFH)

<p align="center">
  <img src="https://img.shields.io/github/stars/mrhenrike/WordListsForHacking?style=flat-square" alt="GitHub Stars">
  <img src="https://img.shields.io/github/license/mrhenrike/WordListsForHacking?style=flat-square" alt="License">
  <img src="https://img.shields.io/badge/version-2.6.2-blue?style=flat-square" alt="Version">
  <img src="https://img.shields.io/badge/python-3.8%2B-blue?style=flat-square&logo=python&logoColor=white" alt="Python 3.8+">
  <img src="https://img.shields.io/pypi/v/wfh-wordlist?style=flat-square&logo=pypi&logoColor=white&color=green" alt="PyPI">
</p>

**Unified wordlist generation toolkit for pentest and red team operations — 36 subcommands in a single CLI.** Charset/mask generation, personal & corporate target profiling, web scraping (JS/CSS/PDF extraction), OCR, document parsing (PDF/XLSX/DOCX), leet speak permutations, XOR crypto, DNS/subdomain fuzzing, phone number generation, corporate user enumeration, retail/pharmacy chain credential patterns, default credential databases (IoT/ICS/SCADA/PLC/HMI), ISP WiFi keyspace generation, password-DNA behavioral analysis, keyword combiner, word mangling, merge & sanitize, ML-based ranking with SecLists corpus training, statistical analysis, PCFG probabilistic grammar generation, OMEN-style Markov chain generation, keyboard walk generation, automatic hashcat rule generation, PRINCE combinatorial chaining, wordlist quality benchmarking, **phrase-initials acrostic generation, existing-password mutation engine, digit-to-text variants (EN/PT/BR/ES), global length filters, and disk-space safety checks.**

> **Full documentation:** [Wiki](https://github.com/mrhenrike/WordListsForHacking/wiki)

---

> **DISCLAIMER:** This tool is intended **exclusively for authorized security testing, penetration testing, and educational purposes**. Unauthorized use against systems you do not own or have explicit written permission to test is **illegal** and unethical. The author assumes no liability for misuse.

---

## Quick Start

### Install via pip (recommended)

```bash
pip install wfh-wordlist                # core (charset, profile, dns, scrape, analyze, ...)
pip install "wfh-wordlist[docs]"        # + PDF/XLSX/DOCX extraction
pip install "wfh-wordlist[scrape]"      # + PDF crawl during web scraping
pip install "wfh-wordlist[ocr]"         # + OCR (requires PyTorch)
pip install "wfh-wordlist[full]"        # all extras (quotes stop zsh from globbing the brackets)
```

Verify installation:

```bash
wfh --help                              # lists all available subcommands
pip show wfh-wordlist                   # check version
```

### Or clone from source

```bash
git clone https://github.com/mrhenrike/WordListsForHacking.git
cd WordListsForHacking

# Linux / macOS / Termux
chmod +x setup_venv.sh && ./setup_venv.sh && source .venv/bin/activate

# Windows PowerShell
.\setup_venv.ps1; .\.venv\Scripts\Activate.ps1
```

### Run

```bash
wfh                        # interactive menu (pip install)
python wfh.py              # interactive menu (from source)
python wfh.py --help       # full CLI help
```

> **OS prerequisites (OCR only):** see the [Installation wiki page](https://github.com/mrhenrike/WordListsForHacking/wiki/Installation).

---

## Subcommands

| # | Command | Description |
|---|---------|-------------|
| 1 | `charset` | Charset/mask generation (crunch-style + hashcat masks) |
| 2 | `pattern` | Template-based generation with variables |
| 3 | `profile` | Personal target profiling (CUPP-style) |
| 4 | `corp` | Corporate target profiling |
| 5 | `corp-users` | Corporate domain user/password generation (50+ patterns) |
| 6 | `phone` | Phone number wordlists (BR, US, UK) |
| 7 | `scrape` | Web scraping (CeWL/CeWLeR-style) with JS/CSS/PDF extraction |
| 8 | `ocr` | OCR text extraction from images |
| 9 | `extract` | Extract words from PDF/XLSX/DOCX |
| 10 | `leet` | Leet speak permutations |
| 11 | `xor` | XOR encrypt/decrypt/brute-force |
| 12 | `analyze` | Statistical analysis (pipal-style) |
| 13 | `merge` | Merge & deduplicate wordlists |
| 14 | `dns` | DNS/subdomain fuzzing (alterx-style) |
| 15 | `pharma` | Healthcare/pharmacy credential patterns |
| 16 | `sanitize` | Clean & normalize wordlists |
| 17 | `reverse` | Reverse line order |
| 18 | `corp-prefixes` | Corporate prefix usernames (MSP/SOC/DevOps) |
| 19 | `train` | Train ML pattern model (local + SecLists corpus) |
| 20 | `sysinfo` | Hardware & compute info |
| 21 | `mangle` | Word mangling rules |
| 22 | `default-creds` | Query default credentials database (IoT/routers/printers/ICS) |
| 23 | `isp-keygen` | ISP default WiFi password keyspace generator |
| 24 | `combiner` | Keyword combiner (intelligence-wordlist-generator style) |
| 25 | `password-dna` | Analyze password patterns and generate behavioral variants |
| 26 | `pcfg` | PCFG probabilistic grammar — train and generate (Weir et al.) |
| 27 | `markov` | OMEN-style positional Markov chain generator |
| 28 | `kwalk` | Keyboard walk password generator (kwprocessor-style) |
| 29 | `rulegen` | Auto-generate hashcat .rule files from password analysis |
| 30 | `benchmark` | Wordlist quality benchmarking (MAYA-inspired metrics) |
| 31 | `prince` | PRINCE attack — chained element combination |
| 32 | `phrase` | Phrase-initials acrostic password generator (`@0x90` / hacker-suffix style) |
| 33 | `mutate` | Existing-password mutation engine (case / leet / prefix / suffix) |
| 34 | `pharma` | Retail/pharmacy chain credential patterns (brand+id, system+taxid, usernames) |
| 35 | `br-names` | Brazilian name-based username generator |
| 36 | `num2text` | Digit-to-text wordlist generator with case/leet/separator variants (EN/PT/BR/ES) |

> **Detailed syntax and examples for each subcommand:** [Wiki — Subcommands](https://github.com/mrhenrike/WordListsForHacking/wiki)

### Global Flags

```bash
python wfh.py --threads 20 --compute cuda --no-ml --min-len 8 --max-len 20 <subcommand>
```

| Flag | Default | Description |
|------|---------|-------------|
| `--threads N` | `5` | Thread count (1–300) |
| `--compute MODE` | `auto` | `auto` / `cpu` / `gpu` / `cuda` / `rocm` / `mps` / `hybrid` |
| `--no-ml` | off | Disable ML ranking |
| `--min-len N` | `0` | Global minimum word length filter (applied to all commands) |
| `--max-len N` | `0` | Global maximum word length filter (applied to all commands) |
| `-v` | off | Verbose logging |

---

## Common Usage Examples

### Corporate pentest — generate users + passwords

```bash
python wfh.py corp-users --domain acme.com.br --file employees.txt --passwords --combo -o acme_combo.lst
```

### Personal target profiling

```bash
python wfh.py profile --name "João Silva" --nick joao --birth 15/03/1990 --leet aggressive -o target.lst
```

### Charset with hashcat mask

```bash
python wfh.py charset 8 8 --mask "?u?l?l?l?d?d?d?s" -o passwords.lst
```

### Template-based patterns

```bash
python wfh.py pattern -t "{company}{year}!" --vars company=acme,globex year=2020-2026 -o patterns.lst
```

### DNS subdomain fuzzing

```bash
python wfh.py dns -d acme.com.br --words dev staging api admin portal -o subdomains.lst
```

### Analyze an existing wordlist

```bash
python wfh.py analyze passwords.lst --top 30 --masks --format json -o analysis.json
```

### Default credentials lookup

```bash
python wfh.py default-creds --list-vendors
python wfh.py default-creds --vendor mikrotik --format combo -o mikrotik_creds.lst
python wfh.py default-creds --protocol snmp --format user -o snmp_users.lst
```

### ISP WiFi keyspace generation

```bash
python wfh.py isp-keygen --list
python wfh.py isp-keygen --isp xfinity_comcast --estimate
python wfh.py isp-keygen --isp xfinity_comcast --limit 100000 -o xfinity.lst
```

### Web scraping with JS/CSS/PDF

```bash
python wfh.py scrape https://target.com --include-js --include-css --include-pdf --lowercase -o words.lst
python wfh.py scrape https://target.com --emails --output-emails emails.txt --output-urls urls.txt
python wfh.py scrape https://target.com --subdomain-strategy children --stream -o stream.lst
```

### Merge & sanitize

```bash
python wfh.py merge list1.lst list2.lst --min-len 6 --sort -o merged.lst
python wfh.py sanitize merged.lst --inplace
```

> **More examples and scenarios:** [Wiki — Quick Start](https://github.com/mrhenrike/WordListsForHacking/wiki/Quick-Start)

---

## Password DNA

Analyze password patterns and generate behavioral variants. The `password-dna` subcommand extracts structural "DNA" from known passwords (uppercase, lowercase, digit, symbol positions) and produces new candidates that follow the same behavioral patterns.

```bash
# Analyze a leaked/known password list and generate variants
python wfh.py password-dna --input known_passwords.lst --depth 2 -o dna_variants.lst

# Generate variants from a single seed with aggressive expansion
python wfh.py password-dna --seed "Company2024!" --depth 3 --leet -o seed_variants.lst

# DNA analysis report only (no generation)
python wfh.py password-dna --input known_passwords.lst --analyze-only --format json -o dna_report.json
```
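
To make the "DNA" idea concrete, here is a minimal Python sketch, not WFH's actual implementation. It splits a password into runs of uppercase (U), lowercase (L), digit (D), and symbol (S) characters, then keeps that structure while changing the parts users typically rotate: numeric runs are bumped by one and a single-character symbol run is swapped. The function names, the `!@#$` symbol set, and the ±1 rule are illustrative assumptions.

```python
from typing import List, Tuple


def char_class(ch: str) -> str:
    """Map a character to U, L, D or S (non-ASCII letters fall into S here)."""
    if "A" <= ch <= "Z":
        return "U"
    if "a" <= ch <= "z":
        return "L"
    if "0" <= ch <= "9":
        return "D"
    return "S"


def dna_segments(password: str) -> List[Tuple[str, str]]:
    """Split a password into runs of the same character class."""
    segments: List[Tuple[str, str]] = []
    for ch in password:
        cls = char_class(ch)
        if segments and segments[-1][0] == cls:
            segments[-1] = (cls, segments[-1][1] + ch)
        else:
            segments.append((cls, ch))
    return segments


def behavioral_variants(password: str, symbols: str = "!@#$") -> List[str]:
    """Keep the structure; bump digit runs by +/-1 and swap single-symbol runs."""
    segs = dna_segments(password)
    variants = set()
    for i, (cls, text) in enumerate(segs):
        replacements: List[str] = []
        if cls == "D":
            for delta in (-1, 1):
                value = int(text) + delta
                new = str(value).zfill(len(text))
                if value >= 0 and len(new) == len(text):  # preserve the run length
                    replacements.append(new)
        elif cls == "S" and len(text) == 1:
            replacements.extend(symbols)
        for new in replacements:
            variants.add("".join(new if j == i else t for j, (_, t) in enumerate(segs)))
    variants.discard(password)
    return sorted(variants)
```

For example, `behavioral_variants("Company2024!")` yields `Company2023!`, `Company2025!`, and symbol swaps such as `Company2024@`.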

---

## PCFG Grammar Engine

Train a Probabilistic Context-Free Grammar from a password corpus and generate candidates in **probability order** (most likely first). Based on Weir et al. (IEEE S&P 2009).

```bash
# Train a grammar from a password corpus
python wfh.py pcfg train --wordlist rockyou.txt

# Generate candidates (probability-ordered)
python wfh.py pcfg generate -o candidates.lst --limit 1000000

# Fine-tune with structure/terminal limits
python wfh.py pcfg generate --top-structures 50 --top-terminals 100 --min-len 8
```
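
The core PCFG idea fits in a few lines of Python. This is a simplified sketch under stated assumptions, not WFH's or pcfg_cracker's code: it learns base structures such as `L4 D3` plus per-segment terminal frequencies, and scores each guess as P(structure) × Π P(terminal). Unlike the priority-queue "next" algorithm of Weir et al., it naively enumerates every combination and sorts, and it does not model capitalization separately.

```python
import re
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Maximal runs of ASCII letters (L), digits (D) and everything else (S).
SEGMENT_RE = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def segments(pw: str) -> List[Tuple[str, str]]:
    """Tag each run with its class and length, e.g. ("L8", "Password")."""
    return [(f"{m.lastgroup}{len(m.group())}", m.group()) for m in SEGMENT_RE.finditer(pw)]


def train(passwords: Iterable[str]) -> Tuple[Counter, Dict[str, Counter]]:
    structures: Counter = Counter()
    terminals: Dict[str, Counter] = defaultdict(Counter)
    for pw in passwords:
        if not pw:
            continue
        segs = segments(pw)
        structures[tuple(tag for tag, _ in segs)] += 1
        for tag, text in segs:
            terminals[tag][text] += 1
    return structures, terminals


def generate(structures: Counter, terminals: Dict[str, Counter], limit: int = 10) -> List[Tuple[float, str]]:
    total = sum(structures.values())
    scored: List[Tuple[float, str]] = []
    for struct, count in structures.items():
        # Candidate terminals for each slot, with P(terminal | segment tag).
        choices = []
        for tag in struct:
            tag_total = sum(terminals[tag].values())
            choices.append([(t, c / tag_total) for t, c in terminals[tag].items()])
        for combo in product(*choices):
            prob = count / total
            for _, p in combo:
                prob *= p
            scored.append((prob, "".join(t for t, _ in combo)))
    scored.sort(key=lambda item: (-item[0], item[1]))  # most probable first
    return scored[:limit]
```

Trained on `pass123`, `pass123`, `love123`, `pass99`, this emits `pass123` first, then `love123` and `pass99`, and finally the unseen combination `love99`.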

## Markov Chain Generator

OMEN-style positional Markov chain generator. Learns per-position character transitions and generates in ascending cost order (most probable first).

```bash
# Train a Markov model (order 3)
python wfh.py markov train --wordlist leaked.txt --order 3

# Generate candidates with cost threshold
python wfh.py markov generate --min-len 6 --max-len 12 --max-cost 30 --limit 500000
```
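
A minimal sketch of OMEN-style enumeration, assuming a plain n-gram model (the context is the previous `order - 1` characters, padded at the word start) and a per-transition cost of `round(-log2 p)` capped at 10. These choices are illustrative assumptions; WFH's model and OMEN's level computation may differ. Branches whose cumulative cost exceeds `max_cost` are pruned, and survivors are returned cheapest first, mirroring `--max-cost`.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

START = "\x02"  # padding symbol marking the start of a word


def train(words: Iterable[str], order: int = 3) -> Dict[str, Counter]:
    """Count next-character frequencies for every (order - 1)-character context."""
    if order < 2:
        raise ValueError("order must be >= 2")
    ctx_len = order - 1
    model: Dict[str, Counter] = defaultdict(Counter)
    for w in words:
        padded = START * ctx_len + w
        for i in range(ctx_len, len(padded)):
            model[padded[i - ctx_len:i]][padded[i]] += 1
    return model


def level(p: float, max_level: int = 10) -> int:
    """Discretise a probability into an integer cost (likelier means cheaper)."""
    return min(max_level, round(-math.log2(p)))


def generate(model: Dict[str, Counter], length: int, max_cost: int, order: int = 3) -> List[Tuple[int, str]]:
    ctx_len = order - 1
    results: List[Tuple[int, str]] = []

    def walk(prefix: str, cost: int) -> None:
        if cost > max_cost:
            return  # prune: this branch is already too unlikely
        if len(prefix) - ctx_len == length:
            results.append((cost, prefix[ctx_len:]))
            return
        counts = model.get(prefix[-ctx_len:])
        if not counts:
            return
        total = sum(counts.values())
        for ch, c in counts.items():
            walk(prefix + ch, cost + level(c / total))

    walk(START * ctx_len, 0)
    results.sort()  # ascending cost: most probable candidates first
    return results
```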

## Keyboard Walk Generator

Generate passwords based on physical keyboard adjacency walks. Supports QWERTY, AZERTY, QWERTZ, Dvorak, and numpad layouts.

```bash
# Generate QWERTY walks (length 4-10)
python wfh.py kwalk --min-len 4 --max-len 10 -o walks.lst

# Multiple layouts, no shift layer
python wfh.py kwalk --layout qwerty,numpad --no-shift --max-changes 2

# List available layouts
python wfh.py kwalk --list-layouts
```
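
Keyboard walks are paths through a key-adjacency graph. The sketch below assumes a simplified staggered QWERTY model (number row plus three letter rows, no shift layer) and counts a "change" whenever the walk turns to a new direction, loosely analogous to `--max-changes`. The adjacency rule and function names are illustrative, not WFH's layout data.

```python
from typing import List, Optional, Tuple

ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Staggered model: a key at column c touches columns c and c+1 on the row above,
# and columns c-1 and c on the row below. Offsets are (row delta, column delta).
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, -1), (1, 0)]


def key_at(r: int, c: int) -> Optional[str]:
    if 0 <= r < len(ROWS) and 0 <= c < len(ROWS[r]):
        return ROWS[r][c]
    return None


def walks(length: int, max_changes: int) -> List[str]:
    results: List[str] = []

    def extend(r: int, c: int, path: str, direction: Optional[Tuple[int, int]], changes: int) -> None:
        if len(path) == length:
            results.append(path)
            return
        for d in DIRECTIONS:
            key = key_at(r + d[0], c + d[1])
            if key is None:
                continue
            turned = direction is not None and d != direction  # a direction change
            if changes + turned > max_changes:
                continue
            extend(r + d[0], c + d[1], path + key, d, changes + turned)

    # Start a walk from every key on the layout.
    for r, row in enumerate(ROWS):
        for c, key in enumerate(row):
            extend(r, c, key, None, 0)
    return results
```

With `max_changes=0` only straight lines such as `qwer` or `1qaz` come out; allowing one change adds bends like `qwsx`.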

## Hashcat Rule Auto-Generation

Analyze real passwords and automatically generate hashcat-compatible `.rule` files by reverse-engineering transformation patterns.

```bash
# Generate a .rule file from password analysis
python wfh.py rulegen --wordlist leaked.txt -o rules.rule --top-rules 200

# With a dictionary for better base-word matching
python wfh.py rulegen --wordlist passwords.lst --dictionary english.txt -o optimized.rule
```
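
Rule generation works by explaining each password as a base word plus transformations. This hypothetical sketch handles only whole-word case changes (hashcat `c` and `u`), prepends (`^X`), and appends (`$X`), matching the first case-insensitive occurrence of a lowercase base word; tools such as PACK also consider leet substitutions, insertions, and edit distance.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def derive_rule(base: str, password: str) -> Optional[str]:
    """Return a hashcat rule turning lowercase `base` into `password`, or None."""
    idx = password.lower().find(base.lower())
    if idx < 0:
        return None
    core = password[idx:idx + len(base)]
    prefix, suffix = password[:idx], password[idx + len(base):]
    ops: List[str] = []
    if core != base:
        if core == base.capitalize():
            ops.append("c")      # capitalize first letter, lowercase the rest
        elif core == base.upper():
            ops.append("u")      # uppercase everything
        else:
            return None          # other case patterns are out of scope here
    ops += ["^" + ch for ch in reversed(prefix)]  # ^X prepends, so emit in reverse
    ops += ["$" + ch for ch in suffix]            # $X appends
    return " ".join(ops) or ":"                   # ":" is hashcat's no-op rule


def top_rules(pairs: Iterable[Tuple[str, str]], n: int) -> List[Tuple[str, int]]:
    """Count derived rules over (base, password) pairs and keep the n most common."""
    counts: Counter = Counter()
    for base, password in pairs:
        rule = derive_rule(base, password)
        if rule is not None:
            counts[rule] += 1
    return counts.most_common(n)
```

Case operations come first because hashcat's `c` would otherwise act on a prepended character rather than on the base word.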

## PRINCE Attack Mode

PRINCE (PRobability INfinite Chained Elements) generates passwords by combining multiple words from a wordlist. Discovers multi-word passwords like `correcthorsebatterystaple`.

```bash
# Chain 2-4 elements from a base wordlist
python wfh.py prince --wordlist top1000.txt --min-elem 2 --max-elem 4 -o prince.lst

# With separator and case permutations
python wfh.py prince --wordlist words.txt --separator "-" --case-permute --min-len 8
```
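
A simplified PRINCE-style generator, offered as a sketch under assumptions rather than WFH's implementation: words are grouped by length, every ordered split of a target length into `min_elem` to `max_elem` element lengths is enumerated, and each split is expanded with the Cartesian product of matching words. princeprocessor orders chains by keyspace; this sketch simply walks output lengths in ascending order.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple


def compositions(total: int, parts: int, lengths: List[int]) -> Iterator[Tuple[int, ...]]:
    """Ordered tuples of `parts` available word lengths that sum to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for n in lengths:
        if n <= total:
            for rest in compositions(total - n, parts - 1, lengths):
                yield (n,) + rest


def prince(words: Iterable[str], min_elem: int, max_elem: int, min_len: int, max_len: int) -> Iterator[str]:
    by_len: Dict[int, List[str]] = defaultdict(list)
    for w in words:
        if w:
            by_len[len(w)].append(w)  # keep input order (ideally probability-sorted)
    lengths = sorted(by_len)
    for total in range(min_len, max_len + 1):
        for parts in range(min_elem, max_elem + 1):
            for chain in compositions(total, parts, lengths):
                for combo in product(*(by_len[n] for n in chain)):
                    yield "".join(combo)
```

For instance, `list(prince(["ab", "c", "de"], 2, 2, 3, 3))` chains a 1-letter and a 2-letter element in both orders.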

## Wordlist Quality Benchmark

Measure the effectiveness of a generated wordlist against a reference set. Reports hit rate, efficiency, diversity, coverage by length/charset, and estimated crack times.

```bash
# Benchmark a wordlist against a known password set
python wfh.py benchmark --wordlist generated.lst --reference rockyou_sample.txt

# Save JSON report
python wfh.py benchmark --wordlist output.lst --reference test_set.txt --json report.json
```
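
One plausible way to compute the headline numbers, assuming hit rate = unique reference passwords found ÷ unique reference passwords, and efficiency = hits ÷ unique candidates. These definitions are illustrative; WFH's MAYA-inspired metrics may be computed differently. Both inputs are loaded into sets here, so very large lists would need a streaming approach.

```python
from collections import Counter
from typing import Dict, Iterable


def benchmark(candidates: Iterable[str], reference: Iterable[str]) -> Dict[str, object]:
    cand = set(candidates)
    ref = set(reference)
    hits = cand & ref
    # Per-length coverage: share of reference passwords of each length that were hit.
    by_len_total = Counter(len(p) for p in ref)
    by_len_hit = Counter(len(p) for p in hits)
    return {
        "hit_rate": len(hits) / len(ref) if ref else 0.0,
        "efficiency": len(hits) / len(cand) if cand else 0.0,
        "coverage_by_length": {n: by_len_hit[n] / by_len_total[n] for n in sorted(by_len_total)},
    }
```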

---

## Default Credentials Database

Query the built-in database of 1,329+ factory-default credentials covering 88 vendors and 14 protocols — routers, switches, printers, IP cameras, ICS/SCADA (PLCs, HMIs, RTUs), IoT gateways, and more.

```bash
# List all supported vendors
python wfh.py default-creds --list-vendors

# Export credentials for a specific vendor
python wfh.py default-creds --vendor siemens --format combo -o siemens_creds.lst

# Filter by protocol (telnet, ssh, http, snmp, modbus, s7comm, etc.)
python wfh.py default-creds --protocol modbus --format user -o modbus_users.lst

# Search by device category
python wfh.py default-creds --category ics --format combo -o ics_defaults.lst

# Export full database as JSON
python wfh.py default-creds --export-all --format json -o all_defaults.json
```

---

## Wordlists

| File | Description | Entries |
|------|-------------|---------|
| `passwords/wlist_brasil.lst` | Brazilian password corpus — cultural word banks, corporate patterns, leet speak, keyboard walks. Company names and CNPJs are public OSINT data. | ~3.88M |
| `passwords/default-creds-combo.lst` | Default credential user:password combos (routers, printers, ICS/SCADA) | ~3K |
| `data/default_credentials.json` | Structured default credentials database (1,329 entries, 88 vendors, 14 protocols) | — |
| `fuzzing/discovery_br.lst` | Brazilian web discovery & API fuzzing paths | ~900 |
| `usernames/username_br.lst` | Brazilian + global username patterns | ~1.6K |
| `labs/*.lst` | Workshop & training wordlists | — |

> **Details:** [Wiki — Brazilian Wordlist](https://github.com/mrhenrike/WordListsForHacking/wiki/Brazilian-Wordlist)

---

## Is My Password in This List?

```bash
# Linux/macOS
grep -qxF 'YourPassword' passwords/wlist_brasil.lst && echo "FOUND!" || echo "Not found"

# Windows PowerShell
if (Select-String -Path passwords\wlist_brasil.lst -Pattern ('^' + [regex]::Escape('YourPassword') + '$') -CaseSensitive -Quiet) { "FOUND!" } else { "Not found" }
```

If found: **change it immediately**, enable MFA/2FA, use a password manager, and never reuse passwords.

> **Full guide:** [Wiki — Password Check](https://github.com/mrhenrike/WordListsForHacking/wiki/Password-Check)

---

## ML Model

WFH includes a lightweight ML model that ranks generated candidates by structural pattern probability. Train it with local data or the SecLists corpus:

```bash
python wfh.py train --auto                    # local wordlists only
python wfh.py train --seclists                # SecLists corpus (auto-discover)
python wfh.py train --auto --seclists         # combined (recommended)
python wfh.py train --seclists /path/to/SecLists --seclists-categories password frequency
```

The model stores **only structural patterns** — no PII, passwords, or company names.

> **Details:** [Wiki — ML Model](https://github.com/mrhenrike/WordListsForHacking/wiki/ML-Model)
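
As an illustration of ranking by structural pattern without retaining passwords, the sketch below (hypothetical, not WFH's model) trains on hashcat-style masks (`?u`, `?l`, `?d`, `?s`), stores only mask counts, and orders candidates by the add-one-smoothed log-probability of their mask.

```python
import math
from collections import Counter
from typing import Iterable, List


def mask(pw: str) -> str:
    """Hashcat-style class mask; anything not ASCII A-Z, a-z, 0-9 becomes ?s."""
    out = []
    for ch in pw:
        if "A" <= ch <= "Z":
            out.append("?u")
        elif "a" <= ch <= "z":
            out.append("?l")
        elif "0" <= ch <= "9":
            out.append("?d")
        else:
            out.append("?s")
    return "".join(out)


class MaskRanker:
    """Keeps only mask frequencies, never the training passwords themselves."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def train(self, passwords: Iterable[str]) -> None:
        for pw in passwords:
            self.counts[mask(pw)] += 1
            self.total += 1

    def score(self, candidate: str) -> float:
        # Add-one smoothing: unseen masks still get a small non-zero probability.
        vocab = len(self.counts) + 1
        return math.log((self.counts[mask(candidate)] + 1) / (self.total + vocab))

    def rank(self, candidates: Iterable[str]) -> List[str]:
        return sorted(candidates, key=self.score, reverse=True)
```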

---

## New in v2.6 — Additional Generators

### Phrase-Initials Password Generator

Generate passwords from the first letter of each word in a phrase, with case mutations, leet substitutions, and hacker-style suffixes.

```bash
# Phrase → acrostic + variations
python wfh.py phrase "my secret corporate phrase" -o phrase.lst

# With custom prefixes and suffixes
python wfh.py phrase "my secret corporate phrase" --prefixes _,__ --suffixes @0x90,#0x90 -o phrase.lst
```
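
The acrostic step itself is simple. This sketch takes the first letter of each word, builds lowercase, uppercase, capitalized, and leet forms, then wraps each form in every prefix/suffix pair. The leet table and function names are assumptions for illustration, not WFH's internals.

```python
from itertools import product
from typing import Iterable, List

# Illustrative leet table (an assumption, not WFH's mapping).
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def initials(phrase: str) -> str:
    """First character of every whitespace-separated word."""
    return "".join(word[0] for word in phrase.split())


def phrase_variants(phrase: str, prefixes: Iterable[str] = ("",), suffixes: Iterable[str] = ("",)) -> List[str]:
    base = initials(phrase).lower()
    cores = {base, base.upper(), base.capitalize(), base.translate(LEET)}
    return sorted({p + c + s for p, c, s in product(prefixes, cores, suffixes)})
```

`phrase_variants("my secret corporate phrase", prefixes=("", "_"), suffixes=("@0x90",))` produces eight candidates, among them `Mscp@0x90`, `_MSCP@0x90`, and `m5cp@0x90`.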

### Existing Password Mutation Engine

Generate case, leet, prefix, and suffix variants of an existing base password.

```bash
# Mutate a known password
python wfh.py mutate "Summer2024" -o mutated.lst

# Control leet depth and length range
python wfh.py mutate "password123" --leet-mode aggressive --min-len 10 --max-len 25 -o mutated.lst
```
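
A minimal sketch of such an engine, assuming small illustrative prefix/suffix lists and a leet table (`a→@`, `e→3`, `i→1`, `o→0`, `s→$`) that are not WFH's. Each case variant is expanded into every subset of leet substitutions (2^k forms for k substitutable characters), combined with the affixes, and filtered by length, similar in spirit to `--min-len`/`--max-len`.

```python
from itertools import product
from typing import Iterable, List, Set

LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}  # illustrative table


def case_variants(word: str) -> Set[str]:
    return {word, word.lower(), word.upper(), word.capitalize(), word.swapcase()}


def leet_variants(word: str) -> Set[str]:
    """Every subset of leet substitutions: 2**k strings for k substitutable chars."""
    options = [(ch, LEET[ch.lower()]) if ch.lower() in LEET else (ch,) for ch in word]
    return {"".join(chars) for chars in product(*options)}


def mutate(
    base: str,
    prefixes: Iterable[str] = ("", "!", "@"),
    suffixes: Iterable[str] = ("", "1", "!", "123"),
    min_len: int = 0,
    max_len: int = 64,
) -> List[str]:
    cores: Set[str] = set()
    for variant in case_variants(base):
        cores |= leet_variants(variant)
    pre, suf = list(prefixes), list(suffixes)
    out = {p + c + s for p in pre for c in cores for s in suf}
    return sorted(w for w in out if min_len <= len(w) <= max_len)
```

Because leet expansion is exponential in the number of substitutable characters, long base words grow quickly; a length filter helps keep the output manageable.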

### Retail / Pharmacy Chain Credential Generator

Generates passwords and usernames following patterns common in retail environments: brand + store-id, system + tax-id, internal login prefixes.

```bash
# Both passwords and usernames for a brand
python wfh.py pharma --brand AcmePharma --ids 1200-1210 -o pharma.lst

# Passwords only, with tax ID (CNPJ)
python wfh.py pharma --brand RetailCo --abbrevs RC,RET --cnpj 01234567890123 --mode passwords

# Usernames only, custom domain
python wfh.py pharma --brand BrandX --ids 1000-2000 --domains corp.com.br --mode usernames
```
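
The patterns above reduce to string templates over a few inputs. The helpers below are hypothetical sketches of the three families (brand + store ID, system + tax ID, prefixed usernames); the separator set and username format are assumptions, while the CNPJ "root" variant relies on the fact that the first 8 digits of a CNPJ identify the company.

```python
from typing import Iterable, List


def store_ids(spec: str) -> List[int]:
    """Expand an inclusive range such as "1200-1210"."""
    start, end = (int(x) for x in spec.split("-"))
    return list(range(start, end + 1))


def brand_passwords(brand: str, ids: Iterable[int], abbrevs: Iterable[str] = (),
                    seps: Iterable[str] = ("", "@", "_")) -> List[str]:
    tokens = [brand, brand.lower(), brand.upper(), *abbrevs]
    id_list, sep_list = list(ids), list(seps)
    return [f"{t}{sep}{i}" for t in tokens for sep in sep_list for i in id_list]


def taxid_passwords(system: str, cnpj: str) -> List[str]:
    digits = "".join(ch for ch in cnpj if ch.isdigit())  # drop . / - punctuation
    # Full CNPJ, then its 8-digit root (the company identifier).
    return [f"{system}{digits}", f"{system}{digits[:8]}"]


def usernames(prefixes: Iterable[str], ids: Iterable[int], domain: str) -> List[str]:
    id_list = list(ids)
    return [f"{p.lower()}{i}@{domain}" for p in prefixes for i in id_list]
```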

### Digit-to-Text Wordlist Generator

Converts numbers (up to 12 digits) digit by digit into words, then generates case, leet, and separator variants. Supports EN, PT, BR (with feminine forms), and ES.

```bash
# Single number in English (default)
python wfh.py num2text --number 123
# → onetwothree, ONETWOTHREE, OneTwoThree, 0n37w07hr33, one-two-three, ...

# Brazilian Portuguese (includes feminine variants: uma, duas)
python wfh.py num2text --number 12 --lang br
# → umdois, umaduas, Um-Duas, um_duas, ...

# Spanish
python wfh.py num2text --number 123 --lang es
# → unodostres, UNODOSTRES, uno-dos-tres, una-dos-tres, ...

# Batch range, saved to file
python wfh.py num2text --range 0-9999 --lang en -o number_words.lst
python wfh.py num2text --range 2000-2030 --lang pt -o years_pt.lst
```

Accepted `--lang` aliases:

| Code | Also accepts | Language |
|------|-------------|----------|
| `en` | `en-us`, `en-gb` | English (default) |
| `pt` | `pt-pt` | European Portuguese |
| `br` | `pt-br` | Brazilian Portuguese |
| `es` | `es-es`, `es-mx`, `es-la` | Spanish |
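
A digit-by-digit sketch that reproduces the sample outputs above for `en` and `br`. The word tables cover only those two languages, and the leet table (`o→0`, `e→3`, `t→7`) and separator set are assumptions chosen to match the examples, not WFH's actual configuration. Per-digit masculine/feminine alternatives are combined, which is why mixed forms such as `um_duas` appear.

```python
from itertools import product
from typing import Dict, List, Tuple

# Per-digit word choices; BR lists feminine forms for 1 and 2 (um/uma, dois/duas).
WORDS: Dict[str, List[Tuple[str, ...]]] = {
    "en": [("zero",), ("one",), ("two",), ("three",), ("four",),
           ("five",), ("six",), ("seven",), ("eight",), ("nine",)],
    "br": [("zero",), ("um", "uma"), ("dois", "duas"), ("três",), ("quatro",),
           ("cinco",), ("seis",), ("sete",), ("oito",), ("nove",)],
}
LEET = str.maketrans({"o": "0", "e": "3", "t": "7"})
SEPARATORS = ("-", "_", ".")


def num2text(number: str, lang: str = "en") -> List[str]:
    if not number or len(number) > 12 or not all("0" <= d <= "9" for d in number):
        raise ValueError("expected 1-12 ASCII digits")
    choices = [WORDS[lang][int(d)] for d in number]
    out = set()
    for words in product(*choices):  # every masculine/feminine combination
        joined = "".join(words)
        titled = [w.capitalize() for w in words]
        out.update({joined, joined.upper(), "".join(titled), joined.translate(LEET)})
        for sep in SEPARATORS:
            out.add(sep.join(words))
            out.add(sep.join(titled))
    return sorted(out)
```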

### Global Length Filters

Apply minimum/maximum word length filtering to **any** subcommand output.

```bash
python wfh.py --min-len 8 --max-len 20 charset 8 12 -o filtered.lst
python wfh.py --min-len 10 mutate "admin" -o long_variants.lst
```

---

## Credits & Inspiration

| Project | Inspiration |
|---------|-------------|
| [CUPP](https://github.com/Mebus/cupp) | Personal target profiling |
| [Crunch](https://github.com/jim3ma/crunch) | Charset-based generation |
| [CeWL](https://github.com/digininja/CeWL) | Web scraping for wordlists |
| [CeWLeR](https://github.com/roys/cewler) | Modern Python web scraping (JS/CSS/PDF) |
| [routersploit](https://github.com/threat9/routersploit) | Default credentials for IoT/routers |
| [alterx](https://github.com/projectdiscovery/alterx) | DNS/subdomain fuzzing |
| [pipal](https://github.com/digininja/pipal) | Statistical analysis |
| [SecLists](https://github.com/danielmiessler/SecLists) | Curated security lists |
| [elpscrk](https://github.com/D4Vinci/elpscrk) | Permutation-based generation |
| [BEWGor](https://github.com/berzerk0/BEWGor) | Biographical wordlist generator |
| [pnwgen](https://github.com/toxydose/pnwgen) | Phone number generation |
| [intelligence-wordlist-generator](https://github.com/MichaelDim02/intelligence-wordlist-generator) | Keyword combiner |
| [SCaDAPass](https://github.com/scadastrangelove/SCaDAPass) | ICS/SCADA default credentials |
| [pcfg_cracker](https://github.com/lakiw/pcfg_cracker) | PCFG probabilistic grammar (Weir et al.) |
| [OMEN](https://github.com/RUB-SysSec/OMEN) | Ordered Markov ENumerator |
| [kwprocessor](https://github.com/hashcat/kwprocessor) | Keyboard walk generation |
| [PACK](https://github.com/iphelix/pack) | Password Analysis and Cracking Kit (rulegen) |
| [princeprocessor](https://github.com/hashcat/princeprocessor) | PRINCE attack mode |
| [MAYA](https://github.com/williamcorrias/MAYA-Password-Benchmarking) | Wordlist quality benchmarking framework |

---

## Contributing

Contributions welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

[MIT License](LICENSE) — Copyright (c) 2026 André Henrique ([@mrhenrike](https://github.com/mrhenrike))

---

<p align="center">
  Created by <a href="https://github.com/mrhenrike">André Henrique (@mrhenrike)</a> — <a href="https://github.com/Uniao-Geek">União Geek</a>
</p>

<p align="center">
  <a href="README.pt-BR.md">Leia em Português</a> · <a href="https://github.com/mrhenrike/WordListsForHacking/wiki">Full Documentation (Wiki)</a>
</p>
