Metadata-Version: 2.4
Name: host-replace
Version: 0.2.0
Summary: Replace host and domain names in text under various encoding schemes.
Project-URL: Homepage, https://github.com/adamreiser/host_replace
Project-URL: Bug Tracker, https://github.com/adamreiser/host_replace/issues
Author-email: Adam Reiser <reiser@defensivecomputing.io>
License-Expression: MIT
License-File: LICENSE.txt
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Filters
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: idna
Requires-Dist: regex
Description-Content-Type: text/markdown

# Host Replace

A Python package for replacing hostnames, domains, and IP addresses in text under common encoding schemes.

## Features

- Replace hostnames and IP addresses in text under common encodings (URL, HTML entity) while avoiding partial matches.
- Replacements maintain the same encoding as the original text.
- Provides CLI interface and importable module.
- Supports UTF-8 string and byte inputs.
- Supports FQDNS, second level domains, unqualified hostnames, and IPv4/IPv6 addresses.

## Installation

Requires Python 3.11 or newer.

Install with pip: `pip install host-replace`

Install from source:
```
git clone https://github.com/adamreiser/host_replace
cd host_replace
pip install .
```

## Usage

### Command-line interface

Transform the following text file using the provided mapping: `host-replace -m mappings.json sample.txt --verbose`

Engine selection is configurable with `--engine regex|automaton|auto` (default `regex`).
Use `--expected-runs` to hint reuse count when `--engine auto` is selected.

Guidance:
- Use `regex` for one-shot or small inputs (lowest startup overhead).
- Use `automaton` for larger workloads or when reusing the same replacer repeatedly.
- Use `auto` if you want a heuristic choice based on host-map size, input size, and expected reuse.
- Current `auto` heuristic is benchmark-informed and intentionally conservative:
  - one-shot (`expected_runs=1`) chooses `regex`
  - `automaton` is considered for larger maps and inputs when reuse is >=2

```sample.txt
1. https://web.example.com/path/to/resource?query=param
2. <a href="https&#x3a;&#x2f;&#x2f;boards&#x2e;example&#x2e;com&#x2f;thread&#x2f;123">Discussion Board</a>
3. Redirecting to https%3A%2F%2Fen.us.wiki.example.com%2Fwelcome
4. https://web-1a.example.com/redirect?q=%65%6e%2e%75%73%2e%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d
5. <meta http-equiv="refresh" content="0; url=https%3A%2F%2Fweb.example.com%2Fhome">
6. Our domain is still example.com and archived wiki will remain at archive.en.us.wiki.example.com.
```

```mappings.json
{
    "web.example.com": "www.example.com",
    "web-1a.example.com": "www-1a.example.com",
    "boards.example.com": "forums.en.us.example.com",
    "en.us.wiki.example.com": "wiki.example.com",
    "us.example.com": "us-east-1.example.net",
    "example.net": "example.org",
    "images.example.com": "cdn.example.org"
}
```

Output:
```
INFO: Replacing web.example.com with www.example.com at offset 11
INFO: Replacing boards&#x2e;example&#x2e;com with forums&#x2e;en&#x2e;us&#x2e;example&#x2e;com at offset 91
INFO: Replacing en.us.wiki.example.com with wiki.example.com at offset 195
INFO: Replacing web-1a.example.com with www-1a.example.com at offset 239
INFO: Replacing %65%6e%2e%75%73%2e%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d with %77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d at offset 269
INFO: Replacing web.example.com with www.example.com at offset 396
1. https://www.example.com/path/to/resource?query=param
2. <a href="https&#x3a;&#x2f;&#x2f;forums&#x2e;en&#x2e;us&#x2e;example&#x2e;com&#x2f;thread&#x2f;123">Discussion Board</a>
3. Redirecting to https%3A%2F%2Fwiki.example.com%2Fwelcome
4. https://www-1a.example.com/redirect?q=%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d
5. <meta http-equiv="refresh" content="0; url=https%3A%2F%2Fwww.example.com%2Fhome">
6. Our domain is still example.com and archived wiki will remain at archive.en.us.wiki.example.com.
```

### API

To use the module in your Python application:

```python3
import host_replace

host_map = {
    "web.example.com": "www.example.com",
    "boards.example.com": "forums.example.net"
}

replacer = host_replace.HostnameReplacer(host_map, engine="auto", expected_runs=2)

# Input text (str or bytes)
input_text = "Visit us at https://web.example.com or leave a comment at https://boards.example.com."

# Apply replacements
output_text = replacer.apply_replacements(input_text)

# Output: Visit us at https://www.example.com or leave a comment at https://forums.example.net.
print(output_text)
```

## Limitations

- Full pre-encoding case preservation is not supported. Matching is case-insensitive for encoded and unencoded hosts, but replacements preserve case based on the matched representation after encoding, which can lead to cosmetic casing differences in some encoded forms.

- Full case preservation of individual characters is not supported due to its inherent ambiguity. For example, when mapping `WWW.example.com` to `example.org`, it's unclear which if any letters should be capitalized.

- Variations in encoding representation (e.g., "%2F" vs "%2f"; "&#x2f" vs "&#X2f") can lead to inconsistently cased outputs.

- Does not process binary data beyond exact byte sequence matching. Encodings like base64 are not supported.

- Hostnames starting with hex codes can be ambiguous when preceded by %. For instance, `%00example.com` could be interpreted as `example.com` or `00example.com`.

- Support for Internationalized Domain Names (IDNs) has not been thoroughly tested.
