Metadata-Version: 2.4
Name: py-text-toolkit
Version: 0.1.0
Summary: A comprehensive string utility library for Python.
Author-email: Dawood Afzal <dawoodafzal.62138@gmail.com>
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: emoji>=2.0.0
Dynamic: license-file

# py-text-toolkit

A lightweight, dependency-minimal Python library for everyday string operations — cleaning, validation, analysis, case conversion, and generation.

---

## Installation

```bash
pip install py-text-toolkit
```

> **Requires:** Python 3.8+  
> **Dependency:** `emoji>=2.0.0` (installed automatically with the package; used only by `cleaning.remove_emojis`)

---

## Modules at a Glance

| Module | What it does |
|---|---|
| `py-text-toolkit.cleaning` | Strip, replace, and normalize raw text |
| `py-text-toolkit.validation` | Validate emails, URLs, passwords, and character sets |
| `py-text-toolkit.analysis` | Count, compare, and measure strings |
| `py-text-toolkit.format_cases` | Convert between naming conventions and formatting styles |
| `py-text-toolkit.generation` | Generate slugs, masks, ciphers, and reversed strings |

---

## Quick Start

```python
# Import name assumed to be `py_text_toolkit`: the distribution name contains
# hyphens, which are not valid in a Python module path.
from py_text_toolkit.cleaning import remove_html_tags, remove_urls
from py_text_toolkit.validation import is_email, is_strong_password
from py_text_toolkit.analysis import word_count, is_palindrome
from py_text_toolkit.format_cases import to_snake_case, to_camel_case
from py_text_toolkit.generation import generate_slug, mask_range

# Clean
remove_html_tags("<p>Hello <b>world</b></p>")   # "Hello world"
remove_urls("Visit https://example.com today")  # "Visit today"

# Validate
is_email("user@example.com")        # True
is_strong_password("Passw0rd!")     # True

# Analyse
word_count("Hello, world!")         # 2
is_palindrome("A man a plan a canal Panama")  # True

# Convert case
to_snake_case("camelCaseText")      # "camel_case_text"
to_camel_case("hello_world")        # "helloWorld"

# Generate
generate_slug("Hello World!")       # "hello-world"
mask_range("1234-5678-9012", 5, 9, "*")  # "1234-****-9012"
```

---

## Module Reference

### `py-text-toolkit.cleaning`

Functions for sanitising and normalising raw text.

| Function | Signature | Description |
|---|---|---|
| `normalize_whitespace` | `(text) → str` | Collapse all whitespace runs to a single space and strip ends |
| `remove_punctuation` | `(text, replace="") → str` | Remove or replace all punctuation characters |
| `remove_digits` | `(text, replace="") → str` | Remove or replace all digit characters |
| `remove_html_tags` | `(text, replace="") → str` | Strip or replace HTML tags |
| `remove_urls` | `(text, replace="") → str` | Remove or replace HTTP/HTTPS and `www.` URLs |
| `remove_emojis` | `(text, replace="") → str` | Remove or replace emoji characters (requires `emoji`) |
| `collapse_spaces` | `(text) → str` | Remove **all** whitespace (despite the name, runs are not collapsed to a single space) |

The `remove_*` functions accept an optional `replace` argument: the string substituted in place of each removed element (defaults to `""`). After replacement, whitespace is always normalized. `normalize_whitespace` and `collapse_spaces` take only `text`.
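
The replace-then-normalize behaviour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the `_sketch` function names are hypothetical, and using `string.punctuation` as the punctuation set is an assumption.

```python
import re
import string


def normalize_whitespace_sketch(text: str) -> str:
    # Collapse every whitespace run to one space and trim both ends.
    return " ".join(text.split())


def remove_punctuation_sketch(text: str, replace: str = "") -> str:
    # Assumption: "punctuation" means the ASCII set in string.punctuation.
    pattern = "[" + re.escape(string.punctuation) + "]"
    # Substitute first, then normalize, so a replace=" " never leaves
    # doubled or trailing spaces behind.
    return normalize_whitespace_sketch(re.sub(pattern, replace, text))


print(remove_punctuation_sketch("Hello, world!"))               # Hello world
print(remove_punctuation_sketch("Hello, world!", replace=" "))  # Hello world
```

Normalizing after substitution is why both calls give the same result even though the second one inserts extra spaces.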

```python
# Import name assumed to be `py_text_toolkit` (a hyphenated name is not importable).
from py_text_toolkit.cleaning import remove_punctuation, remove_html_tags, remove_emojis

remove_punctuation("Hello, world!")              # "Hello world"
remove_punctuation("Hello, world!", replace=" ") # "Hello world"

remove_html_tags("<p>Hello <b>world</b></p>")    # "Hello world"
remove_html_tags("<br/>line1<br/>line2", replace=" ")  # "line1 line2"

remove_emojis("Great job! 🎉")                   # "Great job!"
remove_emojis("Hello 😊", replace="[emoji]")     # "Hello [emoji]"
```

---

### `py-text-toolkit.validation`

Boolean predicates for common string formats.

| Function | Signature | Description |
|---|---|---|
| `is_email` | `(text) → bool` | Check for a valid email address |
| `is_url` | `(text) → bool` | Check for a valid HTTP or HTTPS URL |
| `contains_only` | `(text, allowed_chars) → bool` | Check that every character is in the allowed set |
| `is_strong_password` | `(text) → bool` | Check that a password meets strength requirements |

**Password requirements** (`is_strong_password`):
- Minimum 8 characters
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character from `@$!%*?&`
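
The rules listed above map directly onto a handful of checks. The sketch below is one possible reading of them, not the library's code: `is_strong_password_sketch` is a hypothetical name, and it assumes "letter" and "digit" mean ASCII characters and that characters outside the listed classes are allowed.

```python
SPECIAL_CHARS = set("@$!%*?&")


def is_strong_password_sketch(text: str) -> bool:
    # Each condition corresponds to one bullet in the requirements list.
    return (
        len(text) >= 8
        and any("a" <= ch <= "z" for ch in text)     # lowercase letter
        and any("A" <= ch <= "Z" for ch in text)     # uppercase letter
        and any("0" <= ch <= "9" for ch in text)     # digit
        and any(ch in SPECIAL_CHARS for ch in text)  # special character
    )


print(is_strong_password_sketch("Passw0rd!"))  # True
print(is_strong_password_sketch("weakpass"))   # False
```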

```python
# Import name assumed to be `py_text_toolkit` (a hyphenated name is not importable).
from py_text_toolkit.validation import is_email, is_url, contains_only, is_strong_password

is_email("user@example.com")          # True
is_email("not-an-email")              # False

is_url("https://api.service.io/v1")   # True
is_url("ftp://files.example.com")     # False

contains_only("12345", "0123456789")  # True
contains_only("hello!", "a-z")        # False  (literal chars only, not a range)

is_strong_password("Passw0rd!")       # True
is_strong_password("weakpass")        # False
```

> **Note on `contains_only`:** `allowed_chars` is treated as a set of literal characters. Special regex characters are escaped automatically, so `"a-z"` matches only the three characters `a`, `-`, and `z`, **not** a range.
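
To see why `"a-z"` is not a range, here is a minimal sketch of the escaping approach the note describes. The name `contains_only_sketch` is hypothetical, and the handling of empty input is an assumption, since the library's behaviour for that case is not documented here.

```python
import re


def contains_only_sketch(text: str, allowed_chars: str) -> bool:
    if not allowed_chars:
        # Assumption: with no allowed characters, only "" can pass.
        return text == ""
    # re.escape turns "-" into "\-", so it is a literal inside the class.
    pattern = "[" + re.escape(allowed_chars) + "]*"
    return re.fullmatch(pattern, text) is not None


print(contains_only_sketch("12345", "0123456789"))  # True
print(contains_only_sketch("hello!", "a-z"))        # False
print(contains_only_sketch("a-z-a", "a-z"))         # True
```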

---

### `py-text-toolkit.analysis`

Functions that measure and compare strings.

| Function | Signature | Description |
|---|---|---|
| `word_count` | `(text) → int` | Count words using regex word-boundary matching |
| `char_frequency` | `(text, char) → int` | Count non-overlapping occurrences of a character or substring |
| `count_vowels` | `(text) → int` | Count English vowels (a e i o u), case-insensitive |
| `longest_word` | `(text) → int` | Return the length of the longest whitespace-delimited word |
| `is_palindrome` | `(text, case_sensitive=False, ignore_formatting=True) → bool` | Check if a string is a palindrome |
| `is_anagram` | `(word1, word2) → bool` | Check if two strings are anagrams (case-insensitive, ignores spaces) |

```python
# Import name assumed to be `py_text_toolkit` (a hyphenated name is not importable).
from py_text_toolkit.analysis import word_count, is_palindrome, is_anagram, char_frequency

word_count("Hello, world!")                   # 2
word_count("  spaces   everywhere  ")         # 2

char_frequency("banana", "an")                # 2

is_palindrome("racecar")                      # True
is_palindrome("A man a plan a canal Panama")  # True
is_palindrome("Racecar", case_sensitive=True) # False

is_anagram("listen", "silent")                # True
is_anagram("Astronomer", "Moon starer")       # True
```
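
The two `is_palindrome` flags can be read as independent preprocessing steps. The sketch below shows one way they might combine. It is an illustration, not the library's implementation: `is_palindrome_sketch` is a hypothetical name, and treating `ignore_formatting` as "drop everything that is not a letter or digit" is an assumption.

```python
def is_palindrome_sketch(text: str, case_sensitive: bool = False,
                         ignore_formatting: bool = True) -> bool:
    if ignore_formatting:
        # Assumption: formatting means spaces and punctuation.
        text = "".join(ch for ch in text if ch.isalnum())
    if not case_sensitive:
        text = text.lower()
    # A palindrome reads the same forwards and backwards.
    return text == text[::-1]


print(is_palindrome_sketch("A man a plan a canal Panama"))   # True
print(is_palindrome_sketch("Racecar", case_sensitive=True))  # False
```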

---

### `py-text-toolkit.format_cases`

Convert strings between naming conventions and apply text formatting.

| Function | Signature | Description |
|---|---|---|
| `to_snake_case` | `(text) → str` | Convert to `snake_case` |
| `to_camel_case` | `(text) → str` | Convert to `camelCase` |
| `to_pascal_case` | `(text) → str` | Convert to `PascalCase` |
| `to_kebab_case` | `(text) → str` | Convert to `kebab-case` |
| `to_title_case` | `(text) → str` | Convert to `Title Case` |
| `truncate` | `(text, max_length, suffix="...") → str` | Truncate to a maximum length with a suffix |
| `pad_center` | `(text, width, fillchar=" ") → str` | Center-pad to a given width |

All case converters handle mixed input (camelCase, PascalCase, snake_case, kebab-case, spaces).
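
Handling mixed input usually comes down to splitting the text into words first and joining them in the target style afterwards. The following is a rough sketch of that idea, not the library's code: all `_sketch` names are hypothetical, and the splitting rule (break at a lowercase-or-digit to uppercase boundary, then keep alphanumeric runs) is an assumption that leaves acronyms such as `HTTPServer` as a single word.

```python
import re


def split_words_sketch(text: str) -> list:
    # Insert a space wherever a lowercase letter or digit meets an uppercase one.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Underscores, hyphens, spaces, and punctuation all act as separators.
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", spaced)]


def to_snake_case_sketch(text: str) -> str:
    return "_".join(split_words_sketch(text))


def to_camel_case_sketch(text: str) -> str:
    words = split_words_sketch(text)
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


print(to_snake_case_sketch("camelCaseText"))    # camel_case_text
print(to_snake_case_sketch("kebab-case-text"))  # kebab_case_text
print(to_camel_case_sketch("PascalCaseText"))   # pascalCaseText
```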

```python
# Import name assumed to be `py_text_toolkit` (a hyphenated name is not importable).
from py_text_toolkit.format_cases import to_snake_case, to_camel_case, truncate, pad_center

to_snake_case("camelCaseText")    # "camel_case_text"
to_snake_case("Hello World!")     # "hello_world"

to_camel_case("hello_world")      # "helloWorld"
to_camel_case("PascalCaseText")   # "pascalCaseText"

to_pascal_case("kebab-case-text") # "KebabCaseText"
to_kebab_case("camelCaseText")    # "camel-case-text"
to_title_case("hello_world")      # "Hello World"

truncate("Hello, World!", 8)      # "Hello..."
truncate("Hi", 10)                # "Hi"

pad_center("hello", 11)           # "   hello   "
pad_center("hi", 10, "-")         # "----hi----"
```

---

### `py-text-toolkit.generation`

Functions that produce new strings from existing ones.

| Function | Signature | Description |
|---|---|---|
| `generate_slug` | `(text) → str` | Convert to a URL-friendly slug |
| `reverse_word` | `(text) → str` | Reverse all characters |
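| `mask_range` | `(text, start_index, end_index, placeholder="X") → str` | Mask characters from `start_index` up to, but not including, `end_index` with a placeholder; negative indices count from the end |
| `ceasar_cipher` | `(text, shift) → str` | Encrypt with the Caesar cipher; pass a negative `shift` to decrypt (note that the function name is spelled `ceasar_cipher`) |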

```python
# Import name assumed to be `py_text_toolkit` (a hyphenated name is not importable).
from py_text_toolkit.generation import generate_slug, mask_range, ceasar_cipher, reverse_word

generate_slug("Hello World!")               # "hello-world"
generate_slug("Python 3.11 -- Release Notes")  # "python-3-11-release-notes"

reverse_word("hello")                       # "olleh"

mask_range("1234-5678-9012", 5, 9, "*")     # "1234-****-9012"
mask_range("secret", -3, -1)               # "secXXt"

ceasar_cipher("Hello, World!", 3)           # "Khoor, Zruog!"
ceasar_cipher("Khoor, Zruog!", -3)          # "Hello, World!"  (decrypt)
```
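
The `mask_range` and `ceasar_cipher` outputs above are consistent with slice-style index handling and a letters-only alphabet shift. The sketch below reproduces those outputs under these assumptions. The `_sketch` names are hypothetical and this is not the library's implementation.

```python
def mask_range_sketch(text: str, start_index: int, end_index: int,
                      placeholder: str = "X") -> str:
    # slice.indices resolves negative and out-of-range bounds the way slicing does.
    start, end, _ = slice(start_index, end_index).indices(len(text))
    end = max(start, end)  # an empty or reversed range masks nothing
    return text[:start] + placeholder * (end - start) + text[end:]


def caesar_cipher_sketch(text: str, shift: int) -> str:
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)  # non-letters pass through unchanged
            continue
        # Modulo 26 wraps around the alphabet and handles negative shifts.
        out.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(out)


print(mask_range_sketch("1234-5678-9012", 5, 9, "*"))  # 1234-****-9012
print(mask_range_sketch("secret", -3, -1))             # secXXt
print(caesar_cipher_sketch("Hello, World!", 3))        # Khoor, Zruog!
```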

---

## Dependencies

| Package | Required | Used by |
|---|---|---|
| `re` (stdlib) | Always | All modules |
| `string` (stdlib) | Always | `cleaning` |
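| `emoji` (>=2.0.0) | Always (declared in `Requires-Dist`) | `cleaning.remove_emojis` only |

`emoji` is listed as a regular requirement in the package metadata, so a plain install pulls it in automatically:

```bash
pip install py-text-toolkit
```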

---

## License

MIT License — see [LICENSE](LICENSE) for details.
