Metadata-Version: 2.3
Name: stringpod
Version: 0.2.2
Summary: Matching texts across languages.
License: MIT
Author: Jasmine Yeung
Author-email: yeungjyy@gmail.com
Requires-Python: >=3.12.3,<4.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: test
Requires-Dist: black (>=25.1.0,<26.0.0) ; extra == "test"
Requires-Dist: bump2version (>=1.0.1,<2.0.0) ; extra == "dev"
Requires-Dist: flake8 (>=7.1.2,<8.0.0) ; extra == "test"
Requires-Dist: flake8-docstrings (>=1.6.0,<2.0.0) ; extra == "test"
Requires-Dist: hanzidentifier (>=1.3.0,<2.0.0)
Requires-Dist: isort (>=6.0.0,<7.0.0) ; extra == "test"
Requires-Dist: jieba (>=0.42.1,<0.43.0)
Requires-Dist: langdetect (>=1.0.9,<2.0.0)
Requires-Dist: mkdocs (>=1.6.1,<2.0.0) ; extra == "docs"
Requires-Dist: mkdocs-autorefs (>=1.3.1,<2.0.0) ; extra == "docs"
Requires-Dist: mkdocs-include-markdown-plugin (>=1.0.0,<2.0.0) ; extra == "docs"
Requires-Dist: mkdocs-material (>=9.6.5,<10.0.0) ; extra == "docs"
Requires-Dist: mkdocstrings (>=0.28.1,<0.29.0) ; extra == "docs"
Requires-Dist: mkdocstrings-python (>=1.16.1,<2.0.0) ; extra == "docs"
Requires-Dist: mypy (>=1.15.0,<2.0.0) ; extra == "test"
Requires-Dist: number-parser (>=0.3.2,<0.4.0)
Requires-Dist: opencc (>=1.1.9,<2.0.0)
Requires-Dist: pip (>=25.0.1,<26.0.0) ; extra == "dev"
Requires-Dist: pre-commit (>=4.1.0,<5.0.0) ; extra == "dev"
Requires-Dist: pypinyin (>=0.53.0,<0.54.0)
Requires-Dist: pytest (>=8.3.4,<9.0.0) ; extra == "test"
Requires-Dist: pytest-cov (>=6.0.0,<7.0.0) ; extra == "test"
Requires-Dist: python-semantic-release (>=9.20.0,<10.0.0) ; extra == "dev"
Requires-Dist: toml (>=0.10.2,<0.11.0) ; extra == "dev"
Requires-Dist: tox (>=4.24.1,<5.0.0) ; extra == "dev"
Requires-Dist: twine (>=6.1.0,<7.0.0) ; extra == "dev"
Requires-Dist: virtualenv (>=20.29.2,<21.0.0) ; extra == "dev"
Project-URL: Homepage, https://github.com/jyyyeung/stringpod
Description-Content-Type: text/markdown

# String Pod

[![pypi](https://img.shields.io/pypi/v/stringpod.svg)](https://pypi.org/project/stringpod/)
[![python](https://img.shields.io/pypi/pyversions/stringpod.svg)](https://pypi.org/project/stringpod/)
[![Build Status](https://github.com/jyyyeung/stringpod/actions/workflows/dev.yml/badge.svg)](https://github.com/jyyyeung/stringpod/actions/workflows/dev.yml)
[![codecov](https://codecov.io/gh/jyyyeung/stringpod/branch/main/graphs/badge.svg)](https://codecov.io/github/jyyyeung/stringpod)

Matching texts across languages.

* Documentation: <https://jyyyeung.github.io/stringpod>
* GitHub: <https://github.com/jyyyeung/stringpod>
* PyPI: <https://pypi.org/project/stringpod/>
* Free software: MIT License

## Features

* Normalize text with options
* Check if a text contains a substring
* Parse numbers from text
* Compare pinyin of two texts

## Usage

### Contains

Check whether a text contains a substring. Normalizer options can be passed with `--options`.

```bash
stringpod contains "Hello, world!" "world"
stringpod contains "  Hello, world!  " "lo, wor" --options "strip_whitespace,ignore_case"
stringpod contains "歌曲（純音樂）" "(纯音乐)" --options "ignore_chinese_variant"
```
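
The idea behind the command can be sketched in plain Python: normalize both strings the same way, then test membership. This is a minimal sketch using only the standard library, not stringpod's implementation; `normalized_contains` is a hypothetical name. It does not convert traditional Chinese to simplified, which would need a converter such as OpenCC, so the example uses simplified characters on both sides.

```python
import unicodedata


def normalized_contains(text: str, substring: str, ignore_case: bool = True) -> bool:
    """Hypothetical helper: normalize both strings, then test membership."""

    def normalize(value: str) -> str:
        # NFKC folds compatibility forms, e.g. a fullwidth "（" becomes "(".
        value = unicodedata.normalize("NFKC", value).strip()
        return value.casefold() if ignore_case else value

    return normalize(substring) in normalize(text)


print(normalized_contains("  Hello, world!  ", "LO, WOR"))  # True
print(normalized_contains("歌曲（纯音乐）", "(纯音乐)"))  # True
```

Normalizing both sides with the same function keeps the comparison symmetric, so neither argument needs to be cleaned up beforehand.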

### Normalize

Normalize text to a standard form.

```bash
stringpod normalize "Hello, World!!!"
stringpod normalize "    Hello,   World!!!" --options "all"
stringpod normalize "歌曲（純音樂）" --options "ignore_chinese_variant"
```

### Normalizer Options

* `strip_whitespace`: Strip leading and trailing whitespace from the text (default: `False`)
* `remove_whitespace`: Remove all whitespace characters from the text (default: `False`)
  * `strip_whitespace` is redundant when `remove_whitespace` is `True`
* `ignore_chinese_variant`: Ignore Chinese variants (default: `False`)
  * Chinese will be converted to simplified Chinese
* `ignore_case`: Ignore case (default: `False`)
  * English will be converted to lowercase
* `nfkc`: Normalize to Unicode NFKC form (default: `True`)
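
How these flags might combine can be sketched with the standard library alone. The function below is a hypothetical stand-in, not stringpod's API: `normalize_sketch` and its keyword arguments only mirror the option names listed here, and the order of the steps is an assumption. `ignore_chinese_variant` is left out because it needs a Chinese converter such as OpenCC.

```python
import unicodedata


def normalize_sketch(
    text: str,
    *,
    strip_whitespace: bool = False,
    remove_whitespace: bool = False,
    ignore_case: bool = False,
    nfkc: bool = True,
) -> str:
    """Hypothetical normalizer mirroring the option names in this section."""
    if nfkc:
        # Fold compatibility characters such as fullwidth letters and punctuation.
        text = unicodedata.normalize("NFKC", text)
    if remove_whitespace:
        # Removing every whitespace character also covers stripping the ends.
        text = "".join(text.split())
    elif strip_whitespace:
        text = text.strip()
    if ignore_case:
        text = text.lower()
    return text


print(normalize_sketch("  Hello,   World!  ", strip_whitespace=True))
print(normalize_sketch("  Hello,   World!  ", remove_whitespace=True, ignore_case=True))
```

The `elif` branch reflects the note above: once all whitespace is removed, stripping the ends has nothing left to do.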

### Number Parser

Parse numbers from text.

```bash
stringpod number "One hundred and twenty-three"
stringpod number "One hundred and twenty-three" --language "en"
```
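
To illustrate what parsing a spelled-out number involves, here is a toy sketch for English only. It is not the algorithm used by number-parser or stringpod; `words_to_int` and the word tables are hypothetical and cover just small cardinal numbers.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text: str) -> int:
    """Toy English number parser; raises ValueError on unknown words."""
    total = 0    # sum of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # A bare "hundred" is read as "one hundred".
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_int("One hundred and twenty-three"))  # 123
```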

### Number Parser Options

* `language`: Language the number is written in (default: `en`)

### Compare Pinyin

Compare pinyin of two texts.

```bash
stringpod cmp-pinyin "你好" "你号"
stringpod cmp-pinyin "你好" "你号" --options "with_tone"
stringpod cmp-pinyin "你好" "你号" --options "spoken_tone"
```

### Pinyin Options

* `with_tone`: Whether to include the tone in the comparison (default: `False`)
* `spoken_tone`: Whether to use the spoken tone (default: `False`)
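
The effect of `with_tone` can be illustrated with a small self-contained sketch. The lookup table is hand-written and hypothetical, covering only the characters from the examples above; a real implementation would take readings from a library such as pypinyin. `to_pinyin` and `same_pinyin` are illustrative names, and `spoken_tone` is not modeled here.

```python
# Hypothetical lookup table for the example characters only;
# each value is (syllable, tone number).
PINYIN = {"你": ("ni", 3), "好": ("hao", 3), "号": ("hao", 4)}


def to_pinyin(text: str, with_tone: bool = False) -> list[str]:
    """Map each character to its syllable, optionally suffixed with the tone."""
    result = []
    for char in text:
        syllable, tone = PINYIN[char]
        result.append(f"{syllable}{tone}" if with_tone else syllable)
    return result


def same_pinyin(left: str, right: str, with_tone: bool = False) -> bool:
    """Compare two texts by their pinyin, with or without tones."""
    return to_pinyin(left, with_tone) == to_pinyin(right, with_tone)


print(same_pinyin("你好", "你号"))  # True: both read "ni hao" without tones
print(same_pinyin("你好", "你号", with_tone=True))  # False: hao3 differs from hao4
```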

## Development

```bash
poetry install -E dev -E docs -E test
poetry run pre-commit install
```

### CLI Application

```bash
poetry run python -m stringpod.cli --help
```

### Python API

```bash
poetry run python -m stringpod.stringpod --help
```

### Testing

```bash
poetry run pytest  # Run the pytest suite
poetry run python -m stringpod.stringpod -v  # Run the doctests
```

## Credits

Core packages:

* [number-parser](https://github.com/scrapinghub/number-parser)
* [pypinyin](https://github.com/mozillazg/python-pinyin)
* [opencc](https://github.com/BYVoid/OpenCC)
* [jieba](https://github.com/fxsjy/jieba)

This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [waynerv/cookiecutter-pypackage](https://github.com/waynerv/cookiecutter-pypackage) project template.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

