Metadata-Version: 2.4
Name: charstreamer
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Text Processing :: Linguistic
Summary: Fast Rust/PyO3 semantic text segmentation
Home-Page: https://github.com/mjbommar/charstreamer
Author: Michael Bommarito
License-Expression: MIT OR Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/mjbommar/charstreamer/tree/master/docs
Project-URL: Homepage, https://github.com/mjbommar/charstreamer
Project-URL: Issues, https://github.com/mjbommar/charstreamer/issues
Project-URL: Repository, https://github.com/mjbommar/charstreamer

# CharStreamer Python

`charstreamer` provides Python access to the Rust CharStreamer segmentation
engine through a PyO3 extension module.

This first public wheel focuses on fast semantic text segmentation and targets the following units:

- paragraphs
- sentences
- metadata-like lines
- headings/sections
- list items
- dialogue spans

## Install

```bash
pip install charstreamer
```
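
To confirm that the wheel was installed into the active environment, one option is to query the installed distribution metadata with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of `charstreamer`, and it only reports what `importlib.metadata` can see.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical check; prints the version string if the package is installed.
version = installed_version("charstreamer")
print(version if version is not None else "charstreamer is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to branch on whether the package is present.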

## Example

```python
import charstreamer

text = """# Background
The court reviewed the invoice. The shipment was late.

- Notice was timely.
- Damages were limited.
"""

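# Build a segmenter with the default configuration.
segmenter = charstreamer.Segmenter.default()
# Annotate the text; the result is indexed by string keys below.
annotation = segmenter.annotate(text)

print(annotation["spans"])
print(annotation["tagged"])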
```
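
The example above reads two keys from the annotation, but the shape of their values is not described here. One cautious way to explore the result is a helper that summarizes each entry without assuming its structure. This is a sketch under assumptions: `describe` is a hypothetical helper, it assumes the returned object behaves like a dict, and the `placeholder` data is a stand-in that does not reflect real `charstreamer` output.

```python
from collections.abc import Mapping, Sized


def describe(annotation: Mapping) -> dict:
    """Summarize each entry of a mapping as (type name, length or None)."""
    summary = {}
    for key, value in annotation.items():
        # Only report a length for values that actually define one.
        length = len(value) if isinstance(value, Sized) else None
        summary[key] = (type(value).__name__, length)
    return summary


# Stand-in data for illustration only; NOT real charstreamer output.
placeholder = {"spans": [], "tagged": ""}
print(describe(placeholder))
```

With the library installed, the corresponding call would be `describe(segmenter.annotate(text))`, assuming `annotate` returns a dict-like object.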

This is an early development (alpha) release. APIs may change before a stable
`1.0` release.

Full documentation and Rust source are available at:

https://github.com/mjbommar/charstreamer

