Metadata-Version: 2.4
Name: pyprocessors-chunk_sentences
Version: 1.6.62
Summary: Sherpa sentence chunking processor
Author-email: Olivier Terrier <olivier.terrier@kairntech.com>
License: MIT
License-File: AUTHORS.md
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: blingfire
Requires-Dist: collections-extended
Requires-Dist: numpy
Requires-Dist: pymultirole-plugins<1.7.0,>=1.6.0
Requires-Dist: pysegmenters-blingfire<1.7.0,>=1.6.0
Requires-Dist: pysegmenters-syntok<1.7.0,>=1.6.0
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: dev
Requires-Dist: bump2version; extra == 'dev'
Requires-Dist: hatchling; extra == 'dev'
Requires-Dist: pre-commit; extra == 'dev'
Provides-Extra: docs
Requires-Dist: lxml-html-clean; extra == 'docs'
Requires-Dist: m2r2; extra == 'docs'
Requires-Dist: sphinx; extra == 'docs'
Requires-Dist: sphinx-rtd-theme; extra == 'docs'
Requires-Dist: sphinxcontrib-apidoc; extra == 'docs'
Provides-Extra: sbom
Requires-Dist: cyclonedx-bom>=7.2.2; extra == 'sbom'
Requires-Dist: pip-audit>=2.10.0; extra == 'sbom'
Provides-Extra: test
Requires-Dist: dirty-equals; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: ruff; extra == 'test'
Description-Content-Type: text/markdown

# pyprocessors_chunk_sentences

[![license](https://img.shields.io/github/license/oterrier/pyprocessors_chunk_sentences)](https://github.com/oterrier/pyprocessors_chunk_sentences/blob/master/LICENSE)
[![tests](https://github.com/oterrier/pyprocessors_chunk_sentences/workflows/tests/badge.svg)](https://github.com/oterrier/pyprocessors_chunk_sentences/actions?query=workflow%3Atests)
[![codecov](https://img.shields.io/codecov/c/github/oterrier/pyprocessors_chunk_sentences)](https://codecov.io/gh/oterrier/pyprocessors_chunk_sentences)
[![docs](https://img.shields.io/readthedocs/pyprocessors_chunk_sentences)](https://pyprocessors_chunk_sentences.readthedocs.io)
[![version](https://img.shields.io/pypi/v/pyprocessors_chunk_sentences)](https://pypi.org/project/pyprocessors_chunk_sentences/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pyprocessors_chunk_sentences)](https://pypi.org/project/pyprocessors_chunk_sentences/)

A Sherpa processor for sentence chunking: it creates segments from annotations.
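
As a rough illustration of what sentence chunking means, the sketch below greedily packs consecutive sentences into segments that stay under a character budget. The function `chunk_sentences`, its `max_chars` parameter, and the sample sentences are assumptions made for this example. They are not this package's API, and the actual processor may chunk differently (for instance by token count).

```python
from typing import Iterable, List


def chunk_sentences(sentences: Iterable[str], max_chars: int = 200) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most max_chars.

    Hypothetical helper for illustration only; not part of this package.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        # Count the joining space only when the chunk already has content.
        extra = len(sentence) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            # The budget would be exceeded: close the chunk and start a new one.
            chunks.append(" ".join(current))
            current, current_len = [], 0
            extra = len(sentence)
        current.append(sentence)
        current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


# Made-up input, already split into sentences.
sentences = ["First sentence.", "Second one.", "A third, longer sentence here."]
chunks = chunk_sentences(sentences, max_chars=30)
for chunk in chunks:
    print(repr(chunk))
```

The greedy strategy keeps sentences intact and in order, which is usually what downstream consumers of the segments expect.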

## Installation

Install the package from PyPI with `pip install pyprocessors_chunk_sentences`. Python 3.12 or later is required.

## Developing

### Pre-requisites

You will need to install [uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management and building:

```
pip install uv
```

Clone the repository:

```
git clone https://github.com/oterrier/pyprocessors_chunk_sentences
```

Install the project with test dependencies:

```
uv sync --extra test
```

### Running the test suite

You can run the full test suite with:

```
uv run pytest
```

### Linting and formatting

This project uses [ruff](https://docs.astral.sh/ruff/) for linting and formatting:

```
uv run ruff check .
uv run ruff format --check .
```

### Building the documentation

You can build the HTML documentation with:

```
uv sync --extra docs
uv run sphinx-build docs docs/_build
```

The built documentation is available at `docs/_build/index.html`.

### SBOM & vulnerability check

Install the SBOM dependencies:

```
uv sync --extra sbom
```

Generate a CycloneDX SBOM from the current environment:

```
uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
```

Audit dependencies for known vulnerabilities:

```
uv run pip-audit --format json --output audit-report.json
```

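`pip-audit` exits with a non-zero status when it finds a known vulnerability. In CI, add `--strict` so that the audit also fails when a dependency cannot be collected for auditing: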

```
uv run pip-audit --strict
```
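
To act on the JSON report in a script, something like the following sketch may help. It assumes the report has a top-level `dependencies` list whose entries carry `name`, `version`, and a `vulns` list with an `id` field, which is the layout recent `pip-audit` releases emit; check it against the report your version produces. The function name `vulnerable_packages` and the sample entries are made up for illustration.

```python
import json
from typing import Dict, List


def vulnerable_packages(report: dict) -> Dict[str, List[str]]:
    """Map "name==version" to the advisory IDs listed for that dependency.

    Hypothetical helper; assumes the report layout described above.
    """
    found: Dict[str, List[str]] = {}
    for dep in report.get("dependencies", []):
        ids = [vuln["id"] for vuln in dep.get("vulns", [])]
        if ids:
            # Skipped dependencies may lack a version, hence the fallback.
            key = f"{dep['name']}=={dep.get('version', '?')}"
            found[key] = ids
    return found


# Inline sample with made-up package names and advisory IDs.
sample = json.loads("""
{
  "dependencies": [
    {"name": "examplepkg", "version": "1.0.0",
     "vulns": [{"id": "EXAMPLE-0001", "fix_versions": ["1.0.1"]}]},
    {"name": "otherpkg", "version": "2.3.4", "vulns": []}
  ]
}
""")

summary = vulnerable_packages(sample)
for package, advisories in summary.items():
    print(package, ", ".join(advisories))
```

To run it on a real report, load `audit-report.json` with `json.load` instead of the inline sample.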
