Metadata-Version: 2.4
Name: nl_processing
Version: 1.0.2
Summary: Aggregate build for the nl_processing multi-package repository
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: pydantic<3,>=2.0
Requires-Dist: langchain-core<1,>=0.3
Requires-Dist: langchain-openai<1,>=0.3
Requires-Dist: numpy<3,>=2.0
Requires-Dist: opencv-python<5,>=4.10
Requires-Dist: asyncpg<1,>=0.30
Requires-Dist: aiosqlite<1,>=0.20

# nl_processing

[![PyPI](https://img.shields.io/pypi/v/nl_processing)](https://pypi.org/project/nl_processing/)

Dutch language processing toolkit organized as a multi-package Python repository.

## Install

```bash
pip install nl_processing
```

The published `nl_processing` package is the aggregate build from the repo root. Day-to-day development happens inside the package folders under `packages/`.

## Repository Layout

```text
packages/
  core/
  extract_text_from_image/
  extract_words_from_text/
  translate_text/
  translate_word/
  database/
  database_cache/
  sampling/
docs/
pyproject.toml        # aggregate build for the published nl_processing package
Makefile              # repo-wide lint/test entrypoint
```

Each package has its own:

- `pyproject.toml`
- `ruff.toml`
- `pytest.ini`
- `tests/`
- `docs/`

## Modules

| Module | Class | Description | Docs |
|---|---|---|---|
| `core` | N/A | Shared models, ports, exceptions, and prompt helpers | [docs](packages/core/docs/module-spec.md) |
| `extract_text_from_image` | `ImageTextExtractor` | Extract Dutch text from images via Vision API | [docs](packages/extract_text_from_image/docs/module-spec.md) |
| `extract_words_from_text` | `WordExtractor` | Extract and normalize words from markdown text | [docs](packages/extract_words_from_text/docs/module-spec.md) |
| `translate_text` | `TextTranslator` | Translate text (NL -> RU) with markdown preservation | [docs](packages/translate_text/docs/module-spec.md) |
| `translate_word` | `WordTranslator` | Batch-translate words (NL -> RU) | [docs](packages/translate_word/docs/module-spec.md) |
| `database` | `DatabaseService` | Remote source of truth and default progress/sync provider | [docs](packages/database/docs/module-spec.md) |
| `database_cache` | `DatabaseCacheService` | Local-first SQLite cache with injectable remote progress sync | [docs](packages/database_cache/docs/module-spec.md) |
| `sampling` | `WordSampler` | Weighted word sampling over any compatible scored-pair provider | [docs](packages/sampling/docs/module-spec.md) |

## Development

Work inside one package when you only touch one module:

```bash
cd packages/translate_word
uv sync --all-groups
uv run pytest tests/unit
```

Run the repo-wide quality gate from the root:

```bash
make check
```

Useful package-local examples:

```bash
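# Unit tests for one package (run from the repo root)
cd packages/core
uv run pytest tests/unit/core

# Integration tests (again from the repo root); doppler injects the required environment variables
cd packages/database
doppler run -- uv run pytest tests/integration/database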
```

## Dependency Rule

Modules are independent packages. Cross-module dependencies must be explicit in the consuming package's `pyproject.toml`.

Shared cross-module storage contracts live in `nl_processing.core.ports`. `database` and `database_cache` are concrete implementations and adapters, not the owners of those shared interfaces.
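
As a rough illustration of this ownership split, a port can be written as a `typing.Protocol` that adapters satisfy structurally, without importing each other. The names below (`ScoredPairProvider`, `get_scored_pairs`, `InMemoryProvider`, `count_pairs`) are hypothetical and are not taken from `nl_processing.core.ports`; the core module spec describes the real interfaces.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ScoredPairProvider(Protocol):
    """Hypothetical port: anything that can return (word, score) pairs."""

    def get_scored_pairs(self) -> list[tuple[str, int]]: ...


class InMemoryProvider:
    """Hypothetical adapter; it satisfies the port without inheriting from it."""

    def __init__(self, pairs: list[tuple[str, int]]) -> None:
        self._pairs = list(pairs)

    def get_scored_pairs(self) -> list[tuple[str, int]]:
        return list(self._pairs)


def count_pairs(provider: ScoredPairProvider) -> int:
    # Consumers depend only on the port, never on a concrete adapter.
    return len(provider.get_scored_pairs())


provider = InMemoryProvider([("huis", 2), ("fiets", 0)])
```

With this shape, a consumer such as a sampler only needs the port type in its signature, so a remote-backed or cache-backed provider can be swapped in without changing the consumer.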

One intentional design change in this layout: `database` no longer imports `translate_word` directly. To get automatic translation on `add_words()`, inject a translator explicitly:

```python
from nl_processing.core.models import Language
from nl_processing.database.service import DatabaseService
from nl_processing.translate_word.service import WordTranslator

db = DatabaseService(
    user_id="alex",
    translator=WordTranslator(
        source_language=Language.NL,
        target_language=Language.RU,
    ),
)
```

## Docs

- Repository module spec: [docs/module-spec.md](docs/module-spec.md)
- Environment variables: [docs/ENV_VARS.md](docs/ENV_VARS.md)
- Release workflow: [docs/REALEASE_WORKFLOW.md](docs/REALEASE_WORKFLOW.md)
