Metadata-Version: 2.4
Name: tact-dictionary
Version: 0.2.0
Summary: Build, validate, and query .tactdict language packs
Author: Tact contributors
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.12
Requires-Dist: pyyaml>=6.0
Description-Content-Type: text/markdown

# Tact Dictionary

Tact Dictionary builds, validates, and queries `.tactdict` language packs for
text-entry software.

The `tact-dictionary` tool and related Python package can prepare source data,
build binary packs, validate pack files, and inspect completions and variants.
The project was built with the Tact Android keyboard in mind, but the keyboard
is an adopter of the format rather than the only intended consumer. Current
packs contain a normalized lexicon graph, word records, prefix suggestion
caches, and variant groups for long-press word alternatives.

The tool and Python package are licensed under the MIT License. Generated
language packs are data artifacts with source-derived licensing and attribution
recorded in the sidecar manifest. The code license does not relicense frequency
lists, morphology data, blocklists, or other input datasets.

## Status

This project is pre-alpha. The tool can produce and validate an English pack,
and the Python API can inspect completions and variants.

Implemented pack sections:

- `STRINGS`: shared UTF-8 string pool
- `SYMBOLS`: normalized input symbols
- `WORDS`: word records and source masks
- `VARIANTS`: variant groups such as `run`, `runs`, `running`, `runner`
- `SUGGESTS`: prefix suggestion lists attached to graph states
- `GRAPH`: lexicon word graph

Not implemented yet: context models, full typo-correction graph data, blocklist
sections, user overlays, and runtime memory-map readers.

## Development

```bash
uv sync
uv run tact-dictionary --version
uv run ruff format --check src tests
uv run ruff check
uv run pytest
```

Build Python distribution artifacts with:

```bash
uv build
```

## Build A Pack

Download the current English source files:

- Leipzig Corpora Collection English news corpus:
  <https://downloads.wortschatz-leipzig.de/corpora/eng_news_2025_100K.tar.gz>
- MorphyNet English inflection data:
  <https://raw.githubusercontent.com/kbatsuren/MorphyNet/main/eng/eng.inflectional.v1.tsv>
- MorphyNet English derivation data:
  <https://raw.githubusercontent.com/kbatsuren/MorphyNet/main/eng/eng.derivational.v1.tsv>

Convert external source files into tact-dictionary build inputs:

```bash
uv run tact-dictionary import-language-data \
  --output-dir build/en_US \
  --leipzig-corpus sources/eng_news_2025_100K.tar.gz \
  --morphynet-inflection sources/eng.inflectional.v1.tsv \
  --morphynet-derivation sources/eng.derivational.v1.tsv
```

Build and validate the pack:

```bash
uv run tact-dictionary build --config build/en_US/en_US.tactbuild
uv run tact-dictionary validate build/en_US/generated/en_US.tactdict
```

The build writes:

- `build/en_US/generated/en_US.tactdict`
- `build/en_US/generated/en_US.manifest.json`
- `build/en_US/generated/en_US.symbols.json`

See [Building Language Packs](docs/building-language-packs.md) for input file
formats and build configuration details.

## Query A Pack

Python consumers can inspect a built pack without shelling out:

```python
from tact_dictionary.query import DictionaryLookup

lookup = DictionaryLookup.from_file("en_US.tactdict")
for completion in lookup.find_completions("ru"):
    print(completion.surface, completion.kind, completion.completion_cost)

for variant in lookup.find_variants("run"):
    print(variant.surface, variant.relation_to_primary)
```

The same lookup is available from the CLI:

```bash
tact-dictionary --file en_US.tactdict find-completions --limit 12 ru
tact-dictionary --file en_US.tactdict find-variants run
```

## Documentation

- [Building Language Packs](docs/building-language-packs.md)
- [Pack Format](docs/pack-format.md)
- [Release Checklist](docs/releasing.md)
