Metadata-Version: 2.4
Name: langidentify-full-model
Version: 1.0.0
Summary: High-accuracy full model data for the LangIdentify language detection library. Install via pip install langidentify[full].
Author-email: Jeremy Lilley <jeremy@jlilley.net>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/jlpka/langidentify
Project-URL: Repository, https://github.com/jlpka/langidentify
Keywords: language-detection,nlp,model-data
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# LangIdentify Full Model

Full model data for the [LangIdentify](https://pypi.org/project/langidentify/)
language detection library. This package contains only model data files and no
code.

## Installation

```bash
pip install "langidentify[full]"
```

This installs both `langidentify` and this package. Once installed,
`Model.load()` will automatically prefer the full model.
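
To confirm that the full model package is actually present in the current environment, a minimal sketch using only the standard library might look like the following. It relies on the distribution name `langidentify-full-model` from this package's metadata; the helper name `installed_version` is illustrative and not part of LangIdentify.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution name as published in this package's metadata.
FULL_MODEL_DIST = "langidentify-full-model"


def installed_version(dist_name: str = FULL_MODEL_DIST):
    """Return the installed version string, or None if the distribution is absent.

    Illustrative helper only; it is not part of the LangIdentify API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"{FULL_MODEL_DIST}: {found or 'not installed'}")
```

This only checks that the package is installed. It does not inspect which model `Model.load()` ends up selecting.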

## Lite vs. full model

Both models are trained on the same Wikipedia data but cropped at different
log-probability floors. The full model's lower floor keeps rarer entries, which
is why it is larger:

| | Lite | Full |
|---|---|---|
| Log-probability floor | -12 | -15 |
| Disk size (all languages) | ~17 MB | ~89 MB |
| Best for | Most use cases | Maximum accuracy when memory is not a concern |
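
To get a rough feel for what the two floors mean, the sketch below converts each floor to a probability threshold and compares them. It assumes the floors are natural-log probabilities, which this README does not state; pass a different `base` if the models use another logarithm. The function and constant names are illustrative.

```python
import math

# Floors and approximate disk sizes taken from the table above.
LITE_FLOOR = -12.0
FULL_FLOOR = -15.0
LITE_SIZE_MB = 17
FULL_SIZE_MB = 89


def floor_to_probability(log_floor: float, base: float = math.e) -> float:
    """Convert a log-probability floor to a probability.

    ASSUMPTION: natural log by default; the actual base is not documented here.
    """
    return base ** log_floor


lite_p = floor_to_probability(LITE_FLOOR)
full_p = floor_to_probability(FULL_FLOOR)

# How much lower the full model's cutoff is, and how much larger it is on disk.
threshold_ratio = lite_p / full_p
size_ratio = FULL_SIZE_MB / LITE_SIZE_MB

if __name__ == "__main__":
    print(f"lite threshold: {lite_p:.3e}")
    print(f"full threshold: {full_p:.3e}")
    print(f"threshold ratio: {threshold_ratio:.1f}x")
    print(f"disk size ratio: {size_ratio:.1f}x")
```

Under the natural-log assumption, the full model's cutoff is about 20 times lower than the lite model's, while the data on disk is only about 5 times larger.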

## License

Apache License 2.0. See [LICENSE](LICENSE).

The bundled models contain statistical parameters derived from Wikipedia text.
The models do not contain or reproduce Wikipedia text.
