Metadata-Version: 2.1
Name: gngram-lookup
Version: 0.2.0
Summary: Static Hash-Based Lookup for Google Ngram Frequencies
Home-page: https://github.com/craigtrim/gngram-lookup
License: Proprietary
Keywords: ngram,google-ngram,nlp,natural-language-processing,frequency,linguistics
Author: Craig Trim
Author-email: craigtrim@gmail.com
Maintainer: Craig Trim
Maintainer-email: craigtrim@gmail.com
Requires-Python: >=3.11,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Dist: polars (>=1.0,<2.0)
Requires-Dist: pyarrow (>=18.0,<19.0)
Project-URL: Repository, https://github.com/craigtrim/gngram-lookup
Description-Content-Type: text/markdown

# gngram-lookup

[![PyPI version](https://badge.fury.io/py/gngram-lookup.svg)](https://badge.fury.io/py/gngram-lookup)
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)

Word frequency from 500 years of books. O(1) lookup. 5 million words.

## Install

```bash
pip install gngram-lookup
python -m gngram_lookup.download_data   # download the lookup data
```

## Python

```python
import gngram_lookup as ng

ng.exists('computer')       # True
ng.exists('xyznotaword')    # False

ng.frequency('computer')
# {'peak_tf': 2000, 'peak_df': 2000, 'sum_tf': 892451, 'sum_df': 312876}

ng.batch_frequency(['the', 'algorithm', 'xyznotaword'])
# {'the': {...}, 'algorithm': {...}, 'xyznotaword': None}
```
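`batch_frequency` returns `None` for words that are not in the data, so code that compares counts has to filter those out first. Below is a minimal sketch of that step. `rank_by_sum_tf` is a hypothetical helper, not part of the package. It takes a mapping shaped like the output above and assumes every non-`None` record has an integer `sum_tf`. A hand-written dict stands in for the real call, so the sketch runs without the data download.

```python
from typing import Mapping, Optional

# Shape assumed from the example above: word -> stats dict, or None if unknown.
FreqRecord = Optional[Mapping[str, int]]


def rank_by_sum_tf(results: Mapping[str, FreqRecord]) -> list[tuple[str, int]]:
    """Return (word, sum_tf) pairs, most frequent first, skipping unknown words."""
    known = [
        (word, record["sum_tf"])
        for word, record in results.items()
        if record is not None  # batch_frequency uses None for missing words
    ]
    # Sort by descending sum_tf; ties are broken alphabetically for stable output.
    return sorted(known, key=lambda pair: (-pair[1], pair[0]))


# Hand-written stand-in using the sample values shown above.
sample = {
    "computer": {"peak_tf": 2000, "peak_df": 2000, "sum_tf": 892451, "sum_df": 312876},
    "xyznotaword": None,
}
print(rank_by_sum_tf(sample))  # [('computer', 892451)]
```

In real use, pass `ng.batch_frequency([...])` in place of `sample`.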

## CLI

```bash
gngram-exists computer    # True, exit 0
gngram-exists xyznotaword # False, exit 1

gngram-freq computer
# peak_tf_decade: 2000
# peak_df_decade: 2000
# sum_tf: 892451
# sum_df: 312876
```
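Because `gngram-exists` reports its answer through the exit code and `gngram-freq` prints one `key: value` pair per line, both are easy to call from other programs. The sketch below shows one way to do it. `cli_exists` and `parse_freq_output` are hypothetical helpers, not part of the package. The sketch makes three assumptions. First, the commands are on `PATH`; if they are not, `subprocess.run` raises `FileNotFoundError`. Second, any non-zero exit code means "not found". Third, every value printed by `gngram-freq` is an integer, as in the sample output. The `cmd` parameter exists only so the executable can be swapped out.

```python
import subprocess


def cli_exists(word: str, cmd: tuple[str, ...] = ("gngram-exists",)) -> bool:
    """Run gngram-exists and map its exit code to a bool (0 -> True)."""
    # check=False: a non-zero exit means "not found", not an error to raise.
    result = subprocess.run([*cmd, word], capture_output=True, text=True, check=False)
    return result.returncode == 0


def parse_freq_output(text: str) -> dict[str, int]:
    """Parse 'key: value' lines, as printed by gngram-freq, into a dict of ints."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unexpected lines
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value.strip())
    return stats


# Parse the sample output shown above.
sample_output = """peak_tf_decade: 2000
peak_df_decade: 2000
sum_tf: 892451
sum_df: 312876
"""
print(parse_freq_output(sample_output)["sum_tf"])  # 892451
```

To get real output, capture the command's stdout, for example with `subprocess.run(["gngram-freq", word], capture_output=True, text=True).stdout`, and pass it to `parse_freq_output`.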

## Docs

- [API Reference](docs/api.md)
- [CLI Reference](docs/cli.md)
- [Data Format](docs/data-format.md)
- [Use Cases](docs/use-cases.md)
- [Development](docs/development.md)

## Attribution

Data derived from the [Google Books Ngram](https://books.google.com/ngrams) dataset.

## License

Proprietary. See [LICENSE](LICENSE).

