Metadata-Version: 2.4
Name: gngram-lookup
Version: 1.2.1
Summary: Static Hash-Based Lookup for Google Ngram Frequencies
License: Proprietary
License-File: LICENSE
Keywords: ngram,google-ngram,nlp,natural-language-processing,frequency,linguistics
Author: Craig Trim
Author-email: craigtrim@gmail.com
Maintainer: Craig Trim
Maintainer-email: craigtrim@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Dist: polars (>=1.0,<2.0)
Requires-Dist: pyarrow (>=18.0,<19.0)
Project-URL: Repository, https://github.com/craigtrim/gngram-lookup
Description-Content-Type: text/markdown

# gngram-lookup

[![PyPI version](https://badge.fury.io/py/gngram-lookup.svg)](https://badge.fury.io/py/gngram-lookup)
[![Downloads](https://pepy.tech/badge/gngram-lookup)](https://pepy.tech/project/gngram-lookup)
[![Downloads/Month](https://pepy.tech/badge/gngram-lookup/month)](https://pepy.tech/project/gngram-lookup)
[![Tests](https://img.shields.io/badge/tests-194-brightgreen)](https://github.com/craigtrim/gngram-lookup/tree/main/tests)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)

Word frequency and part-of-speech tags from 500 years of books. O(1) lookup. 5 million words.
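The package summary calls this a static hash-based lookup. The [Data Format](https://github.com/craigtrim/gngram-lookup/blob/main/docs/data-format.md) reference documents the real layout. The sketch here only illustrates the general technique and is not gngram-lookup's format. The shard count, the use of MD5, the in-memory dicts and the helper names `shard_of`, `build_index` and `lookup` are all assumptions. The idea is that a precomputed index lets each query cost one hash plus one dictionary probe, no matter how many words are stored.

```python
import hashlib

# Illustration only: NUM_SHARDS, MD5 and in-memory dicts are assumptions,
# not gngram-lookup's actual on-disk format (see the Data Format docs).
NUM_SHARDS = 256


def shard_of(word: str) -> int:
    """Pick a shard with a stable hash.

    Python's built-in hash() for str is salted per process, so a file layout
    that must be reproducible across runs needs a deterministic hash like MD5.
    """
    return hashlib.md5(word.encode("utf-8")).digest()[0] % NUM_SHARDS


def build_index(counts: dict[str, int]) -> list[dict[str, int]]:
    """Offline step: bucket every word into its shard once."""
    shards: list[dict[str, int]] = [{} for _ in range(NUM_SHARDS)]
    for word, count in counts.items():
        shards[shard_of(word)][word] = count
    return shards


def lookup(shards: list[dict[str, int]], word: str):
    """Online step: one hash plus one dict probe, O(1) on average.

    Returns None for unknown words.
    """
    return shards[shard_of(word)].get(word)


index = build_index({"computer": 892451, "rucksack": 1200})  # toy counts
print(lookup(index, "computer"))     # 892451
print(lookup(index, "xyznotaword"))  # None
```

A package built this way would typically store each shard as its own file and load only the one a query hashes to. That keeps memory low, though whether gngram-lookup does exactly this is not stated here.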

## Install

```bash
pip install gngram-lookup
python -m gngram_lookup.download_data       # frequency data, ~110 MB
python -m gngram_lookup.download_pos_data   # POS tag data, separate download
```

## Python

```python
import gngram_lookup as ng

ng.exists('computer')       # True
ng.exists('xyznotaword')    # False

ng.frequency('computer')
# {'peak_tf': 2000, 'peak_df': 2000, 'sum_tf': 892451, 'sum_df': 312876}

ng.batch_frequency(['the', 'algorithm', 'xyznotaword'])
# {'the': {...}, 'algorithm': {...}, 'xyznotaword': None}

ng.word_score('the')                     # 1  (most common)
ng.word_score('computer')                # 18
ng.word_score('rucksack')                # 58
ng.word_score('xyznotaword')             # None

ng.pos('fast')                           # ['ADJ', 'ADV', 'VERB']
ng.pos('corn', min_tf=100000)            # ['ADJ', 'NOUN']
ng.pos_freq('corn')                      # {'NOUN': 11722803, 'ADJ': 1433642, ...}
ng.has_pos('sing', ng.PosTag.VERB)       # True
ng.has_pos('sing', ng.PosTag.VERB, min_tf=1000)  # True
```
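The examples above suggest that `pos` is `pos_freq` filtered by `min_tf`. Of corn's tags, only VERB (85,411) falls under the 100,000 threshold, so `pos('corn', min_tf=100000)` drops it. The following is a hypothetical re-implementation of that relationship, useful for reasoning about thresholds. It uses the three corn counts shown in this README. The helper names are invented for illustration, and it assumes three things: the threshold is inclusive, the default `min_tf` keeps every tag, and tags come back alphabetically sorted.

```python
# Hypothetical re-implementation of the min_tf filter; not the library's code.
# Counts are the corn figures shown in the README examples.
CORN_POS_FREQ = {"NOUN": 11722803, "ADJ": 1433642, "VERB": 85411}


def pos_from_freq(pos_freq: dict[str, int], min_tf: int = 0) -> list[str]:
    """Return tags whose total count is at least min_tf, alphabetically sorted.

    Assumes an inclusive threshold and that min_tf=0 keeps every tag.
    """
    return sorted(tag for tag, count in pos_freq.items() if count >= min_tf)


def has_pos_from_freq(pos_freq: dict[str, int], tag: str, min_tf: int = 0) -> bool:
    """True if `tag` survives the same min_tf filter."""
    return tag in pos_from_freq(pos_freq, min_tf)


print(pos_from_freq(CORN_POS_FREQ))                 # ['ADJ', 'NOUN', 'VERB']
print(pos_from_freq(CORN_POS_FREQ, min_tf=100000))  # ['ADJ', 'NOUN']
print(has_pos_from_freq(CORN_POS_FREQ, "VERB", min_tf=100000))  # False
```

Raising `min_tf` is a simple way to suppress rare or tagger-noise readings of a word while keeping its dominant parts of speech.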

## CLI

```bash
exists computer       # True, exit 0
exists xyznotaword    # False, exit 1

freq computer
# peak_tf_decade: 2000
# peak_df_decade: 2000
# sum_tf: 892451
# sum_df: 312876

score computer        # 18
pos fast              # ADJ ADV VERB
pos-freq corn         # ADJ: 1,433,642 / NOUN: 11,722,803 / VERB: 85,411
has-pos sing VERB     # True, exit 0
has-pos fast NOUN     # False, exit 1
```
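Because `exists` and `has-pos` report their answer through the exit status (0 for True, 1 for False), a caller only needs the return code. This minimal sketch covers a process that shells out to the CLI instead of importing the package. `cli_says_yes` is a hypothetical helper name. The sketch assumes the console scripts are on `PATH` after `pip install` and that the data has been downloaded.

```python
import subprocess


def cli_says_yes(argv: list[str]) -> bool:
    """Run a CLI command and map exit status 0 -> True, anything else -> False.

    Output is captured and discarded; only the exit status is used. Note that
    a non-zero status from any other failure (bad arguments, missing data)
    also maps to False, and a command not on PATH raises FileNotFoundError.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.returncode == 0


# Assumes the console scripts are installed; expected values follow the
# exit codes documented above:
# cli_says_yes(["exists", "computer"])       -> True
# cli_says_yes(["has-pos", "fast", "NOUN"])  -> False
```

In a shell script the same convention works directly with `if exists computer; then ...; fi`.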

## Docs

- [API Reference](https://github.com/craigtrim/gngram-lookup/blob/main/docs/api.md)
- [CLI Reference](https://github.com/craigtrim/gngram-lookup/blob/main/docs/cli.md)
- [Data Format](https://github.com/craigtrim/gngram-lookup/blob/main/docs/data-format.md)
- [Use Cases](https://github.com/craigtrim/gngram-lookup/blob/main/docs/use-cases.md)
- [Development](https://github.com/craigtrim/gngram-lookup/blob/main/docs/development.md)

## See Also

- [bnc-lookup](https://pypi.org/project/bnc-lookup/) - O(1) lookup for the British National Corpus
- [wordnet-lookup](https://pypi.org/project/wordnet-lookup/) - O(1) lookup for WordNet

## Attribution

Data derived from the [Google Books Ngram](https://books.google.com/ngrams) dataset.

## License

Proprietary. See [LICENSE](https://github.com/craigtrim/gngram-lookup/blob/main/LICENSE).

