Metadata-Version: 2.4
Name: genlm-grammar
Version: 0.2.0
Author: The GenLM Team
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: frozendict
Requires-Dist: interegular
Requires-Dist: lark
Requires-Dist: nltk
Requires-Dist: numpy
Requires-Dist: arsenal
Requires-Dist: IPython
Requires-Dist: graphviz
Requires-Dist: pandas
Provides-Extra: test
Requires-Dist: coverage; extra == "test"
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-benchmark; extra == "test"
Provides-Extra: docs
Requires-Dist: mkdocs; extra == "docs"
Requires-Dist: mkdocstrings[python]; extra == "docs"
Requires-Dist: mkdocs-material; extra == "docs"
Requires-Dist: mkdocs-gen-files; extra == "docs"
Requires-Dist: mkdocs-literate-nav; extra == "docs"
Requires-Dist: mkdocs-section-index; extra == "docs"
Dynamic: license-file


![Logo](logo.png)


[![Docs](https://github.com/genlm/genlm-grammar/actions/workflows/docs.yml/badge.svg)](https://genlm.github.io/genlm-grammar/)
[![Tests](https://github.com/genlm/genlm-grammar/actions/workflows/pytest.yml/badge.svg)](https://github.com/genlm/genlm-grammar/actions/workflows/pytest.yml)
[![codecov](https://codecov.io/github/genlm/genlm-grammar/graph/badge.svg?token=rBj8AYdpe4)](https://codecov.io/github/genlm/genlm-grammar)
[![PyPI](https://img.shields.io/pypi/v/genlm-grammar.svg)](https://pypi.org/project/genlm-grammar/)


A Python library for working with weighted context-free grammars (WCFGs), weighted finite-state automata (WFSAs), and weighted finite-state transducers (WFSTs). The library provides efficient implementations of grammar operations, parsing algorithms, and language model functionality.

## Quick Start

This library can be installed via pip:

```bash
pip install genlm-grammar
```

## Key Features

### Grammar Operations
- Support for weighted context-free grammars with various semirings (Boolean, Float, Real, MaxPlus, MaxTimes, etc.)
- Grammar transformations:
  - Local normalization
  - Removal of nullary rules and unary cycles
  - Grammar binarization
  - Length truncation
  - Renaming/renumbering of nonterminals
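Local normalization can be illustrated with a small sketch. One standard construction computes each nonterminal's partition function Z (the total weight of everything it derives) and rescales every rule X → Y₁…Yₖ by Z(Y₁)⋯Z(Yₖ)/Z(X). Rules that share a left-hand side then sum to one, and every derivation's weight is divided by the same constant Z(S), so relative weights are preserved. The `(lhs, rhs, weight)` tuple format and the function names below are hypothetical and are not genlm-grammar's API. The sketch also assumes every partition value is finite and positive.

```python
from collections import defaultdict
from math import prod


def partition_function(rules, nonterminals, iters=1000):
    """Approximate the least solution of Z[X] = sum_{X -> rhs} w * prod(Z[Y] for Y in rhs)
    by fixed-point iteration starting from zero. Terminals contribute a factor of 1."""
    Z = {X: 0.0 for X in nonterminals}
    for _ in range(iters):
        new = defaultdict(float)
        for lhs, rhs, w in rules:
            new[lhs] += w * prod(Z[s] if s in nonterminals else 1.0 for s in rhs)
        Z = {X: new[X] for X in nonterminals}
    return Z


def locally_normalize(rules, nonterminals):
    """Rescale each rule X -> rhs by prod(Z[Y] for Y in rhs) / Z[X]."""
    Z = partition_function(rules, nonterminals)
    return [
        (lhs, rhs, w * prod(Z[s] if s in nonterminals else 1.0 for s in rhs) / Z[lhs])
        for lhs, rhs, w in rules
    ]


# Toy grammar: S -> S S (0.25) | a (0.5); its partition value is Z[S] = 2 - sqrt(2).
NONTERMINALS = {"S"}
RULES = [("S", ("S", "S"), 0.25), ("S", ("a",), 0.5)]
for lhs, rhs, w in locally_normalize(RULES, NONTERMINALS):
    print(lhs, "->", " ".join(rhs), round(w, 4))
# S -> S S 0.1464
# S -> a 0.8536
```

Plain fixed-point iteration is the simplest choice for a sketch; it converges to the least solution from zero but can be slow for grammars near divergence.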

### Parsing Algorithms
- Earley parsing (O(n³|G|) worst-case time for a sentence of length n and a grammar of size |G|)
  - Standard implementation
  - Rescaled version for numerical stability
- CKY parsing
  - Incremental CKY with chart caching
  - Support for prefix computations
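CKY fills a chart bottom-up, one span width at a time. For each of the O(n²) spans it tries O(n) split points against every binary rule, which gives the O(n³|G|) cost. The sketch below is a plain weighted CKY (the inside algorithm) in the real semiring for a grammar in Chomsky normal form. The dictionary-based grammar encoding is an assumption for illustration, not the library's interface, and the sketch has none of the library's incremental chart caching or prefix computations.

```python
from collections import defaultdict


def cky_inside(words, unary, binary, start="S"):
    """Weighted CKY (inside algorithm) for a grammar in Chomsky normal form.

    unary:  {(A, terminal): weight} for rules A -> terminal
    binary: {(A, B, C): weight}     for rules A -> B C
    Returns the total weight of all derivations of `words` from `start`.
    """
    n = len(words)
    # chart[i, k][A] holds the total weight of A deriving words[i:k].
    chart = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for (A, a), w in unary.items():
            if a == word:
                chart[i, i + 1][A] += w
    for width in range(2, n + 1):                 # O(n) span widths
        for i in range(n - width + 1):            # O(n) start positions
            k = i + width
            for j in range(i + 1, k):             # O(n) split points
                left, right = chart[i, j], chart[j, k]
                for (A, B, C), w in binary.items():   # O(|G|) rules
                    if B in left and C in right:
                        chart[i, k][A] += w * left[B] * right[C]
    return chart[0, n][start]


unary = {("S", "a"): 0.5}
binary = {("S", "S", "S"): 0.25}
# Two bracketings of "a a a", each weighing 0.25**2 * 0.5**3.
print(cky_inside(["a", "a", "a"], unary, binary))  # 0.015625
```

Replacing `+=` and `*` with another semiring's addition and multiplication turns the same loop into a recognizer (Boolean semiring) or a Viterbi parser (max-times semiring).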

### Language Model Interface
- `BoolCFGLM`: Boolean-weighted CFG language model
- `CKYLM`: Probabilistic CFG language model using CKY
- `EarleyLM`: Language model using Earley parsing
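One common way a grammar-based language model assigns next-token probabilities is through prefix weights. With π(x) the total weight of all strings that begin with x, p(a | x) = π(xa)/π(x), and the end-of-string probability is the weight of x itself divided by π(x). The sketch below makes this concrete for a toy distribution over a finite set of strings enumerated by hand. The `EOS` marker and the function name are hypothetical; the classes listed above are described as relying on CKY or Earley parsing rather than on enumeration.

```python
from collections import defaultdict

EOS = "<eos>"  # hypothetical end-of-string marker


def next_token_distribution(language, prefix):
    """Return p(next | prefix) for a distribution over a finite set of strings.

    `language` maps tuples of tokens to probabilities. For a prefix x, the prefix
    weight pi(x) is the total probability of strings that start with x; then
    p(a | x) = pi(x + (a,)) / pi(x) and p(EOS | x) = p(x) / pi(x).
    """
    prefix = tuple(prefix)
    scores = defaultdict(float)
    for string, p in language.items():
        if string[: len(prefix)] != prefix:
            continue                        # string does not extend the prefix
        if len(string) == len(prefix):
            scores[EOS] += p                # the prefix is itself a complete string
        else:
            scores[string[len(prefix)]] += p
    total = sum(scores.values())            # equals pi(prefix)
    if total == 0:
        raise ValueError("prefix has zero probability")
    return {token: s / total for token, s in scores.items()}


toy = {("a",): 0.5, ("a", "b"): 0.3, ("b",): 0.2}
print(next_token_distribution(toy, []))     # approximately {'a': 0.8, 'b': 0.2}
print(next_token_distribution(toy, ["a"]))  # approximately {'<eos>': 0.625, 'b': 0.375}
```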

### Finite State Automata
- Weighted FSA implementation
- Operations:
  - Epsilon removal
  - Minimization (Brzozowski's algorithm)
  - Determinization
  - Composition
  - Reversal
  - Kleene star/plus
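Brzozowski's algorithm minimizes an automaton by applying reversal and determinization twice: min(A) = det(rev(det(rev(A)))). The sketch below shows the idea for unweighted (Boolean) automata without epsilon arcs, using a subset construction that keeps only reachable states. The `(starts, arcs, finals)` triple representation and all function names are hypothetical, not the library's WFSA API; a weighted version additionally has to carry weights through determinization.

```python
def reverse(fsa):
    """Flip every arc and swap the start and final states."""
    starts, arcs, finals = fsa
    return set(finals), {(q, a, p) for (p, a, q) in arcs}, set(starts)


def determinize(fsa):
    """Subset construction that keeps only subsets reachable from the start."""
    starts, arcs, finals = fsa
    start = frozenset(starts)
    seen, stack, new_arcs = {start}, [start], set()
    while stack:
        subset = stack.pop()
        successors = {}
        for p, a, q in arcs:
            if p in subset:
                successors.setdefault(a, set()).add(q)
        for a, targets in successors.items():
            target = frozenset(targets)
            new_arcs.add((subset, a, target))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    # A subset is final if it contains at least one final state.
    return {start}, new_arcs, {s for s in seen if s & finals}


def brzozowski(fsa):
    """Minimal DFA via Brzozowski's algorithm: det(rev(det(rev(A))))."""
    return determinize(reverse(determinize(reverse(fsa))))


def num_states(fsa):
    starts, arcs, finals = fsa
    return len(set(starts) | set(finals) | {p for p, _, _ in arcs} | {q for _, _, q in arcs})


def accepts(dfa, word):
    (state,), arcs, finals = dfa            # a DFA has exactly one start state
    delta = {(p, a): q for p, a, q in arcs}
    for a in word:
        if (state, a) not in delta:
            return False
        state = delta[state, a]
    return state in finals


# A five-state automaton with two parallel paths accepting "ab" and "cb".
automaton = ({0}, {(0, "a", 1), (1, "b", 3), (0, "c", 2), (2, "b", 4)}, {3, 4})
dfa = brzozowski(automaton)
print(num_states(dfa), accepts(dfa, "ab"), accepts(dfa, "cb"), accepts(dfa, "a"))
# 3 True True False
```

The two parallel paths collapse into a three-state DFA. Because only non-empty reachable subsets are created, the result is a partial DFA with no explicit sink state.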

### Additional Features
- Semiring abstractions (Boolean, Float, Log, Entropy, etc.)
- Efficient chart and agenda-based algorithms
- Grammar-FST composition
- Visualization support via Graphviz
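The value of a semiring abstraction is that one algorithm computes different quantities when its addition and multiplication are swapped. The sketch below sums over all paths in a small weighted DAG: the real semiring gives the total weight, the Boolean semiring gives reachability, and the max-times semiring gives the weight of the best single path. The `Semiring` dataclass and `path_sum` helper are hypothetical and do not mirror the library's semiring classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical minimal semiring: (plus, times) with identities (zero, one)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    lift: Callable[[float], Any]  # maps a raw edge weight into the semiring


REAL = Semiring(0.0, 1.0, lambda a, b: a + b, lambda a, b: a * b, lambda w: w)
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b, lambda w: w > 0)
# Max-times with zero = 0.0 assumes nonnegative weights.
MAX_TIMES = Semiring(0.0, 1.0, max, lambda a, b: a * b, lambda w: w)


def path_sum(semiring, n, edges):
    """Semiring sum over all paths from node 0 to node n - 1 of a DAG.

    Nodes are assumed to be numbered in topological order, so every edge
    (src, dst, weight) has src < dst and processing edges sorted by src
    finalizes each node's value before it is used.
    """
    value = [semiring.zero] * n
    value[0] = semiring.one
    for src, dst, w in sorted(edges):
        value[dst] = semiring.plus(value[dst], semiring.times(value[src], semiring.lift(w)))
    return value[n - 1]


edges = [(0, 1, 0.5), (0, 2, 0.5), (1, 3, 0.4), (2, 3, 0.9)]
for name, s in [("real", REAL), ("boolean", BOOLEAN), ("max-times", MAX_TIMES)]:
    print(name, path_sum(s, 4, edges))
```

On this graph the real semiring gives about 0.65 (0.5 × 0.4 + 0.5 × 0.9), the Boolean semiring gives `True`, and max-times gives about 0.45, the weight of the path through node 2.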

## Development

See [DEVELOPING.md](DEVELOPING.md) for information on how to install the package in development mode.
