Metadata-Version: 2.4
Name: combo-nlp
Version: 4.0.3
Summary: COMBO-NLP - A library for Morphosyntactic Tagging and Dependency Parsing.
Author: Maja Jablonska, Michał Ulewicz
License-Expression: GPL-3.0
Project-URL: Homepage, https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp
Project-URL: Documentation, https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp
Project-URL: Repository, https://gitlab.clarin-pl.eu/syntactic-tools/combo-nlp
Keywords: nlp,natural-language-processing,dependency-parsing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.35.0
Requires-Dist: peft>=0.6.0
Requires-Dist: wandb>=0.16.0
Requires-Dist: dacite>=1.8.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: accelerate>=0.25.0
Requires-Dist: sacremoses>=0.1.1
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: black>=23.9.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.6.0; extra == "dev"
Provides-Extra: lambo
Requires-Dist: lambo>=2.3.1; extra == "lambo"

# COMBO-NLP

A library for Morphosyntactic Tagging and Dependency Parsing based on [Universal Dependencies](https://universaldependencies.org/).

## Installation

```bash
pip install combo-nlp
```

### LAMBO segmenter (optional)

A segmenter is only needed when passing raw text strings to COMBO. If you provide pre-tokenized input (`list[str]` or `list[list[str]]`), no segmenter is required.

When you initialize COMBO with a language name (e.g. `COMBO("Polish")`), it automatically loads a [LAMBO](https://gitlab.clarin-pl.eu/syntactic-tools/lambo) segmenter. If LAMBO is not installed, an `ImportError` is raised. LAMBO is hosted on a custom PyPI index and must be installed separately:

```bash
pip install --index-url https://pypi.clarin-pl.eu/ lambo
```
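Since LAMBO lives on a separate package index, it can be useful to check that it is importable before constructing a language-based pipeline. The helper below is illustrative, not part of the COMBO API:

```python
import importlib.util

def lambo_available() -> bool:
    """Return True if the optional LAMBO package can be imported."""
    return importlib.util.find_spec("lambo") is not None

if not lambo_available():
    print("LAMBO not found; install it, or pass pre-tokenized input instead.")
```

If LAMBO is missing, you can still use any `COMBO.from_pretrained(...)` model with pre-tokenized input, as shown in the Usage section.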

## Usage

### Full text input

```python
from combo import COMBO

# Load by HuggingFace model ID:
nlp = COMBO.from_pretrained("clarin-pl/combo-nlp-xlm-roberta-base-polish-pdb-ud2.17")
result = nlp("Ala ma kota.")

# Or load by language name (with Lambo segmenter):
# (The segmenter name is spelled LAMBO, matching the section above.)
nlp = COMBO("Polish")
result = nlp("Ala ma kota.")

# Or use the Language enum:
from combo import Language
nlp = COMBO(Language.POLISH)
result = nlp("Ala ma kota.")

# Multiple sentences:
result = nlp(["Ala ma kota.", "Pies je."])

# Access results:
for sentence in result:
    for token in sentence:
        print(token.form, token.upos, token.head, token.deprel, token.lemma)
```
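The token attributes accessed above correspond to standard CoNLL-U columns. As a sketch of serializing a parsed sentence, here is a minimal example using a hypothetical stand-in dataclass in place of COMBO's token objects (an assumption about their shape, covering only the attributes shown above):

```python
from dataclasses import dataclass

# Hypothetical stand-in for COMBO's token objects; the real objects
# expose at least .form, .lemma, .upos, .head, and .deprel.
@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str

def to_tsv(sentence: list[Token]) -> str:
    """Render one sentence as tab-separated ID/FORM/LEMMA/UPOS/HEAD/DEPREL rows."""
    rows = [
        "\t".join([str(i), tok.form, tok.lemma, tok.upos, str(tok.head), tok.deprel])
        for i, tok in enumerate(sentence, start=1)
    ]
    return "\n".join(rows)

# Example dependency analysis of "Ala ma kota." (annotations are illustrative):
sentence = [
    Token("Ala", "Ala", "PROPN", 2, "nsubj"),
    Token("ma", "mieć", "VERB", 0, "root"),
    Token("kota", "kot", "NOUN", 2, "obj"),
    Token(".", ".", "PUNCT", 2, "punct"),
]
print(to_tsv(sentence))
```

The same loop works over real COMBO output, since it only relies on the attributes listed above.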

### Pre-tokenized input

```python
from combo import COMBO

nlp = COMBO.from_pretrained("clarin-pl/combo-nlp-xlm-roberta-base-polish-pdb-ud2.17")

# Single sentence:
result = nlp(["Ala", "ma", "kota", "."], tokenized=True)

# Multiple sentences:
result = nlp([["Ala", "ma", "kota", "."], ["Pies", "je", "."]], tokenized=True)
```
