Metadata-Version: 2.4
Name: uzmorph-nn
Version: 0.1.0
Summary: Neural Network Uzbek Morphological Analyzer (BiLSTM, CSE)
Home-page: https://github.com/UlugbekSalaev/uzmorph_nn
Author: Ulugbek Salaev
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: torch
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# uzmorph-nn: Uzbek Neural Morphological Analyzer

**uzmorph-nn** is a high-accuracy word-level morphological analyzer for the Uzbek language based on a character-level Bidirectional LSTM (BiLSTM) architecture.

## Role & Performance
This package provides a robust foundation for Uzbek NLP tasks. It is specifically optimized for:
- **Agglutinative Processing**: Efficiently decomposes long chains of suffixes.
- **Phonological Awareness**: Accounts for stem alternations (allomorphy) and vowel/consonant harmony.
- **Rule-Augmented Deep Learning**: Trained on data derived from 100,000+ rules of the Common Stem Expansion (CSE) framework.

## Installation
```bash
pip install uzmorph-nn
```

## Quick Start & Usage Examples

### 1. Basic Analysis (String Output)
Great for quick debugging or readable logs.
```python
from uzmorph_nn import uzmorph_nn

# Initialize the analyzer
analyzer = uzmorph_nn()

# Analyze a word
result = analyzer.analyze("maktabimizda")
print(result)

# Output:
# Result: 'maktabimizda' -> Stem: maktab | POS: NOUN | Tags: [possession=1, cases=Locative, plural=1]
```

### 2. Structured Data (Dictionary)
Ideal for integrating into other projects or data processing pipelines.
```python
# Reuses the `analyzer` instance created in example 1
result = analyzer.analyze("kitobim")
data = result.to_dict()
print(data)

# Output (pretty-printed here for readability):
# {
#   "word": "kitobim",
#   "stem": "kitob",
#   "pos": "NOUN",
#   "possession": "1"
# }
```
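
When many words need to be processed, the per-word dictionaries can be collected into a list for later export. The following is a minimal sketch that assumes only the `analyze()` and `to_dict()` calls shown above; `analyze_many` is a hypothetical helper and is not part of the package.

```python
from typing import Any, Dict, Iterable, List


def analyze_many(analyzer: Any, words: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: call analyzer.analyze(word).to_dict() for each word."""
    rows = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip empty or whitespace-only tokens
        rows.append(analyzer.analyze(word).to_dict())
    return rows
```

With the analyzer from example 1, a call such as `analyze_many(analyzer, ["kitobim", "maktabimizda"])` would return one dictionary per word, in input order.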

### 3. API Integration (JSON Output)
Useful for web services or for exchanging data between programs written in different programming languages.
```python
result = analyzer.analyze("yozayapmiz")
print(result.to_json())

# Output:
# {
#   "word": "yozayapmiz",
#   "stem": "yoz",
#   "pos": "VERB",
#   "aspect": "Progressive",
#   "person": "1",
#   "number": "Plural"
# }
```
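
On the receiving side, the JSON string can be parsed with Python's standard `json` module. The sketch below assumes that `to_json()` returns a JSON object shaped like the documented output above; the payload is copied from that output rather than produced by the analyzer, and the split into core fields and features is only an illustration.

```python
import json

# Sample payload copied from the documented to_json() output above.
payload = """{
  "word": "yozayapmiz",
  "stem": "yoz",
  "pos": "VERB",
  "aspect": "Progressive",
  "person": "1",
  "number": "Plural"
}"""

record = json.loads(payload)

# Separate the core fields from the remaining morphological features.
core_keys = {"word", "stem", "pos"}
features = {key: value for key, value in record.items() if key not in core_keys}

print(record["stem"], record["pos"])  # yoz VERB
print(features)  # {'aspect': 'Progressive', 'person': '1', 'number': 'Plural'}
```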

### 4. Direct Attribute Access
Access specific parts of the analysis directly.
```python
# Reuses the `analyzer` instance created in example 1
result = analyzer.analyze("olmalar")

print(f"Stem: {result.stem}")          # Stem: olma
print(f"POS: {result.pos}")            # POS: NOUN
print(f"Features: {result.features}")  # Features: ['plural=1']
```
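
If the features are needed as key/value pairs, the list can be converted into a dictionary. This sketch assumes that `result.features` is a list of `"key=value"` strings, as in the output above; `features_to_dict` is a hypothetical helper, not part of the package.

```python
from typing import Dict, Iterable


def features_to_dict(features: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: turn ['plural=1', ...] into {'plural': '1', ...}."""
    parsed = {}
    for item in features:
        key, sep, value = item.partition("=")
        # Entries without '=' are kept as flags with an empty value.
        parsed[key.strip()] = value.strip() if sep else ""
    return parsed


print(features_to_dict(["plural=1"]))  # {'plural': '1'}
```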

## Architecture Details
- **Input**: Character sequence (UTF-8).
- **Architecture**: 2-layer Bidirectional LSTM.
- **Tagging Strategy**: BIO (Beginning, Inside, Outside) sequence labeling.
- **Weights**: Pre-trained on a comprehensive CSE rule-engine dataset.
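
To illustrate the BIO tagging strategy, the sketch below groups per-character tags into labeled spans. It is not the package's decoder: the label names (`STEM`, `PLURAL`) and the example tagging are assumptions made for illustration, since the actual label inventory is not documented here.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (label, text) spans from per-character BIO tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None  # label of the span currently being built
    buf: List[str] = []          # characters of the span currently being built
    for ch, tag in zip(chars, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            buf.append(ch)  # continue the open span
            continue
        if label is not None:
            spans.append((label, "".join(buf)))  # close the open span
        if prefix in ("B", "I"):
            label, buf = name, [ch]  # a stray I- tag is treated as a span start
        else:
            label, buf = None, []    # "O": character belongs to no span
    if label is not None:
        spans.append((label, "".join(buf)))
    return spans


# Hypothetical tagging of "olmalar" (olma + lar); label names are illustrative.
tags = ["B-STEM", "I-STEM", "I-STEM", "I-STEM", "B-PLURAL", "I-PLURAL", "I-PLURAL"]
print(decode_bio("olmalar", tags))  # [('STEM', 'olma'), ('PLURAL', 'lar')]
```

Each `B-` tag opens a new span, matching `I-` tags extend it, and `O` marks characters outside any span, so a single left-to-right pass is enough to recover the segments.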

## License
MIT License. Free for academic and commercial use.
