Metadata-Version: 2.4
Name: marathi-shabda
Version: 0.1.0
Summary: Deterministic, offline Marathi word analysis library (shabda = word in Marathi)
Author: Marathi Pratham Contributors
License: MIT
Project-URL: Homepage, https://github.com/yourusername/marathi-shabda
Project-URL: Documentation, https://github.com/yourusername/marathi-shabda#readme
Project-URL: Repository, https://github.com/yourusername/marathi-shabda
Project-URL: Issues, https://github.com/yourusername/marathi-shabda/issues
Keywords: marathi,nlp,morphology,dictionary,devanagari,lemmatization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Marathi
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: build>=0.10; extra == "dev"
Dynamic: license-file

# marathi-shabda

**Deterministic, offline Marathi word analysis library**

[![PyPI version](https://badge.fury.io/py/marathi-shabda.svg)](https://badge.fury.io/py/marathi-shabda)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## What is marathi-shabda?

`marathi-shabda` is a Python library for analyzing individual Marathi words. It is currently in alpha (v0.1.0) and provides:

1. **Lemma (stem) extraction** from inflected Marathi words
2. **Dictionary lookup** (Marathi ↔ English) with meanings
3. **Morphological analysis** (रूप परिचय) including POS, vibhakti, and kāl detection

### Why "shabda" (शब्द)?

*Shabda* means "word" in Marathi. The library works one word at a time, which is the first (*pratham*, प्रथम) step in Marathi text analysis: understanding individual words before tackling sentences or documents.

---

## Motivation

Marathi language tooling lags behind tooling for other Indian languages. Existing solutions often:
- Require network access (API-based)
- Hallucinate meanings (LLM-based)
- Lack linguistic grounding (pure ML)

**marathi-shabda** is different:
- ✅ **Offline-first**: No network, no API keys
- ✅ **Dictionary-backed**: Authoritative meanings, no hallucinations
- ✅ **Explainable**: Shows reasoning for every decision
- ✅ **Honest about limitations**: Surfaces ambiguity instead of hiding it

---

## What It Does

### ✅ Supported Features

- **Lemma extraction**: `पाण्यावर` → `पाणी` (water)
- **Vibhakti detection**: Identifies case markers (तृतीया, सप्तमी, संबंध, etc.)
- **Dictionary lookup**: Marathi → English meanings
- **POS tagging**: Conservative noun/verb/adjective classification
- **Kāl inference**: Basic tense detection for verbs
- **Roman input**: Accepts romanized Marathi (e.g., `pani` → `पाणी`)
- **Stem alternations**: Handles oblique forms (`पाण्य` → `पाणी`)

### ❌ Explicit Non-Goals

This library **does NOT**:
- Parse sentences or multi-word phrases
- Claim grammatical correctness in all contexts
- Infer semantics beyond dictionary meanings
- Require network access
- Use machine learning (as of v0.1.0)

---

## Installation

```bash
pip install marathi-shabda
```

**Requirements**: Python 3.8+, no third-party runtime dependencies (standard library only)

---

## Quick Start

### 1. Lemma Extraction

```python
from marathi_shabda import get_lemma

result = get_lemma("पाण्यावर")
print(result.lemma)              # पाणी
print(result.confidence)         # 0.9
print(result.detected_vibhakti)  # VibhaktiType.SAPTAMI (सप्तमी)
print(result.explanation)        # "Detected सप्तमी vibhakti"
```

### 2. Dictionary Lookup

```python
from marathi_shabda import lookup_word

result = lookup_word("पाणी")
print(result.english_meanings)   # ['water']
print(result.found)              # True

# Also works with Roman input
result = lookup_word("pani")
print(result.lemma)              # पाणी
```

### 3. Morphological Analysis

```python
from marathi_shabda import analyze_word

result = analyze_word("मुलाने")
print(result.lemma)      # मुल
print(result.pos)        # POSTag.NOUN
print(result.vibhakti)   # VibhaktiType.TRUTIYA (तृतीया)
print(result.confidence) # 0.9
print(result.explanation)
# "Detected तृतीया vibhakti; Inferred noun"
```

---

## How It Works

### Architecture

```
Input Word
   ↓
Normalization (Roman → Devanagari)
   ↓
Dictionary Check (exact match?)
   ↓
Vibhakti Detection (longest-first)
   ↓
Stem Alternations (पाण्य → पाणी)
   ↓
Dictionary Validation (lemma exists?)
   ↓
POS & Kāl Inference
   ↓
Result with Confidence
```
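
The stages in the diagram can be sketched as a toy pipeline. This is not the library's implementation: `TOY_DICTIONARY`, `TOY_SUFFIXES`, `TOY_ALTERNATIONS`, `ToyResult`, and `toy_analyze` are hypothetical names with invented data, and the sketch skips normalization and POS/kāl inference. It only illustrates the order of the checks, with the dictionary deciding whether a candidate lemma is accepted.

```python
from dataclasses import dataclass
from typing import Optional

# Toy data invented for illustration; the real library ships its own
# dictionary and rule tables.
TOY_DICTIONARY = {"पाणी": ["water"], "घर": ["house", "home"]}
TOY_SUFFIXES = ["वर", "ात"]  # hypothetical case markers
# Oblique stem as left over after stripping "वर" in this sketch.
TOY_ALTERNATIONS = {"पाण्या": "पाणी"}


@dataclass
class ToyResult:
    lemma: Optional[str]
    confidence: float
    explanation: str


def toy_analyze(word: str) -> ToyResult:
    # Dictionary check: an exact match short-circuits the pipeline.
    if word in TOY_DICTIONARY:
        return ToyResult(word, 1.0, "Exact dictionary match")
    # Suffix detection: try each known marker in turn.
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            # Stem alternation: map an oblique stem to its dictionary form.
            candidate = TOY_ALTERNATIONS.get(stem, stem)
            # Dictionary validation: rules propose, the dictionary decides.
            if candidate in TOY_DICTIONARY:
                return ToyResult(candidate, 0.9, f"Stripped suffix {suffix}")
    return ToyResult(None, 0.0, "Word not in dictionary")
```

With this toy data, `toy_analyze("पाण्यावर")` yields the lemma `पाणी`, and an unknown word falls through to a result with `lemma=None` and confidence `0.0`.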

### Key Principles

1. **Dictionary-first validation**: Rules generate candidates, dictionary decides truth
2. **Longest-match-first**: Detects `मध्ये` before `ये`
3. **Conservative inference**: Returns `UNKNOWN` when uncertain
4. **Explainable decisions**: Every result includes reasoning
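
To see why longest-match-first matters, note that `मध्ये` itself ends in `ये`, so a matcher that tried `ये` first would cut the word in the wrong place. The sketch below is a minimal illustration under assumptions: `TOY_MARKERS` and `detect_marker` are hypothetical names, and the marker list is a tiny stand-in for the library's real rules.

```python
from typing import Optional, Tuple

# Hypothetical marker list, deliberately stored shortest-first to show
# that the function does not rely on the input order.
TOY_MARKERS = ["ये", "मध्ये", "ला"]


def detect_marker(word: str) -> Optional[Tuple[str, str]]:
    """Return (stem, marker) for the longest matching marker, else None."""
    # Sort by length, longest first, so "मध्ये" is tried before "ये".
    for marker in sorted(TOY_MARKERS, key=len, reverse=True):
        if word.endswith(marker) and len(word) > len(marker):
            return word[: -len(marker)], marker
    # Conservative fallback: no marker found, so make no claim.
    return None
```

Here `detect_marker("घरामध्ये")` returns the stem `घरा` with the marker `मध्ये`, rather than splitting off only `ये`. A full pipeline would still need to validate the resulting stem against the dictionary.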

---

## Confidence & Ambiguity

### Confidence Scores

- **1.0**: Exact dictionary match
- **0.9**: Vibhakti detected, lemma validated
- **0.7**: Ambiguous (multiple possible lemmas)
- **0.0**: Word not in dictionary
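
Calling code may want to branch on these scores. The helper below is a sketch, not part of the library: `describe_confidence` is a hypothetical name, and treating the documented values as lower bounds of ranges is an assumption made so that intermediate scores still map to a sensible category.

```python
def describe_confidence(score: float) -> str:
    """Map a confidence score to the category documented above."""
    # Thresholds are an assumption: the documented values are used
    # as lower bounds rather than exact matches.
    if score >= 1.0:
        return "exact dictionary match"
    if score >= 0.9:
        return "vibhakti detected, lemma validated"
    if score >= 0.7:
        return "ambiguous: review candidates"
    return "not in dictionary"
```

A caller could, for example, accept results in the first two categories automatically and route the others to manual review.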

### Handling Ambiguity

```python
from marathi_shabda import get_lemma

result = get_lemma("घरात")
if result.ambiguous:
    print(f"Multiple interpretations: {result.candidates}")
    # ['घर', 'घरात']  # Could be noun or compound
```

**Philosophy**: We surface ambiguity instead of making false claims.

---

## Offline Guarantee

**marathi-shabda** works completely offline:
- ✅ No network requests
- ✅ No API keys
- ✅ No telemetry
- ✅ Bundled SQLite database
- ✅ Pure Python (stdlib only)

Perfect for:
- Privacy-sensitive applications
- Offline environments
- Embedded systems
- Research reproducibility
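
As an illustration of how a dictionary lookup can stay offline with only the standard library, the sketch below queries a SQLite database through `sqlite3`. Everything about the schema is an assumption: the `entries` table, its columns, and the `build_toy_db` and `toy_lookup` helpers are hypothetical, and an in-memory database stands in for the bundled file.

```python
import sqlite3
from typing import List


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a bundled dictionary database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (Marathi word, English meaning) pair.
    conn.execute("CREATE TABLE entries (marathi TEXT, english TEXT)")
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?)",
        [("पाणी", "water"), ("घर", "house"), ("घर", "home")],
    )
    conn.commit()
    return conn


def toy_lookup(conn: sqlite3.Connection, word: str) -> List[str]:
    """Return all English meanings for a word, in insertion order."""
    rows = conn.execute(
        "SELECT english FROM entries WHERE marathi = ? ORDER BY rowid",
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_toy_db()
meanings = toy_lookup(conn, "घर")
```

No network call is involved at any point: the query runs against local data, and a missing word simply yields an empty list.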

---

## Limitations

### Current Limitations (v0.1.0)

- **Single words only**: No sentence parsing
- **Conservative POS tagging**: Limited to obvious cases
- **Basic kāl detection**: Only common verb patterns
- **No semantic analysis**: Dictionary meanings only
- **Limited verb conjugation**: Focus on nouns/vibhakti

### Known Edge Cases

- Compound words may not split correctly
- Rare vibhaktis may not be detected
- Ambiguous forms return multiple candidates
- Roman transliteration is approximate

**We document limitations honestly.** If you encounter issues, please report them!

---

## Future Roadmap

### v0.2.0 (Planned)
- [ ] Extended database schema (POS, gender, number)
- [ ] Improved verb conjugation analysis
- [ ] Compound word splitting
- [ ] Performance optimizations

### v0.3.0 (Planned)
- [ ] Optional SLM integration for ambiguity resolution
- [ ] Sentence-level analysis (experimental)
- [ ] Batch processing API

### Long-term
- [ ] Hybrid rule-based + ML approach
- [ ] Community-contributed dictionary expansions
- [ ] Web API (optional deployment)

---

## Command-Line Interface

```bash
# Extract lemma
marathi-shabda lemma पाण्यावर

# Dictionary lookup
marathi-shabda lookup पाणी

# Full analysis
marathi-shabda analyze मुलाने
```

---

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for:
- How to add vibhakti rules
- How to improve transliteration
- Code style guidelines
- Testing requirements

---

## License

MIT License - see [LICENSE](LICENSE) for details

---

## Acknowledgments

- Marathi language scholars and grammarians
- Open-source NLP community
- Contributors and testers

---

## Citation

If you use marathi-shabda in research, please cite:

```bibtex
@software{marathi_shabda,
  title = {marathi-shabda: Deterministic Marathi Word Analysis},
  author = {Marathi Pratham Contributors},
  year = {2026},
  url = {https://github.com/yourusername/marathi-shabda}
}
```

---

## Support

- **Issues**: [GitHub Issues](https://github.com/yourusername/marathi-shabda/issues)
- **Discussions**: [GitHub Discussions](https://github.com/yourusername/marathi-shabda/discussions)
- **Email**: [your-email@example.com]

---

**Philosophy**: *When unsure, defer. When confident, explain why.*

Built with respect for the Marathi language and its speakers. 🙏
