Metadata-Version: 2.4
Name: hinglishswd
Version: 0.1.1
Summary: A Python library for Hinglish (Hindi+English code-mixed) NLP: detection, tokenization, transliteration, stop-word removal.
Author: Abhinav Patra
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Natural Language :: Hindi
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: indic-transliteration>=2.0.0
Provides-Extra: spacy
Requires-Dist: spacy>=3.0.0; extra == "spacy"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# hinglishswd

A Python library for Hinglish (Hindi+English code-mixed) NLP.

## Features

- **Language detection**: classifies text as English, Hindi (Devanagari), or Hinglish (Latin-script Hindi)
- **Tokenization**: punctuation-aware splitting for Hinglish; spaCy-based tokenization for Devanagari Hindi (spaCy is available through the optional `spacy` extra)
- **Transliteration**: Hinglish → Hindi Devanagari (via `indic-transliteration`)
- **Translation**: Hinglish/Hindi → English (via `deep-translator`, which uses Google Translate; optional)
- **Stop word removal**: built-in Hindi and Hinglish stop word lists
- **Pipeline API**: a single call runs detection, tokenization, stop word removal, transliteration, and translation

## Installation

```bash
pip install hinglishswd
```

## Quick Start

```python
from hinglishswd import HinglishNLP

nlp = HinglishNLP()

# Language detection
nlp.detect("kal mai khaana khane gaya")    # "hinglish"
nlp.detect("Hello, how are you?")           # "english"
nlp.detect("आज मौसम बहुत अच्छा है")         # "hindi"

# Tokenization
nlp.tokenize("kal mai khaana khane gaya")   # ['kal', 'mai', 'khaana', 'khane', 'gaya']

# Transliteration (Hinglish -> Devanagari)
nlp.transliterate("kal mai khaana khane gaya")  # "कल मै खान खने गय"

# Translation (Hinglish -> English)
nlp.translate("kal mai khaana khane gaya")  # "I went to eat yesterday"

# Full pipeline
result = nlp.pipeline("aaj mausam bahut acha hai")
# {
#   "text": "aaj mausam bahut acha hai",
#   "language": "hinglish",
#   "tokens": ["aaj", "mausam", "bahut", "acha", "hai"],
#   "tokens_no_stopwords": ["aaj", "mausam", "acha"],
#   "devanagari": "आज मौसम बहुत अच्छा है",
#   "english": "the weather is very good today"
# }
```
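
The dictionary returned by `pipeline()` can be post-processed with plain Python. As a small illustration, the snippet below copies the sample result shown above into a literal (so it runs without the library or network access) and lists which tokens were dropped as stop words. The helper name `removed_stopwords` is hypothetical and not part of `hinglishswd`.

```python
# Sample result copied from the Quick Start output above (not a live call).
result = {
    "text": "aaj mausam bahut acha hai",
    "language": "hinglish",
    "tokens": ["aaj", "mausam", "bahut", "acha", "hai"],
    "tokens_no_stopwords": ["aaj", "mausam", "acha"],
    "devanagari": "आज मौसम बहुत अच्छा है",
    "english": "the weather is very good today",
}


def removed_stopwords(result):
    """Return tokens found in "tokens" but missing from "tokens_no_stopwords"."""
    kept = set(result["tokens_no_stopwords"])
    return [tok for tok in result["tokens"] if tok not in kept]


print(removed_stopwords(result))  # ['bahut', 'hai']
```

Comparing the two token lists like this is a quick way to check whether the built-in stop word lists are too aggressive or too lenient for a given corpus.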

## Module-level API

```python
from hinglishswd import (
    detect_language,
    tokenize, tokenize_sentences,
    transliterate, hinglish_to_devanagari,
    to_english, hinglish_to_english, translate_pipeline,
    remove_stopwords,
)

lang = detect_language("Aap kahan ho?")
tokens = tokenize("Mujhe paani chahiye")
dev = hinglish_to_devanagari("mera naam rahul hai")
en = hinglish_to_english("aaj kya kar rahe ho?")
```
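
To make the three detection labels concrete, here is a minimal, self-contained sketch of one way script-based detection can work. It is an illustration only and is not claimed to be how `detect_language` is implemented: the function name `guess_language` and the tiny `HINGLISH_MARKERS` word list are assumptions made for this example.

```python
import re

# Devanagari Unicode block (U+0900 to U+097F).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")

# Tiny, hypothetical list of romanized Hindi marker words; a real detector
# would need a far larger vocabulary or a statistical model.
HINGLISH_MARKERS = {
    "kal", "mai", "hai", "aaj", "kya", "aap", "mujhe", "mera", "kahan",
}


def guess_language(text):
    """Heuristic guess: 'hindi', 'hinglish', or 'english'."""
    # Any Devanagari character is treated as evidence of Hindi script.
    if DEVANAGARI.search(text):
        return "hindi"
    # Otherwise look for romanized Hindi words among the Latin-script tokens.
    words = re.findall(r"[a-z]+", text.lower())
    if any(word in HINGLISH_MARKERS for word in words):
        return "hinglish"
    return "english"


print(guess_language("kal mai khaana khane gaya"))  # hinglish
print(guess_language("Hello, how are you?"))        # english
print(guess_language("आज मौसम बहुत अच्छा है"))       # hindi
```

A word-list heuristic like this misclassifies Hinglish sentences that happen to contain no marker words, which is one reason to rely on the library's own detector rather than a hand-rolled check.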

## Package Structure

```
hinglishswd/
├── __init__.py
├── core.py              # HinglishNLP class (main API)
├── detect.py            # Language detection
├── tokenize.py          # Tokenization
├── transliterate.py     # Script conversion (indic-transliteration)
├── translate.py         # Translation (deep-translator)
└── stopwords.py         # Hindi + Hinglish stop words
```

## Dependencies

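- `indic-transliteration`: Hinglish ↔ Devanagari transliteration (required; installed automatically)
- `deep-translator`: Google Translate-based translation (optional; needed for translation and installed separately with `pip install deep-translator`)
- `spacy`: tokenization of Devanagari Hindi (optional; install with `pip install "hinglishswd[spacy]"`)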
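
Because some features rely on optional packages, it can help to check what is importable before calling them. The sketch below uses only the standard library; the function name `missing_optional_features` and the feature labels are hypothetical, while `deep_translator` and `spacy` are the import names of the `deep-translator` package and of spaCy (offered through the `spacy` extra declared in the package metadata).

```python
from importlib.util import find_spec

# Import names (not PyPI names) of the optional packages.
OPTIONAL_MODULES = {
    "translation": "deep_translator",
    "spacy tokenization": "spacy",
}


def missing_optional_features(modules=OPTIONAL_MODULES):
    """Return the feature labels whose backing module cannot be imported."""
    return [
        feature
        for feature, module in modules.items()
        if find_spec(module) is None  # None means the module was not found
    ]


# Prints an empty list when every optional package is installed.
print(missing_optional_features())
```

Running a check like this at startup lets an application skip or disable translation up front instead of failing partway through a batch.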

## License

MIT
