Metadata-Version: 2.1
Name: NepaliKit
Version: 1.0.1
Summary: A Nepali language processing library
Home-page: https://github.com/prabhashj07/NepaliKit.git
Author: Prabhash Kumar Jha
Author-email: prabhashj07@gmail.com
License: MIT
Project-URL: Bug Reports, https://github.com/prabhashj07/NepaliKit/issues
Project-URL: Source, https://github.com/prabhashj07/NepaliKit/
Project-URL: Documentation, https://nepalikit.readthedocs.io/
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: Nepali
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: sentencepiece

NepaliKit
=========

NepaliKit is a Python library for natural language processing tasks in the Nepali language.

Installation
------------

You can install NepaliKit using pip:

    pip install NepaliKit

Features
--------

NepaliKit provides the following features:

- **Tokenization**: Tokenize Nepali text using the SentencePiece tokenizer.
- **Preprocessing**: Clean and preprocess Nepali text, for example by removing HTML tags and special characters.
- **Stopword Management**: Load Nepali stopword lists and add or remove individual stopwords.
- **Sentence Operations**: Segment Nepali text into sentences based on punctuation marks.
- **SentencePiece Model Training**: Train custom SentencePiece models for Nepali text data.
- **Utility Functions**: Various utility functions for text processing and manipulation.
- **Integration with PyTorch**: Utilities for integrating with PyTorch for machine learning tasks.
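Punctuation-based sentence segmentation can be approximated in a few lines of standard-library Python. This is a minimal sketch, not NepaliKit's implementation. It assumes that the Devanagari danda (।), the double danda (॥), `?` and `!` end a sentence. `split_sentences` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed sentence-ending marks: danda (।), double danda (॥), "?" and "!".
# Each match is a run of non-terminal characters followed by any closing marks.
_SENTENCE = re.compile(r"[^।॥?!]+[।॥?!]*")


def split_sentences(text: str) -> List[str]:
    """Split Nepali text into sentences, keeping each sentence's closing mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


print(split_sentences("म ठिक छु। तिमीलाई कस्तो छ?"))
```

Some Nepali text uses a full stop in place of the danda. Adding `.` to both character classes would handle that, at the cost of also splitting on decimals and abbreviations.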

Usage
-----

### Tokenization Example

```python
from NepaliKit.tokenization import SentencePieceTokenizer

text = "नमस्ते, के छ खबर?"  # "Hello, what's the news?"
tokenizer = SentencePieceTokenizer()
tokens = tokenizer.tokenize(text)
print(tokens)
```

### Preprocessing Example

```python
from NepaliKit.preprocessing import remove_html_tags, remove_special_characters

text = "<p>नमस्ते, के छ खबर?</p>"
# Strip the markup first, then punctuation and other special characters.
clean_text = remove_html_tags(text)
clean_text = remove_special_characters(clean_text)
print(clean_text)
```
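The following standard-library sketch shows what these two cleaning steps typically involve. It is not the library's code. `strip_html` and `strip_special` are hypothetical names, and the tag pattern is a simple regex rather than a real HTML parser. "Special characters" is assumed to mean anything outside a small allowed set: the Devanagari block (U+0900 to U+097F), the zero-width non-joiner and joiner, ASCII letters and digits, and whitespace.

```python
import html
import re

# Anything between "<" and ">" is treated as a tag (a simplification, not a full HTML parser).
_TAG = re.compile(r"<[^>]+>")
# Assumed "special" characters: everything except Devanagari (U+0900-U+097F),
# ZWNJ/ZWJ (U+200C/U+200D), ASCII letters and digits, and whitespace.
_SPECIAL = re.compile(r"[^\u0900-\u097F\u200c\u200dA-Za-z0-9\s]")


def strip_html(text: str) -> str:
    """Replace tags with spaces and decode entities such as &amp;."""
    return html.unescape(_TAG.sub(" ", text))


def strip_special(text: str) -> str:
    """Drop special characters, then collapse runs of whitespace."""
    return " ".join(_SPECIAL.sub("", text).split())


print(strip_special(strip_html("<p>नमस्ते, के छ खबर?</p>")))
```

Tags are replaced with a space rather than an empty string. This keeps words from adjacent elements, such as neighbouring `<td>` cells, from running together, and the final whitespace collapse removes the extra spaces.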

### Stopword Example

```python
from NepaliKit.manage_stopwords import load_stopwords, add_stopword, remove_stopword

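# Replace the placeholder with the directory that holds your stopword files.
stopwords = load_stopwords('/path/to/stopword/directory')
# The arguments below are placeholder words ("new stopword", "some stopword").
add_stopword('नयाँ_स्टापवर्ड')
remove_stopword('कुनै_स्टापवर्ड')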
```
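Once a stopword list is loaded, filtering text against it usually reduces to a set-membership test. This minimal sketch assumes the stopwords are held as a Python `set` of strings. The return type of `load_stopwords` is not documented here; if it is a list or another iterable, converting it with `set(...)` gives constant-time lookups. `filter_stopwords` is a hypothetical helper, and the example stopwords are illustrative only.

```python
from typing import Iterable, List, Set


def filter_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Return the tokens that are not in the stopword set, preserving order."""
    return [tok for tok in tokens if tok not in stopwords]


# Illustrative stopword set: "र" (and), "पनि" (also), "को" (possessive marker).
stopwords = {"र", "पनि", "को"}
stopwords.add("हो")        # add a stopword ("हो" = "is")
stopwords.discard("पनि")   # remove one; discard() does not raise if it is absent

tokens = "राम र श्याम पनि साथी हो".split()
print(filter_stopwords(tokens, stopwords))
```

`str.split()` only splits on whitespace, so punctuation attached to a word will prevent a match. Cleaning the text first avoids that.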

License
-------

This project is licensed under the MIT License.

Author
------

- Prabhash Kumar Jha
- Email: prabhashj07@gmail.com
