Metadata-Version: 2.1
Name: balinese_nlp
Version: 2.4.2
Summary: A comprehensive Python toolkit for Balinese Natural Language Processing
Author: I Made Satria Bimantara
Author-email: satriabimantara@unud.ac.id
Keywords: Balinese,NLP,Text Preprocessing,Semantic Feature Extraction,Embedding Models,Narrative Analysis,NER,POS Tagging,Summarization
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: Indonesian
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: gensim ==4.3.3
Requires-Dist: huggingface-hub ==0.33.0
Requires-Dist: scikit-learn ==1.2.1
Requires-Dist: matplotlib ==3.6.3
Requires-Dist: matplotlib-inline ==0.1.6
Requires-Dist: nltk
Requires-Dist: transformers ==4.33.3
Requires-Dist: torch ==2.3.0
Requires-Dist: pandas ==1.5.3
Requires-Dist: jaro-winkler ==2.0.3
Requires-Dist: sklearn-crfsuite ==0.3.6
Requires-Dist: hmmlearn ==0.3.0
Requires-Dist: statsmodels ==0.14.1
Requires-Dist: krippendorff ==0.6.0
Requires-Dist: openpyxl ==3.1.5

# Balinese Natural Language Processing Package
> The First Comprehensive Python Toolkit for Balinese Natural Language Processing

Package Structure of ***balinese_nlp***: <br>
- **textpreprocessor** &check;
	- `TextPreprocessor.py`
	- `utils.py`
	- *data* 
		- *lemmatization*
			- `balivocab.txt`
		- *normalizedwords*
			- `data.xlsx`
		- *stopwords*
			- `data.txt`
	- *lemmatization* 
		- *LevenstheinDistance*
			- `Lemmatization.py`
			- `LemmatizationRules.py`
	- *stemming*  (under development)
- **narratives**
	- *aliasclustering* &check;
		- *rule_based* &check;
			- `AliasClusteringRuleBased.py`
			- `PairwiseDistanceString.py`
		- *supervised* (under development) 
	- *characterclassification* 
		- *rule_based* &check;
			- `RuleBasedLexiconClassifier.py`
		- *supervised* (under development) 
	- *characterner* &check;
		- *datapreparation* 
			- `DataPreparation.py`
		- `BaseModel.py`
		- `ConditionalRandomFields.py`
		- `HiddenMarkovModel.py`
		- `HybridBPSOCRF.py`
		- `ScikitLearnClassifiers.py`
	- *clustercharacterner* &check;
		- `AgglomerativeClusteringNER.py`
		- `Base.py`
		- `BIRCHNER.py`
		- `DBSCANNER.py`
		- `HDBSCANNER.py`
		- `KMeansNER.py`
		- `OPTICSNER.py`
		- `SpectralClusteringNER.py`
- **corefresolution**
	- *rule_based* &check;
		- `CoreferenceResolution.py`
		- `LinkedList.py`
	- *supervised* (under development)
- **embeddings** &check;
	- `BasePretrained.py` 
	- *BERT* 
		- `BalimultiLingBERT.py`
	- *gensims* 
		- `BaliFastText.py`
		- `BaliWord2Vecs.py`
	- *gloves* 
		- `BaliGlove.py`
		- `Glove.py`
- **feature_extractor**
	- *data* &check;
		- `booster_words.txt`
		- `negation_words.txt`
	- *narratives* &check;
		- *characterclassification*
			- `BaseModelFeatureExtraction.py`
			- `FeatureExtraction.py`
			- `LexiconFeatureExtraction.py`
			- `POSTagFeatureExtraction.py`
			- `WordEmbeddingFeatureExtraction.py`
		- *summarization*
			- `FeatureExtractor.py`
			- `TFISFVectorizer.py`
- **ner** &check;
	- `utils.py`
	- *data* 
		- `BaliVocab.txt`
		- `sansekertavocab.txt`
	- *rule_based* 
		- `NERLocation.py`
		- `NERPerson.py`
		- `NERTimeExpression.py`
- **postag** &check;
	- `utils.py`
	- *data* 
		- *HMM*
			- `hmmmodel.txt`
	- *HMM* 
		- `HiddenMarkovModelPOSTag.py`
- **quoteattribution**
	- *rule_based* 
		- `RuleBasedSentenceGrouping.py`
- **summarization** (under development)
	- *abstractive* 
	- *extractive* 
		- *deeplearning* 
		- *machinelearning* 
		- *metaheuristics* 
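
To compare the tree above with what is actually importable in a given installation, the standard library can enumerate a package's modules. The following is a minimal sketch, not part of the package itself: `list_submodules` is a hypothetical helper name, and it assumes the directories listed above are regular subpackages (each with an `__init__.py`). Data directories and plain files such as `balivocab.txt` will not appear in its output.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted dotted names of all modules found under a package.

    Hypothetical helper for illustration; it is not provided by balinese_nlp.
    """
    package = importlib.import_module(package_name)
    # Only packages expose __path__; a plain module has no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    # walk_packages imports each subpackage it finds in order to descend into it.
    return sorted(
        info.name
        for info in pkgutil.walk_packages(search_path, prefix=package.__name__ + ".")
    )


if __name__ == "__main__":
    try:
        for name in list_submodules("balinese_nlp"):
            print(name)
    except ImportError:
        print("balinese_nlp is not installed in this environment")
```

Because `pkgutil.walk_packages` imports subpackages while walking, running this against the full package may also load heavy dependencies such as `torch`.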
