Metadata-Version: 2.4
Name: batabyal
Version: 1.2.1
Summary: A lightweight Python package for Machine Learning utilities
Author: T Batabyal
Author-email: T Batabyal <tamanashbatabyal@gmail.com>
License-Expression: MIT
Keywords: ML,DataScience
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: xgboost>=1.7.0
Requires-Dist: catboost>=1.2.0
Requires-Dist: wittgenstein>=0.3.4
Requires-Dist: skl2onnx>=1.15
Requires-Dist: onnx>=1.14
Requires-Dist: onnxmltools>=1.11
Requires-Dist: onnxruntime>=1.16
Requires-Dist: imodels>=1.3
Requires-Dist: numpy>=1.21
Dynamic: author
Dynamic: license-file
Dynamic: requires-python

### Package: batabyal
---
**batabyal** is a lightweight Python package for Machine Learning utilities that provides:
- **cleaning_module** - A CSV data cleaning module
- **trainer_kit** - ML module for classification problems

### Installation
---
Run the following command in a terminal:
```bash
pip install batabyal
```

### Imports
---
Import specific names or the entire module, whichever is required:
```python
from batabyal import cleaning_module as cm
from batabyal.trainer_kit import TransformedTargetClassifier, autofit_classification_model
```

### Usage
---
**1. cleaning_module:** Provides a single function, `clean_csv`, for cleaning `.csv` datasets efficiently.
```python
cm.clean_csv('filename.csv', numericData, charData, True, True) 
#structure: clean_csv(file, numericData, charData, fill, case_sensitivity=False, dummies=None) -> pd.DataFrame
#If `fill==True`, NaN values in numeric columns are filled with the column mean.
#If `case_sensitivity=True`, all labelled (string) values are lowercased.
#`dummies` is a list of placeholder values to replace with NaN before cleaning.
```
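The cleaning steps described above can be sketched in plain pandas. This is a minimal illustration of the behavior, not the package's internal implementation, and the column names and dummy value `"?"` are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],      # numeric column with a missing value
    "city": ["Delhi", "PARIS", "?"],  # text column containing a dummy value "?"
})

# dummies: replace placeholder values with NaN before cleaning
df = df.replace({"?": np.nan})

# fill=True: fill NaN in numeric columns with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# case_sensitivity=True: lowercase labelled (string) values
df["city"] = df["city"].str.lower()

print(df)
```

Running `clean_csv` with the matching arguments is intended to produce an equivalently cleaned `pd.DataFrame`.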

**2. trainer_kit:** Provides a wrapper class, `TransformedTargetClassifier`, which encodes raw labels for training and inversely transforms predictions back to the original labels, and a function, `autofit_classification_model`, which automatically fits classification models and selects the best algorithm and hyperparameters based on the `roc_auc_ovr_weighted` score.
```python
model = TransformedTargetClassifier(classifier=svc, transformer=labelEncoder)
#assume `svc` is an sklearn classifier and `labelEncoder` is an sklearn LabelEncoder
#model.fit() and model.predict() now work with raw labelled data; encoding is handled internally for training and prediction
#model.predict() returns the original labels by inversely transforming the encoded numbers internally

result = autofit_classification_model(x, y, "numeric", 3)
#structure: autofit_classification_model(x:pd.DataFrame, y:pd.DataFrame, x_type:Literal["numeric", "categorical", "mixed"], n_splits:int, cat_features:list[str]=[], whitelisted_algorithms:list[Literal["LogisticRegression", "DecisionTree", "RandomForest", "GaussianNB", "BernoulliNB", "CategoricalNB", "CatBoost", "XGBoost", "Ripper", "SVC", "KNN"]]|Literal["auto"]="auto", enable_votingClassifier:bool=True, random_state:int|None=42, verbosity:bool=True) -> object

model = result.model #now use model.predict
score = result.score #print score
classifier = result.classifier #name of the best algorithm used
convertible_model = result.convertible_model #extracts the model only (no preprocessing)
preprocessedX = result.preprocessedX #extracts the x features after preprocessing
n_features = result.n_features #returns total number of the preprocessed x features
initial_type = result.initial_type #initial type needed to convert the model to .onnx
result.export_to_onnx() #dump the model as 'model.onnx' in your current working directory (Input name: 'input')
```
