Metadata-Version: 2.4
Name: lazybrains
Version: 2.0.0
Summary: Model Selection Tool
Author-email: Foresty <dsparthsrivastava@email.com>
Project-URL: Homepage, https://github.com/Parth-Srivastava-bithub/lightml
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: scikit-learn
Requires-Dist: xgboost
Requires-Dist: shap
Requires-Dist: joblib
Requires-Dist: rich

# 🧠 Universal ML Model Explorer Pro

![Python](https://img.shields.io/badge/Python-3.8+-blue?logo=python)
![Platform](https://img.shields.io/badge/Platform-Cross--Platform-lightgrey)
![License](https://img.shields.io/badge/License-MIT-green)
![Maintained](https://img.shields.io/badge/Maintained%3F-Yes-blue)
![Status](https://img.shields.io/badge/Status-Active-success)
![Build](https://img.shields.io/badge/Build-Passing-brightgreen)
![Downloads](https://img.shields.io/pypi/dm/lazybrains?label=Downloads&color=orange)

> One-line ML pipeline that preprocesses, trains, compares, and visualizes the best model — automatically.


Automatically train, evaluate, compare, and visualize multiple machine learning models — all with one command.

## 🚀 Features

* Auto detection: Classification or Regression
* Auto preprocessing: Scaling, Encoding, Imputation, PCA
* Parallel model training on all cores
* SHAP interpretability plots
* Beautiful visual reports (Confusion Matrix, ROC, Residuals, etc.)
* CLI + Notebook compatible

## 📦 Installation

```bash
pip install -r requirements.txt
```

## 🧪 CLI Usage

```bash
python main.py path/to/dataset.csv target_column_name
```

### Optional flags:

* `--output_dir`: Folder to save results (default: `results`)
* `--pca_components`: Apply PCA on numeric features
* `--no_shap`: Disable SHAP plot (faster)

## 🧬 Python Usage

```python
from yourlib import run_pipeline_in_notebook

run_pipeline_in_notebook(
    dataset_path="data.csv",
    target_column="target",
    pca_components=5,
    no_shap=False
)
```

## 📂 Output

* `best_model.pkl`: Trained model
* Plots: Confusion Matrix, ROC, Residuals, SHAP
* `model_report.txt`: Full model comparison

## 🛠️ Supported Models

* Linear, Tree-based, Ensemble (RF, GB, AdaBoost, XGBoost), KNN, SVM, Stacking
* Auto selection of best based on Accuracy / R²

## Run this in your terminal to install all dependencies

```cmd
pip install pandas numpy matplotlib seaborn scikit-learn xgboost shap joblib rich
```

---


# 🔍 AutoFeatSelect

**A Lightweight Python Library for Automatic Feature Selection**  
*Smart. Fast. Interpretable.*

---

## 🚀 What is AutoFeatSelect?

`AutoFeatSelect` is a fully automated feature selection tool that cleans your dataset by **removing irrelevant, redundant, or low-value features**—all with just one line of code. Whether you’re building a classification model or regression model, this tool will help you improve model performance and training speed without the hassle of manual preprocessing.

---

## ✨ Why AutoFeatSelect is Cool

- ✅ **Zero manual inspection** — It decides what to drop based on solid math.
- 🔄 **Handles both numeric & categorical features**
- 📉 Drops features using:
  - Missing value ratio
  - Low variance
  - Correlation (pairwise & clustered)
  - VIF (multicollinearity)
  - Mutual Information
  - Tree-based feature importance
- 📄 **Detailed drop report** (feature + reason)
- 🪶 Lightweight: Only uses `pandas`, `numpy`, `scikit-learn`, `statsmodels`, `scipy`

---

## 📦 Installation

```bash
pip install -U pandas numpy scikit-learn statsmodels scipy
```



Clone this repo or copy `AutoFeatSelect` into your project.

---

## 🛠️ How to Use

```python
from autofeatselect import AutoFeatSelect

selector = AutoFeatSelect(
    target_col='target',     # Optional if you want supervised feature selection
    verbose=True             # Optional for progress logs
)

# Fit + transform in one line
df_cleaned = selector.fit_transform(df, drop=True)

# Or separately
selector.fit(df)
df_cleaned = selector.transform(df)

# See what got dropped and why
report = selector.get_report()
print(report)
```

---

## 🧠 When to Use

* Before training ML models, especially with many features
* When data has potential noise, ID columns, or redundancy
* To reduce overfitting and improve model interpretability
* During automated pipelines or pre-model sanity checks

---

## 📝 Example Output

```bash
[AutoFeatSelect] Running: Drop high missing values...
[AutoFeatSelect]   Dropped: ['unimportant_column']
[AutoFeatSelect] Running: Drop single value columns...
[AutoFeatSelect]   Dropped: ['constant_feature']
...
[AutoFeatSelect] Finished selection. Kept 22 out of 48 features.
```

---

## 📊 Feature Drop Criteria

| Technique                | Purpose                                  |
| ------------------------ | ---------------------------------------- |
| Missing Ratio            | Drops features with mostly nulls         |
| Unique Ratio (ID-like)   | Removes fake IDs or row-wise unique cols |
| Variance Threshold       | Removes constant or near-constant cols   |
| Pearson Correlation      | Drops highly correlated pairs            |
| Hierarchical Clustering  | Smarter groupwise redundancy pruning     |
| VIF (Variance Inflation) | Drops multicollinear features            |
| Mutual Information       | Measures info contribution to target     |
| Tree Importance          | Uses ExtraTrees to measure signal power  |

---

## 🤝 Author

Built by **Gemini**
Version: `1.0.0`

---

## ❤️ Contribute / Fork

Feel free to fork and extend this library — make it smarter, add plotting, or wrap it into a full AutoML pipeline!

---

## 🔓 License

MIT — Use freely, just don't claim it's yours 😄


---

Let me know if you want a logo, GitHub structure, or demo notebook too! 📁📈
