Metadata-Version: 2.4
Name: masterclean
Version: 1.3.0
Summary: Automated Data Cleaning, Validation and Analytics Toolkit
Author: Mohamed Faisal
License-Expression: MIT
Project-URL: Homepage, https://github.com/MohamedFaisal-11/masterclean
Project-URL: Repository, https://github.com/MohamedFaisal-11/masterclean
Project-URL: Issues, https://github.com/MohamedFaisal-11/masterclean/issues
Keywords: data-cleaning,data-analysis,csv-cleaner,data-validation,analytics,plotly,automation,preprocessing
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: typer
Requires-Dist: chardet
Requires-Dist: pytest
Requires-Dist: openpyxl
Requires-Dist: matplotlib
Requires-Dist: plotly
Requires-Dist: rich
Dynamic: license-file

# 🚀 MasterClean

![PyPI](https://img.shields.io/pypi/v/masterclean)

![Python](https://img.shields.io/badge/python-3.9%2B-blue)

![License](https://img.shields.io/badge/license-MIT-green)

Automated Data Cleaning, Validation & Analytics Toolkit for Python.

MasterClean is a professional Python package that automates:

- data cleaning
- preprocessing
- validation
- profiling
- visualization
- reporting
- analytics

using simple CLI commands or the Python API.

Designed for:

- Data Analysts
- Data Scientists
- ML Engineers
- Researchers
- Students
- Automation workflows

---

# ✨ Features

## 🧹 Advanced Data Cleaning

- Missing value handling
- Duplicate row removal
- Empty string cleanup
- Whitespace cleanup
- Column standardization
- Datetime conversion
- Smart categorical filling
- Automatic preprocessing pipeline
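
To make the steps above concrete, here is a minimal pandas sketch of what such a cleaning pass can look like. It is not MasterClean's actual implementation: the function name `basic_clean` and the fill strategy (median for numeric columns, most frequent value for text columns) are assumptions chosen for illustration.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass (hypothetical helper, not MasterClean's code)."""
    out = df.copy()

    # Column standardization: trim, lowercase, spaces -> underscores.
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]

    # Whitespace cleanup and empty-string cleanup on text columns.
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col]) or pd.api.types.is_string_dtype(out[col]):
            stripped = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
            out[col] = stripped.mask(stripped == "")  # "" becomes missing

    # Duplicate row removal.
    out = out.drop_duplicates().reset_index(drop=True)

    # Missing value handling: median for numbers, most frequent value otherwise.
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                mode = out[col].mode(dropna=True)
                if not mode.empty:
                    out[col] = out[col].fillna(mode.iloc[0])
    return out


raw = pd.DataFrame(
    {
        " Full Name ": [" Ann ", "Bob", "Bob", ""],
        "Age": [31.0, None, None, 45.0],
    }
)
cleaned = basic_clean(raw)
print(cleaned)
```

The order matters in this sketch: duplicates are dropped before filling, so repeated rows do not skew the median or the most frequent value.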

---

## ⚡ Datatype Optimization

- Integer optimization
- Float optimization
- Boolean conversion
- Category optimization
- Datetime detection
- Memory usage reduction
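
As a rough idea of how this kind of optimization works in pandas, the sketch below downcasts numeric columns and converts low-cardinality text columns to `category`. The helper name `shrink_dtypes` and the 0.5 uniqueness threshold are assumptions for illustration; MasterClean's own rules may differ.

```python
import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Illustrative dtype downcasting (hypothetical helper and threshold)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer type that holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 where possible.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Few distinct values relative to row count: store as category.
            if len(series) and series.nunique() / len(series) <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


frame = pd.DataFrame(
    {
        "count": [1, 2, 3, 4],
        "price": [1.5, 2.5, 3.5, 4.5],
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
    }
)
small = shrink_dtypes(frame)
print(small.dtypes)
print(frame.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Note that downcasting floats to `float32` trades precision for memory, so it is worth checking whether that is acceptable for a given column.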

---

## 🛡 Advanced Validation Engine

- Negative value detection
- Outlier detection
- Invalid boolean detection
- Email validation
- Phone validation
- Duplicate percentage warnings
- Missing value percentage analysis
- Mixed datatype detection
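
The sketch below shows how a few of these checks could be written with pandas. It is only an illustration: the function name `simple_checks`, the loose email pattern, the 1.5 × IQR outlier rule, and the 10% duplicate threshold are all assumptions, not MasterClean's documented behavior.

```python
import re

import pandas as pd

# Deliberately loose pattern: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def simple_checks(df, email_col="email", max_duplicate_pct=10.0):
    """Illustrative checks; names and thresholds are assumptions."""
    found = []

    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        negatives = int((values < 0).sum())
        if negatives:
            found.append(f"Negative values found in '{col}' ({negatives} rows)")

        # IQR rule: flag points more than 1.5 * IQR outside the quartiles.
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        margin = 1.5 * (q3 - q1)
        outliers = int(((values < q1 - margin) | (values > q3 + margin)).sum())
        if outliers:
            found.append(f"Possible outliers in '{col}' ({outliers} rows)")

    if email_col in df.columns:
        invalid = sum(1 for v in df[email_col].dropna() if not EMAIL_RE.match(str(v)))
        if invalid:
            found.append(f"Invalid email values found in '{email_col}' ({invalid} rows)")

    # Share of rows that repeat an earlier row, as a percentage.
    duplicate_pct = 100 * df.duplicated().mean()
    if duplicate_pct > max_duplicate_pct:
        found.append(f"High duplicate rows detected ({duplicate_pct:.1f}%)")
    return found


sample = pd.DataFrame(
    {
        "salary": [50000, -100, 52000, 51000, 50000],
        "email": ["a@x.com", "bad", "c@y.org", "d@z.net", "a@x.com"],
    }
)
for message in simple_checks(sample):
    print("⚠", message)
```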

---

# 📊 Advanced Profiling

- Dataset health score
- Missing value summaries
- Datatype analytics
- Memory usage analysis
- Numeric statistics
- Categorical summaries
- Dataset overview metrics
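
For a sense of what goes into such a profile, here is a small pandas sketch. The helper `quick_profile` and its health score formula (100 minus the percentage of missing cells and the percentage of duplicate rows) are hypothetical; the source does not describe how MasterClean computes its score.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Illustrative profile; the health score formula is an assumption."""
    total_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isna().sum().sum())
    duplicate_rows = int(df.duplicated().sum())

    missing_pct = 100 * missing_cells / total_cells if total_cells else 0.0
    duplicate_pct = 100 * duplicate_rows / len(df) if len(df) else 0.0

    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_by_column": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "numeric_stats": df.describe().to_dict(),
        "health_score": round(max(0.0, 100 - missing_pct - duplicate_pct), 1),
    }


data = pd.DataFrame({"a": [1, 2, None, 2], "b": ["x", "y", "z", "y"]})
report = quick_profile(data)
print(report["health_score"])
```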

---

# 📈 Interactive Visualization Engine

- Plotly dashboards
- Histograms
- Boxplots
- Pie charts
- Correlation heatmaps
- Missing value charts
- Interactive analytics dashboards

---

# 📄 Reporting System

- Unified HTML analytics dashboard
- Validation summaries
- Dataset overview cards
- Interactive visualizations
- Automated report generation

---

# 🖥 Professional CLI Toolkit

MasterClean provides the following commands.

## Full Automated Pipeline

```bash
masterclean clean data.csv
```

Runs:

* cleaning
* optimization
* validation
* profiling
* visualization
* reporting
* exporting

---

## Validation Only

```bash
masterclean validate data.csv
```

---

## Dataset Profiling

```bash
masterclean profile data.csv
```

---

## Dashboard Generation

```bash
masterclean dashboard data.csv
```

---

## Show Version

```bash
masterclean version
```
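
The commands above take one file at a time. If you want to process a whole folder, one option is a small Python wrapper around the CLI, sketched below. The helper `clean_folder` is hypothetical and uses only the `masterclean clean <file>` form shown above. How successive runs name their output is not described here, so check whether later runs overwrite earlier results before relying on this.

```python
import subprocess
from pathlib import Path


def clean_folder(folder: str, runner=subprocess.run) -> list:
    """Run `masterclean clean` on every CSV in a folder (hypothetical helper)."""
    commands = []
    for path in sorted(Path(folder).glob("*.csv")):
        command = ["masterclean", "clean", str(path)]
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(command, check=True)
        commands.append(command)
    return commands
```

Calling `clean_folder("datasets")` would run the full pipeline once per CSV file in `datasets/`. The `runner` parameter exists only so the function can be exercised without invoking the real CLI.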

---

# 📦 Installation

## Install from PyPI

```bash
pip install masterclean
```

---

## Upgrade to Latest Version

```bash
pip install --upgrade masterclean
```

---

# 🐍 Python Usage

```python
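# Import the pipeline functions explicitly instead of using a wildcard import.
from masterclean import (
    read_file,
    clean_data,
    optimize_dtypes,
    validate_data,
    generate_profile,
    generate_charts,
    generate_report,
    export_data,
)

# Load the dataset; the extension is kept so the export can match the input.
df, file_extension = read_file("data.csv")

# Clean the data, then optimize datatypes to reduce memory usage.
df = clean_data(df)
df = optimize_dtypes(df)

# Collect validation warnings, profile statistics, and charts.
warnings = validate_data(df)
profile = generate_profile(df)
charts = generate_charts(df)

# Build the report from the results.
generate_report(df=df, warnings=warnings, profile=profile, charts=charts)

# Export the cleaned data using the original file extension.
export_data(df, "cleaned_data", file_extension)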
```

---

# 📂 Supported File Formats

| Format | Supported |
| ------ | --------- |
| CSV    | ✅         |
| XLSX   | ✅         |
| XLS    | ✅         |

---

# 🔄 Same-Format Export System

MasterClean exports the cleaned data in the same format as the input file. The one exception is legacy XLS input, which is written as XLSX.

| Input | Output            |
| ----- | ----------------- |
| CSV   | cleaned_data.csv  |
| XLSX  | cleaned_data.xlsx |
| XLS   | cleaned_data.xlsx |
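
The mapping in the table above can be expressed as a small lookup. This is a sketch only: the function name `output_name` and the error raised for unsupported extensions are assumptions, not part of MasterClean's API.

```python
from pathlib import Path

# Input extension -> output extension, as listed in the table above.
OUTPUT_EXTENSION = {".csv": ".csv", ".xlsx": ".xlsx", ".xls": ".xlsx"}


def output_name(input_path: str, stem: str = "cleaned_data") -> str:
    """Return the export file name for a given input file (hypothetical helper)."""
    suffix = Path(input_path).suffix.lower()
    if suffix not in OUTPUT_EXTENSION:
        raise ValueError(f"Unsupported file format: {suffix or input_path}")
    return stem + OUTPUT_EXTENSION[suffix]


for name in ("data.csv", "report.xlsx", "legacy.xls"):
    print(name, "->", output_name(name))
```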

---

# 📊 Example Validation Output

```text
VALIDATION WARNINGS
========================================

⚠ Negative values found in 'salary' (3 rows)

⚠ Invalid email values found in 'email' (5 rows)

⚠ High duplicate rows detected (14.2%)

⚠ Mixed datatypes detected in 'age'
```

---

# 🏗 Architecture

```text
Read
   ↓
Clean
   ↓
Optimize
   ↓
Validate
   ↓
Profile
   ↓
Visualize
   ↓
Report
   ↓
Export
```

---

# 📁 Project Structure

```text
masterclean/
│
├── cleaner.py
├── validator.py
├── datatypes.py
├── profiler.py
├── visualizer.py
├── report.py
├── exporter.py
├── reader.py
├── cli.py
├── __init__.py
│
tests/
│
├── test_cleaner.py
├── test_validator.py
├── test_reader.py
├── test_report.py
├── test_visualizer.py
│
.github/workflows/
│
└── tests.yml
```

---

# 🧪 Testing

Run tests using:

```bash
python -m pytest
```

---

# 🔄 CI/CD

MasterClean uses GitHub Actions for:

* automated testing
* dependency validation
* continuous integration

---

# 🛣 Roadmap

Future improvements planned:

* Streamlit web application
* AI-powered cleaning suggestions
* Large dataset optimization
* Schema validation engine
* Cloud deployment support
* Plugin architecture
* Real-time analytics dashboards

---

# 🤝 Contributing

Contributions are welcome.

You can:

* report bugs
* suggest features
* improve documentation
* submit pull requests

---

# 📄 License

MIT License

---

# 👨‍💻 Author

Mohamed Faisal Maraicar N

GitHub:
https://github.com/MohamedFaisal-11/masterclean

