Metadata-Version: 2.4
Name: smarteda
Version: 0.1.0
Summary: Automates Exploratory Data Analysis (EDA) for any pandas DataFrame
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: jinja2
Requires-Dist: missingno
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"

# smarteda

**Automate your Exploratory Data Analysis in one line of code.**

`smarteda` is a Python package that replaces repetitive EDA boilerplate. Instead of rewriting the same pandas calls for every new dataset, you call `smarteda` once to profile the DataFrame, get cleaning and preprocessing suggestions, and generate a standalone HTML report.

---

## Installation

```bash
pip install smarteda
```

---

## Quick Start

```python
import pandas as pd
import smarteda

df = pd.read_csv("your_data.csv")

# Run everything at once
smarteda.analyze(df)

# Or generate a full HTML report
smarteda.report(df, output_file="report.html")
```

---

## Functions

| Function | Description |
|---|---|
| `smarteda.basic_eda(df)` | Head, tail, sample, shape, size, info, describe |
| `smarteda.overview(df)` | Shape, memory, data types, constant columns, wrong type detection |
| `smarteda.missing(df)` | Missing value counts, percentages, heatmap, fill suggestions |
| `smarteda.duplicates(df)` | Count and show duplicate rows |
| `smarteda.duplicates(df, drop=True)` | Drop duplicates and return clean DataFrame |
| `smarteda.outliers(df)` | IQR, Z-score, and Isolation Forest outlier detection |
| `smarteda.distributions(df)` | Skewness, kurtosis, transformation suggestions, histogram plots |
| `smarteda.correlations(df)` | Pearson/Spearman/Kendall correlation, multicollinearity warnings |
| `smarteda.categorical(df)` | Value counts, high cardinality detection, encoding suggestions |
| `smarteda.timeseries(df)` | Auto datetime detection, trends, seasonality, gap detection |
| `smarteda.suggestions(df)` | Smart recommendations + ML Readiness Score out of 100 |
| `smarteda.clean(df)` | Auto clean — returns a new cleaned DataFrame |
| `smarteda.clean(df, inplace=True)` | Auto clean — modifies original DataFrame directly |
| `smarteda.visualize(df)` | Auto charts for every column |
| `smarteda.analyze(df)` | Runs ALL functions above in one call |
| `smarteda.report(df)` | Generates a full standalone HTML report |

---

## Examples

### Basic EDA
```python
smarteda.basic_eda(df)        # default 5 rows
smarteda.basic_eda(df, n=10)  # show 10 rows
```

### Missing Values
```python
smarteda.missing(df)
# Output:
#        Count  Percentage
# age       21       10.24
# salary    15        7.32
# Suggestion: age → Fill with mean | salary → Fill with median
```
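
For readers who want to see what a summary like this involves, here is a minimal sketch in plain pandas. It is not `smarteda`'s implementation: the helper names `missing_summary` and `suggest_fill`, the sample data, and the skewness threshold of 1.0 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "Count": counts,
        "Percentage": (counts / len(df) * 100).round(2),
    })
    return summary[summary["Count"] > 0].sort_values("Count", ascending=False)


def suggest_fill(series: pd.Series, skew_threshold: float = 1.0) -> str:
    """Hypothetical rule: mode for non-numeric data, median for skewed
    numeric data, mean otherwise. The threshold is an assumed value."""
    if not pd.api.types.is_numeric_dtype(series):
        return "mode"
    return "median" if abs(series.skew()) > skew_threshold else "mean"


# Small made-up dataset with one missing value per column
df = pd.DataFrame({
    "age": [25, 30, np.nan, 35, 40],
    "salary": [30_000, 32_000, 31_000, np.nan, 250_000],
    "city": ["Pune", None, "Delhi", "Pune", "Delhi"],
})

summary = missing_summary(df)
print(summary)
print({col: suggest_fill(df[col]) for col in summary.index})
```

The median is a common choice for skewed columns because a single extreme value (such as the large salary above) pulls the mean away from the typical value.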

### Outlier Detection
```python
smarteda.outliers(df)
# Output:
# salary → 8 outliers (3.9%) using IQR
# score  → 1 outliers (0.49%) using Z-score
# Multi-dimensional (Isolation Forest) → 39 outliers (19.02%)
```
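
The two single-column methods named above can be sketched with pandas alone. This is an illustrative sketch, not the package's code: the function names, the IQR multiplier of 1.5, and the Z-score threshold of 3.0 are conventional defaults assumed here, and Isolation Forest is left out because it needs scikit-learn.

```python
import numpy as np
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where the absolute Z-score exceeds the threshold."""
    std = series.std(ddof=0)  # population standard deviation
    if std == 0 or np.isnan(std):
        # A constant or empty column has no Z-score outliers
        return pd.Series(False, index=series.index)
    z = (series - series.mean()) / std
    return z.abs() > threshold


# Made-up sample: twelve values near 50 and one extreme value
values = pd.Series([50, 52, 49, 51, 48, 53, 50, 47, 52, 51, 49, 50, 500])

iqr_mask = iqr_outliers(values)
z_mask = zscore_outliers(values)
print(f"IQR: {int(iqr_mask.sum())} outlier(s)")
print(f"Z-score: {int(z_mask.sum())} outlier(s)")
```

The two methods can disagree on small samples: an extreme value inflates the standard deviation it is measured against, so the Z-score rule tends to flag fewer points than the IQR rule.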

### Smart Suggestions + ML Score
```python
smarteda.suggestions(df)
# Output:
# ⚠️  Column 'salary' is highly skewed → apply log transformation
# ⚠️  'height' and 'weight' are 94% correlated → drop one
# ✅  No duplicates found
# 💡 ML Readiness Score: 87 / 100
```
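
A correlation warning like the `height` / `weight` one can be approximated with a pairwise scan of the Pearson correlation matrix. The sketch below is an assumption about how such a check might look, not the library's logic: `correlated_pairs`, the 0.9 threshold, and the sample data are hypothetical. The ML Readiness Score is not sketched because this README does not describe its formula.

```python
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()  # Pearson by default
    cols = corr.columns
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 2)))
    return pairs


# Made-up data: height and weight move together, score does not
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 61, 69, 82, 90],
    "score": [3, 9, 1, 7, 4],
})

pairs = correlated_pairs(df)
print(pairs)
```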

### Auto Clean
```python
# Safe — keeps original df intact
clean_df = smarteda.clean(df)

# Modifies df directly
smarteda.clean(df, inplace=True)
```
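
The difference between the two calls follows the usual pandas `inplace` pattern, sketched below. The README does not say which cleaning steps `smarteda.clean` applies, so the two steps here (dropping duplicate rows and filling numeric gaps with the median) and the name `clean_sketch` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def clean_sketch(df: pd.DataFrame, inplace: bool = False):
    """Hypothetical cleaner: work on a copy unless inplace=True."""
    target = df if inplace else df.copy()
    # Assumed step 1: remove exact duplicate rows
    target.drop_duplicates(inplace=True)
    # Assumed step 2: fill numeric gaps with the column median
    for col in target.select_dtypes(include="number").columns:
        target[col] = target[col].fillna(target[col].median())
    # Mirror the pandas convention: in-place operations return None
    return None if inplace else target


raw = pd.DataFrame({
    "age": [25, 25, np.nan, 40],
    "city": ["Pune", "Pune", "Delhi", "Agra"],
})

cleaned = clean_sketch(raw)               # raw is left untouched
rows_before = len(raw)
result = clean_sketch(raw, inplace=True)  # raw itself is modified
print(rows_before, len(cleaned), result, len(raw))
```

Keeping the default at `inplace=False` means a cleaning run can be compared against the original data before anything is overwritten.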

### HTML Report
```python
smarteda.report(df, output_file="my_report.html")
# The report is a standalone HTML file: it opens in any browser, no extra tools needed
```

---

## What smarteda Detects Automatically

- ✅ Missing values with fill strategy per column
- ✅ Duplicate rows
- ✅ Outliers using 3 methods (IQR, Z-score, Isolation Forest)
- ✅ Skewed distributions with transformation suggestions
- ✅ Multicollinearity between features
- ✅ High cardinality categorical columns
- ✅ Wrong data types (numbers stored as strings, dates as objects)
- ✅ Constant columns (a single repeated value carries no information for ML)
- ✅ Time series trends, seasonality, and gaps
- ✅ ML Readiness Score out of 100
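
Two of the checks in this list, constant columns and numbers stored as strings, are simple enough to sketch in plain pandas. The helper names and the 95% parse threshold below are assumptions for illustration and may differ from what `smarteda` does internally.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Columns holding at most one distinct value (NaN counted as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


def numeric_like_columns(df: pd.DataFrame, min_fraction: float = 0.95) -> list:
    """Text columns where at least min_fraction of non-null values parse as numbers."""
    flagged = []
    for col in df.columns:
        series = df[col]
        is_text = (pd.api.types.is_object_dtype(series)
                   or pd.api.types.is_string_dtype(series))
        if not is_text:
            continue
        non_null = series.dropna()
        if non_null.empty:
            continue
        # Values that cannot be parsed become NaN instead of raising
        parsed = pd.to_numeric(non_null, errors="coerce")
        if parsed.notna().mean() >= min_fraction:
            flagged.append(col)
    return flagged


# Made-up data: price is numeric text, country never changes
df = pd.DataFrame({
    "price": ["10", "12.5", "9"],
    "country": ["IN", "IN", "IN"],
    "name": ["a", "b", "c"],
    "qty": [1, 2, 3],
})

print(constant_columns(df))
print(numeric_like_columns(df))
```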

---

## Dependencies

- pandas
- numpy
- matplotlib
- seaborn
- scipy
- scikit-learn
- jinja2
- missingno

---

## License

MIT License
