Metadata-Version: 2.4
Name: basicdatainfo
Version: 0.1.0
Summary: One-line automated EDA for pandas DataFrames: column analysis, cleaning, visualization, and HTML reports.
Home-page: https://github.com/yourusername/basicdatainfo
Author: Ranjan Mondal
Author-email: 
License: MIT
Project-URL: Source, https://github.com/yourusername/basicdatainfo
Project-URL: Tracker, https://github.com/yourusername/basicdatainfo/issues
Keywords: eda pandas data science automation
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3
Requires-Dist: numpy>=1.20
Requires-Dist: matplotlib>=3.4
Requires-Dist: seaborn>=0.11
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# basicdatainfo

One-line automated EDA for pandas DataFrames.

Pass it a DataFrame and it will:

- Detect each column's type (numeric, categorical, datetime, boolean, text)
- Compute summary statistics for every column
- Optionally **clean** the data (`clean=True`): drop duplicate rows, strip
  whitespace, auto-parse date columns, fill missing values, optionally cap
  outliers
- **Automatically engineer date features** (`feature_engineer=True`, default):
  if any column is (or becomes) a datetime column, it derives `year`,
  `quarter`, `month`, `month_name`, `week_of_year`, `day_of_month`,
  `day_of_week`, `day_name`, `is_weekend`, `is_month_start`, `is_month_end`,
  `is_quarter_start`, `is_quarter_end`, and `hour` (if a time component is
  present), then analyzes and visualizes these new columns too
- Automatically generate **visualizations**: histograms + boxplots for
  numeric columns, an extra "count by value" bar chart for low-cardinality
  numeric columns (e.g. the engineered month/day-of-week/quarter features),
  a "records over time" trend chart for datetime columns, bar charts for
  categorical columns, a missing-values chart, and a correlation heatmap
- Either **print a summary + show the plots** (default), or generate a
  **single self-contained HTML report** (`html=True`)
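
To make the date-feature step above concrete, here is a minimal sketch of how such calendar features can be derived with the pandas `.dt` accessor. It is not the package's actual implementation: the helper name `add_calendar_features` and the `<column>_` prefix for new column names are assumptions made for illustration only.

```python
import pandas as pd


def add_calendar_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with calendar features derived from one datetime column.

    Hypothetical helper: the name and the "<column>_" prefix are illustrative.
    """
    out = df.copy()
    ts = pd.to_datetime(out[column])
    prefix = f"{column}_"

    out[prefix + "year"] = ts.dt.year
    out[prefix + "quarter"] = ts.dt.quarter
    out[prefix + "month"] = ts.dt.month
    out[prefix + "month_name"] = ts.dt.month_name()
    out[prefix + "week_of_year"] = ts.dt.isocalendar().week  # ISO week number
    out[prefix + "day_of_month"] = ts.dt.day
    out[prefix + "day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out[prefix + "day_name"] = ts.dt.day_name()
    out[prefix + "is_weekend"] = ts.dt.dayofweek >= 5
    out[prefix + "is_month_start"] = ts.dt.is_month_start
    out[prefix + "is_month_end"] = ts.dt.is_month_end
    out[prefix + "is_quarter_start"] = ts.dt.is_quarter_start
    out[prefix + "is_quarter_end"] = ts.dt.is_quarter_end

    # Add "hour" only if at least one non-missing timestamp is not at midnight.
    known = ts.dropna()
    if (known != known.dt.normalize()).any():
        out[prefix + "hour"] = ts.dt.hour
    return out


sample = pd.DataFrame({"order_date": ["2024-01-01", "2024-03-31", "2024-06-15"]})
features = add_calendar_features(sample, "order_date")
print(features.filter(like="order_date_").T)
```

Because the sample dates carry no time of day, no `order_date_hour` column is created in this sketch.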

## Install

```bash
pip install -e .
```

Alternatively, build a wheel with `python -m build`, then install it with `pip install dist/*.whl`.

## Usage

```python
import pandas as pd
import basicdatainfo

df = pd.read_csv("data.csv")

# 1. Just analyze + show plots inline (e.g. in Jupyter)
basicdatainfo.analyze(df)

# 2. Auto-clean the data first, then analyze
result = basicdatainfo.analyze(df, clean=True)
clean_df = result["df"]

# 3. Generate a standalone HTML report instead of inline output
basicdatainfo.analyze(df, clean=True, html=True, output_path="report.html")
```

## `analyze()` parameters

| Parameter              | Default             | Description                                                        |
|------------------------|---------------------|--------------------------------------------------------------------|
| `clean`                | `False`             | Auto-clean the DataFrame before analysis                           |
| `feature_engineer`     | `True`              | Auto-derive calendar features from any datetime column             |
| `drop_date_columns`    | `False`             | Drop the original datetime column after extracting its features    |
| `html`                 | `False`             | Save an HTML report instead of printing/showing plots              |
| `output_path`          | `"eda_report.html"` | Where to save the HTML report                                      |
| `numeric_strategy`     | `"median"`          | Fill for missing numeric values: `"median"`, `"mean"`, or `"zero"` |
| `categorical_strategy` | `"mode"`            | Fill for missing categorical values: `"mode"` or `"unknown"`       |
| `outlier_method`       | `None`              | Set to `"iqr"` to cap numeric outliers using the 1.5×IQR rule      |
| `max_categories`       | `10`                | Max categories shown in bar charts / top-value tables              |
| `show`                 | `True`              | Call `plt.show()` when `html=False`                                |
| `verbose`              | `True`              | Print the text summary when `html=False`                           |
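
For readers unfamiliar with the fill strategies and the 1.5×IQR rule named in the table, the sketch below shows one common way to express them in plain pandas. It illustrates the techniques only and may differ from what the package does internally; the helper names `fill_numeric` and `cap_outliers_iqr` are hypothetical.

```python
import pandas as pd


def fill_numeric(series: pd.Series, strategy: str = "median") -> pd.Series:
    """Fill missing numeric values using "median", "mean", or "zero"."""
    if strategy == "median":
        return series.fillna(series.median())
    if strategy == "mean":
        return series.fillna(series.mean())
    if strategy == "zero":
        return series.fillna(0)
    raise ValueError(f"unknown strategy: {strategy!r}")


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


with_gap = pd.Series([1.0, None, 3.0, 10.0])
filled = fill_numeric(with_gap, "median")  # the gap becomes 3.0, the median

values = pd.Series([10, 12, 11, 13, 12, 100])
capped = cap_outliers_iqr(values)  # Q1=11.25, Q3=12.75, so bounds are [9.0, 15.0]
print(filled.tolist())
print(capped.tolist())
```

Capping (rather than dropping) keeps the row count unchanged, so the value `100` is pulled down to the upper bound `15.0` instead of being removed.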

## Return value

`analyze()` returns a dict:

```python
{
  "df": <cleaned-or-original dataframe, plus engineered date columns>,
  "analysis": {"overview": {...}, "column_types": {...}, "columns": {...}},
  "cleaning_report": {"actions": [...]} or None,
  "feature_engineering_report": {"actions": [...], "new_columns": [...]} or None,
  "report_path": "report.html",      # only if html=True
  "figures": {...},                   # only if html=False
}
```
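
Since some keys are present only in certain modes, code that consumes the result may want to check for them defensively. The sketch below is one way to do that. `describe_result` is a hypothetical helper, the example dict is hand-built to mirror the documented shape rather than real output, and the sketch assumes `new_columns` holds column-name strings.

```python
from typing import List

import pandas as pd


def describe_result(result: dict) -> List[str]:
    """Build a few human-readable lines from an analyze()-style result dict."""
    rows, cols = result["df"].shape
    lines = [f"DataFrame: {rows} rows x {cols} columns"]

    # "cleaning_report" is documented as a dict or None.
    cleaning = result.get("cleaning_report")
    if cleaning is not None:
        lines.append(f"Cleaning actions: {len(cleaning['actions'])}")

    # "feature_engineering_report" is documented as a dict or None.
    engineering = result.get("feature_engineering_report")
    if engineering is not None:
        lines.append("New columns: " + ", ".join(engineering["new_columns"]))

    # "report_path" exists only when html=True, "figures" only when html=False.
    if "report_path" in result:
        lines.append(f"Report saved to: {result['report_path']}")
    else:
        lines.append(f"Figures: {len(result.get('figures', {}))}")
    return lines


# Hand-built stand-in for illustration; not real output from the package.
example = {
    "df": pd.DataFrame({"a": [1, 2], "a_year": [2024, 2024]}),
    "analysis": {"overview": {}, "column_types": {}, "columns": {}},
    "cleaning_report": None,
    "feature_engineering_report": {
        "actions": ["derived a_year"],
        "new_columns": ["a_year"],
    },
    "figures": {},
}
for line in describe_result(example):
    print(line)
```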
