Metadata-Version: 2.4
Name: databeacon
Version: 0.1.0
Summary: DataBeacon — A Python library for EDA, data cleaning, feature engineering, and visualization.
License: MIT
Project-URL: Homepage, https://github.com/yourusername/databeacon
Project-URL: Repository, https://github.com/yourusername/databeacon
Project-URL: Bug Tracker, https://github.com/yourusername/databeacon/issues
Keywords: eda,data-science,data-cleaning,feature-engineering,visualization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: numpy>=1.23
Requires-Dist: matplotlib>=3.6
Requires-Dist: scipy>=1.10
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

# databeacon

**A Python library for exploratory data analysis, data cleaning, feature engineering, and visualization.**

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## Installation

```bash
pip install databeacon
```

Or install from source:

```bash
git clone https://github.com/Adityasharma-6782/databeacon.git
cd databeacon
pip install -e ".[dev]"
```

---

## Quick Start

```python
import pandas as pd
from databeacon import (
    summarize, describe_numerics, describe_categoricals, correlation_matrix,
    drop_duplicates, handle_missing, remove_outliers, fix_dtypes,
    encode_categoricals, scale_features, apply_transforms,
    plot_distributions, plot_correlations, plot_missing,
)

df = pd.read_csv("your_data.csv")

# ── EDA ──────────────────────────────────────────────────────────
info = summarize(df)
print(info["shape"], info["missing"])

num_stats = describe_numerics(df)
cat_stats = describe_categoricals(df)
corr = correlation_matrix(df, method="spearman")

# ── Cleaning ─────────────────────────────────────────────────────
df = drop_duplicates(df)
df = handle_missing(df, strategy="mean")          # or "median", "mode", "drop", "constant"
df = remove_outliers(df, method="iqr")            # or "zscore"
df = fix_dtypes(df, datetime_cols=["date_col"])

# ── Feature Engineering ──────────────────────────────────────────
df = encode_categoricals(df, method="onehot")     # or "label", "ordinal"
df = scale_features(df, method="standard")        # or "minmax", "robust"
df = apply_transforms(df, columns=["price"], method="log")

# ── Visualization ────────────────────────────────────────────────
plot_distributions(df)
plot_correlations(df)
plot_missing(df)
```

---

## Modules

### `databeacon.eda`

| Function | Description |
|---|---|
| `summarize(df)` | Shape, dtypes, missing counts, duplicates, memory |
| `describe_numerics(df)` | Extended stats: mean, std, skewness, kurtosis, percentiles |
| `describe_categoricals(df)` | Count, unique values, top value, and its frequency for object/category columns |
| `correlation_matrix(df, method, threshold)` | Pearson/Spearman/Kendall correlation with optional masking |
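
To make "optional masking" concrete, here is a minimal sketch in plain pandas of a correlation matrix whose weak entries are hidden. It is not databeacon's implementation: the helper name `masked_correlation` and the reading of `threshold` as a minimum absolute correlation are assumptions made for illustration.

```python
import pandas as pd


def masked_correlation(df: pd.DataFrame, method: str = "pearson",
                       threshold: float = 0.0) -> pd.DataFrame:
    """Hypothetical helper: correlate numeric columns and hide weak entries.

    Entries whose absolute correlation is below `threshold` become NaN.
    """
    corr = df.select_dtypes(include="number").corr(method=method)
    return corr.where(corr.abs() >= threshold)


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # exact multiple of "a", so strongly correlated
    "c": [5, 1, 4, 2, 3],   # only weakly correlated with "a"
})

masked = masked_correlation(df, method="pearson", threshold=0.9)
print(masked)
```

With this toy frame the `a`/`b` entry survives the mask, while the weak `a`/`c` entry is replaced by NaN.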

### `databeacon.cleaning`

| Function | Description |
|---|---|
| `drop_duplicates(df, subset, keep)` | Remove duplicate rows |
| `handle_missing(df, strategy, fill_value, drop_threshold)` | Impute or drop missing values |
| `remove_outliers(df, method, columns)` | IQR or Z-score based outlier removal |
| `fix_dtypes(df, datetime_cols, category_threshold)` | Auto-infer and fix column types |
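
For readers unfamiliar with the IQR rule, the sketch below shows the general technique in plain pandas. It is an illustration under stated assumptions rather than the library's code: the helper name `iqr_filter`, the conventional multiplier `k=1.5`, and the choice to keep rows with missing values are all assumptions.

```python
import pandas as pd


def iqr_filter(df: pd.DataFrame, columns=None, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop rows outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if columns is None:
        columns = df.select_dtypes(include="number").columns
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        in_range = df[col].between(q1 - k * iqr, q3 + k * iqr)
        # Assumption: missing values are not treated as outliers here.
        keep &= in_range | df[col].isna()
    return df[keep]


df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 500]})
cleaned = iqr_filter(df, columns=["price"])
print(cleaned)
```

In this example the quartiles are 11.25 and 12.75, so the accepted range is 9.0 to 15.0 and only the row with `price == 500` is removed.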

### `databeacon.features`

| Function | Description |
|---|---|
| `encode_categoricals(df, method, ordinal_mapping)` | One-hot, label, or ordinal encoding |
| `scale_features(df, method)` | Standard, min-max, or robust scaling |
| `apply_transforms(df, columns, method)` | Log, square-root, square, or Box-Cox transforms |
| `create_interaction_features(df, column_pairs, operations)` | Multiply, add, subtract, divide pairs |
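
The operations named in this table can be sketched with pandas and NumPy alone. The snippet below is only an illustration of the underlying arithmetic, not databeacon's code: the column names are made up, and details such as using the population standard deviation or `log1p` for the log transform are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [0.0, 9.0, 99.0], "qty": [1.0, 2.0, 3.0]})

# Standard scaling: subtract each column's mean, divide by its standard deviation.
# ddof=0 (population std) is the convention used by scikit-learn's StandardScaler.
scaled = (df - df.mean()) / df.std(ddof=0)

# Log transform: log1p computes log(1 + x), which stays defined at x == 0.
logged = np.log1p(df["price"])

# Interaction feature: element-wise product of a column pair.
interaction = df["price"] * df["qty"]

print(scaled.round(3))
print(logged.round(3).tolist())
print(interaction.tolist())
```

After scaling, every column has mean 0 and unit (population) standard deviation, which is a quick sanity check for any scaler.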

### `databeacon.viz`

| Function | Description |
|---|---|
| `plot_distributions(df)` | Histograms + KDE for numeric columns |
| `plot_correlations(df)` | Annotated correlation heatmap |
| `plot_missing(df)` | Horizontal bar chart of missing-value percentages |
| `plot_categorical_counts(df)` | Value count bar charts |
| `generate_report(df)` | Automatically generated full-page EDA report |
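
As a rough picture of what a missing-value chart shows, the following sketch builds one directly with pandas and matplotlib. It does not reproduce databeacon's plotting code: the sample data, sort order, and styling are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Pune", "Delhi", None, "Mumbai"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# Percentage of missing values per column, largest first.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(missing_pct.index.tolist(), missing_pct.to_numpy())
ax.invert_yaxis()  # put the column with the most missing values at the top
ax.set_xlim(0, 100)
ax.set_xlabel("Missing values (%)")
fig.tight_layout()

print(missing_pct)
```

Here `age` is 50% missing, `city` is 25% missing, and `score` is complete, so the chart has three bars in that order.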

---

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=databeacon --cov-report=term-missing

# Lint
ruff check databeacon/

# Format
black databeacon/ tests/
```

---

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Commit your changes: `git commit -m "Add my feature"`
4. Push to the branch: `git push origin feature/my-feature`
5. Open a Pull Request

Please add tests for any new functionality and ensure all tests pass.

---

## License

Released under the MIT License. See [LICENSE](LICENSE) for details.
