Metadata-Version: 2.4
Name: aa_prepflow
Version: 0.1
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: statsmodels
Requires-Dist: gradio
Requires-Dist: ydata-profiling
Requires-Dist: plotly
Requires-Dist: matplotlib
Requires-Dist: seaborn
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist

# 🚀 AAPrepflow

An integrated preprocessing tool that combines an interactive UI with a modular Python library to simplify data cleaning and exploration. Experiment with strategies visually, then apply them directly in code, to keep preprocessing workflows efficient, consistent, and scalable.


## 1\. Installation & Import

```bash
pip install aa_prepflow
```

```python
import pandas as pd
import numpy as np
from aaprepflow import AAPrepflowCrossSection, AAPrepflowTimeSeries, AAPrepflowPanel, AAPrepflowBase
```

-----

## 2\. API Usage

### A. Direct Method (Quick)

Use this method for quick exploration: each call fits and transforms the data passed to the constructor in a single step.

```python
# 1. Initialize Flow based on data type
flow_cs = AAPrepflowCrossSection(data_cs)
flow_ts = AAPrepflowTimeSeries(data_ts, col_time='date')
flow_panel = AAPrepflowPanel(data_panel, col_time='date', group_by='city')

# 2. Handle missing values (replace <method_name> with a name from the tables below)
data_cleaned_cs = flow_cs.mv.clean_<method_name>()

# 3. Handle outliers (replace <method_name> with a name from the tables below, e.g. iqr_capping)
data_cleaned_ts = flow_ts.outlier.apply_<method_name>()
```

### B. Fit/Transform Method (Production Ready)

Use this pattern for production pipelines: `fit` learns parameters from the training data only, and `transform` applies them to the training and test data separately, so no information from the test set leaks into preprocessing.

```python
# 1. Initialize Flow with TRAINING data
flow = AAPrepflowTimeSeries(data_train_ts, col_time='date')

# 2. FIT on TRAINING data
# Learn parameters from the training data (e.g. medians, IQR bounds)
flow.mv.fit_<method_name>()
flow.outlier.fit_<method_name>()

# 3. TRANSFORM on Train and Test data
# Apply the learned parameters

# --- Transform Train Data ---
# (Using internal fitted data)
cleaned_train = flow.mv.transform()
cleaned_train = flow.outlier.transform_<handler_name>(cleaned_train)

# --- Transform Test Data ---
# (Using external test data)
cleaned_test = flow.mv.transform(test_data)
cleaned_test = flow.outlier.transform_<handler_name>(cleaned_test)
```
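
The fit/transform split can be illustrated without the library. The sketch below is a hypothetical median imputer in plain pandas (`MedianImputerSketch` is an invented name, not an AAPrepflow class, and the library's internals may differ); it shows that parameters learned in `fit` from training data are reused unchanged when transforming test data.

```python
import numpy as np
import pandas as pd


class MedianImputerSketch:
    """Hypothetical fit/transform median imputer (not part of AAPrepflow)."""

    def fit(self, train: pd.DataFrame) -> "MedianImputerSketch":
        # Learn one median per numeric column from the TRAINING data only
        self.medians_ = train.median(numeric_only=True)
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        # Fill gaps with the stored training medians; `data` is never used to refit
        return data.fillna(self.medians_)

    def fit_transform(self, data: pd.DataFrame) -> pd.DataFrame:
        # "Quick" style: fit and transform the same data in one call
        return self.fit(data).transform(data)


train = pd.DataFrame({"sales": [10.0, np.nan, 30.0, 50.0]})
test = pd.DataFrame({"sales": [np.nan, 1000.0]})

imputer = MedianImputerSketch().fit(train)
cleaned_train = imputer.transform(train)  # NaN filled with 30.0, the train median
cleaned_test = imputer.transform(test)    # NaN also filled with 30.0; 1000.0 has no influence
print(cleaned_test["sales"].tolist())     # [30.0, 1000.0]
```

Calling `fit_transform(test)` instead would learn the median from the test set itself (here 1000.0), which is the leakage this pattern is meant to avoid.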

-----

## 3\. Interactive Lab (Gradio)

You can launch an interactive lab to visually test strategies.

### A. Lab with DataFrame

This method will directly load your data into the Gradio application.

```python
# Initialize with data, group_by, and col_time
lab = AAPrepflowBase(data=df_panel, group_by='City', col_time='Date')

# Launch Interactive Lab
lab.flow_lab()
```

### B. Lab without DataFrame

This launches an empty Gradio application; you can then upload a CSV file from the browser.

```python
# Initialize AAPrepflowBase
lab = AAPrepflowBase()

# Launch Interactive Lab
lab.flow_lab()
```

-----

## 4\. API Reference `<method_name>`

Use the method names from the following tables to replace the `<...>` placeholders in the code examples above.

### A. Missing Value (`.mv`)

| Data Type | Method (Quick) | Method (Fit) | Strategy Description |
| :--- | :--- | :--- | :--- |
| **Cross-Section** | `clean_listwise_deletion` | `fit_listwise_deletion` | Remove rows with NA. |
| | `clean_mean_median_mode_imputer` | `fit_mean_median_mode_imputer` | Fill with mean, median, or mode. |
| | `clean_hot_deck_imputer` | `fit_hot_deck_imputer` | Fill with a value randomly drawn from the observed (donor) values. |
| | `clean_regression_imputer` | `fit_regression_imputer` | Fill with regression prediction. |
| | `clean_knn_imputer` | `fit_knn_imputer` | Fill with K-Nearest Neighbors. |
| **Time-Series / Panel** | `clean_locf` | `fit_locf` | Last Observation Carried Forward. |
| | `clean_nocb` | `fit_nocb` | Next Observation Carried Backward. |
| | `clean_interpolation` | `fit_interpolation` | Interpolation (e.g. linear). |
| | `clean_seasonal_imputer` | `fit_seasonal_imputer` | Fill with the seasonal average. |
| | `clean_arima_imputer` | `fit_arima_imputer` | Fill with ARIMA predictions. |
| | `clean_kalman_imputer` | `fit_kalman_imputer` | Fill with Kalman filter estimates. |
| | `clean_moving_average_imputer` | `fit_moving_average_imputer` | Fill with a moving average. |
| | `clean_cagr_imputer` | `fit_cagr_imputer` | Fill with geometric interpolation (constant growth rate, CAGR). |
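
As a rough guide to what some of the time-series strategies compute, the sketch below approximates LOCF, NOCB, linear interpolation, and CAGR-style geometric interpolation with standard pandas/NumPy calls. It is an assumption-based illustration, not AAPrepflow's implementation; `geometric_fill` is a hypothetical helper, and the library may treat leading/trailing gaps or panel groups differently.

```python
import numpy as np
import pandas as pd

# Daily series with a two-day gap; the known endpoints grow by 10% per step
s = pd.Series(
    [100.0, np.nan, np.nan, 133.1],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

locf = s.ffill()                         # Last Observation Carried Forward
nocb = s.bfill()                         # Next Observation Carried Backward
linear = s.interpolate(method="linear")  # equal absolute steps between known points


def geometric_fill(series: pd.Series) -> pd.Series:
    """Hypothetical constant-growth (CAGR-style) fill: linear interpolation in log space.

    Requires strictly positive values. With equally spaced points, each filled
    step multiplies the previous value by (end / start) ** (1 / n_steps).
    """
    return np.exp(np.log(series).interpolate(method="linear"))


geometric = geometric_fill(s)  # approximately 100, 110, 121, 133.1

print(pd.DataFrame({"locf": locf, "nocb": nocb, "linear": linear, "cagr": geometric}))
```

For panel data the same operations would be applied within each entity, e.g. `df.groupby('city')['value'].ffill()` (the `value` column is hypothetical), so that values never carry over from one city to another.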

### B. Outlier (`.outlier`)

#### Step 1: Detection (Quick `apply_...` or Fit `fit_...`)

| Data Type | Method (Quick) | Method (Fit) | Detection Description |
| :--- | :--- | :--- | :--- |
| **Cross-Section** | `apply_iqr_...` | `fit_iqr` | Interquartile Range (IQR). |
| | `apply_zscore_...` | `fit_zscore` | Z-Score (Standard Deviation). |
| **Time-Series / Panel** | `apply_iqr_...` | `fit_iqr` | Interquartile Range (IQR). |
| | `apply_zscore_...` | `fit_zscore` | Z-Score (Standard Deviation). |
| | `apply_rolling_iqr_...` | `fit_rolling_iqr` | Rolling IQR (adaptive). |
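
The detection rules in the table follow common textbook definitions, sketched below in pandas. The helper names and the defaults (`k=1.5` for the IQR fences, the z threshold, a centered 5-point window for the rolling IQR) are assumptions for illustration, not AAPrepflow's documented parameters.

```python
import pandas as pd


def iqr_bounds(x: pd.Series, k: float = 1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR] computed from the whole column."""
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscore_mask(x: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag points more than `threshold` sample standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return z.abs() > threshold


def rolling_iqr_bounds(x: pd.Series, window: int = 5, k: float = 1.5):
    """Adaptive fences: Q1/Q3 recomputed inside a centered moving window."""
    roll = x.rolling(window, center=True, min_periods=1)
    q1, q3 = roll.quantile(0.25), roll.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


x = pd.Series([10, 12, 11, 13, 12, 95, 11, 12, 10, 13], dtype=float)

low, high = iqr_bounds(x)  # scalar bounds: 8.375 and 15.375
iqr_mask = (x < low) | (x > high)

# With only n = 10 points, |z| can never exceed (n - 1) / sqrt(n) ~= 2.85,
# so a 3.0 cutoff would miss the spike; 2.5 is used here for illustration
z_mask = zscore_mask(x, threshold=2.5)

r_low, r_high = rolling_iqr_bounds(x, window=5)  # one bound pair per point
rolling_mask = (x < r_low) | (x > r_high)

print(x.index[iqr_mask].tolist(), x.index[z_mask].tolist(), x.index[rolling_mask].tolist())
```

On this short, level series all three rules flag only the spike at index 5; the rolling version matters when the level of a series drifts, because fixed fences would then flag whole regimes rather than isolated spikes.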

#### Step 2: Handling (Quick `apply_..._<handler>` or Transform `transform_...`)

| Method (Quick) | Method (Transform) | Handling Description |
| :--- | :--- | :--- |
| `apply_..._capping` | `transform_capping` | Replace outliers with the nearest boundary value. |
| `apply_..._imputation` | `transform_imputation` | Replace outliers with the mean or median. |
| `apply_..._locf` | `transform_locf` | Replace outliers with the last valid value. |
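
Once outliers are flagged, the three handling strategies can be sketched as follows. The function names are hypothetical, and computing the mean/median from the non-flagged values only is an assumption; AAPrepflow may compute it differently (for example, over the whole column).

```python
import pandas as pd


def handle_capping(x: pd.Series, low: float, high: float) -> pd.Series:
    # Clip every value into [low, high]; only outliers actually change
    return x.clip(lower=low, upper=high)


def handle_imputation(x: pd.Series, mask: pd.Series, stat: str = "median") -> pd.Series:
    # Replace flagged points with the mean/median of the NON-flagged values
    inliers = x[~mask]
    fill = inliers.median() if stat == "median" else inliers.mean()
    return x.mask(mask, fill)


def handle_locf(x: pd.Series, mask: pd.Series) -> pd.Series:
    # Blank out flagged points, then carry the last valid value forward
    return x.mask(mask).ffill()


x = pd.Series([10.0, 12.0, 95.0, 11.0])
low, high = 8.0, 15.0          # bounds from a detection step (illustrative values)
mask = (x < low) | (x > high)  # flags only the 95.0

print(handle_capping(x, low, high).tolist())  # [10.0, 12.0, 15.0, 11.0]
print(handle_imputation(x, mask).tolist())    # [10.0, 12.0, 11.0, 11.0]
print(handle_locf(x, mask).tolist())          # [10.0, 12.0, 12.0, 11.0]
```

In this sketch, LOCF leaves a flagged value as NaN when no earlier valid value exists (for example, if the very first observation is an outlier).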
