Metadata-Version: 2.4
Name: aa_prepflow
Version: 0.1.1
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: statsmodels
Requires-Dist: gradio
Requires-Dist: ydata-profiling
Requires-Dist: plotly
Requires-Dist: matplotlib
Requires-Dist: seaborn
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist

# AA Library Preparation Flow

An integrated preprocessing tool that combines an interactive UI with a modular Python library to simplify data cleaning and exploration. It supports both visual experimentation and direct implementation in code, for efficient, consistent, and scalable preprocessing workflows.

## 1\. Installation & Import

```bash
pip install aa_prepflow
```

```python
import pandas as pd
import numpy as np
from aa_prepflow import AAPrepflowCrossSection, AAPrepflowTimeSeries, AAPrepflowPanel, AAPrepflowBase
```

---

## 2\. API Usage

### A. Direct Method (Quick)

Use this method for quick exploration: it combines `fit` and `transform` in a single call.

```python
# 1. Initialize Flow based on data type
flow_cs = AAPrepflowCrossSection(data_cs)
flow_ts = AAPrepflowTimeSeries(data_ts, col_time='date')
flow_panel = AAPrepflowPanel(data_panel, col_time='date', group_by='city')

# 2. Handle missing values (replace <method_name> with a name from the tables in section 4)
cleaned_cs = flow_cs.mv.clean_<method_name>()

# 3. Handle outliers (replace <method_name> with a name from the tables in section 4)
cleaned_ts = flow_ts.outlier.apply_<method_name>()
```
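
The snippet above assumes that `data_cs`, `data_ts`, and `data_panel` already exist. As a minimal sketch, toy DataFrames of the three shapes could be built with plain pandas as shown here. The column names `income`, `age`, and `sales` are hypothetical; only `date` and `city` are taken from the `col_time` and `group_by` arguments used above.

```python
import numpy as np
import pandas as pd

# Cross-section: one row per entity, no time column.
data_cs = pd.DataFrame({
    "income": [4200.0, np.nan, 3900.0, 5100.0],
    "age": [34, 41, np.nan, 29],
})

# Time series: one row per timestamp; 'date' matches col_time='date'.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
data_ts = pd.DataFrame({
    "date": dates,
    "sales": [10.0, np.nan, 12.0, 95.0, 11.0, np.nan],
})

# Panel: one row per (city, date) pair; 'city' matches group_by='city'.
data_panel = pd.DataFrame({
    "city": ["A"] * 3 + ["B"] * 3,
    "date": list(dates[:3]) * 2,
    "sales": [10.0, np.nan, 12.0, 20.0, 21.0, np.nan],
})
```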

### B. Fit/Transform Method (Production Ready)

Use this pattern for production pipelines: `fit` on the training data only, then `transform` the training and test data separately, so that the parameters applied to the test data are learned from the training data alone.

```python
# 1. Initialize Flow with TRAINING data
flow = AAPrepflowTimeSeries(data_train_ts, col_time='date')

# 2. FIT on TRAINING data
# Learn parameters from the training data (e.g., median, IQR bounds)
flow.mv.fit_<method_name>()
flow.outlier.fit_<method_name>()

# 3. TRANSFORM on Train and Test data
# Apply the learned parameters

# --- Transform Train Data ---
# (Using internal fitted data)
cleaned_train = flow.mv.transform()
cleaned_train = flow.outlier.transform_<handler_name>(cleaned_train)

# --- Transform Test Data ---
# (Using external test data)
cleaned_test = flow.mv.transform(test_data)
cleaned_test = flow.outlier.transform_<handler_name>(cleaned_test)
```
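
To illustrate the idea behind fit/transform independently of this library, here is a minimal sketch in plain pandas. It is not the library's implementation: the median imputation, the 1.5 × IQR rule, and the helper `transform` are assumptions chosen for illustration. The parameters are learned once from the training series and then reused unchanged on the test series.

```python
import numpy as np
import pandas as pd

train = pd.Series([10.0, 12.0, np.nan, 11.0, 13.0, 50.0], name="sales")
test = pd.Series([9.0, np.nan, 100.0], name="sales")

# "Fit": learn parameters from the training data only.
median = train.median()
q1, q3 = train.quantile(0.25), train.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

def transform(s: pd.Series) -> pd.Series:
    """Fill NA with the training median, then cap to the training IQR bounds."""
    return s.fillna(median).clip(lower=lower, upper=upper)

# "Transform": apply the same learned parameters to both sets.
cleaned_train = transform(train)
cleaned_test = transform(test)
```

Because `median`, `lower`, and `upper` come from `train` alone, the extreme value in `test` is capped to a bound it had no part in setting.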

---

## 3\. API Interactive Lab (Gradio)

You can launch an interactive lab to visually test strategies.

### A. Lab with DataFrame

This method loads your data directly into the Gradio application.

```python
# Initialize with data, group_by, and col_time
lab = AAPrepflowBase(data=df_panel, group_by='City', col_time='Date')

# Launch Interactive Lab
lab.flow_lab()
```

### B. Lab without DataFrame

This method launches an empty Gradio application, where you can upload a CSV file from the browser.

```python
# Initialize AAPrepflowBase
lab = AAPrepflowBase()

# Launch Interactive Lab
lab.flow_lab()
```

---

## 4\. API Reference `<method_name>`

Use the method names from the following tables to replace the `<...>` placeholders in the code examples above.

### A. Missing Value (`.mv`)

| Data Type               | Method (Quick)                   | Method (Fit)                   | Strategy Description               |
| :---------------------- | :------------------------------- | :----------------------------- | :--------------------------------- |
| **Cross-Section**       | `clean_listwise_deletion`        | `fit_listwise_deletion`        | Remove rows with NA.               |
|                         | `clean_mean_median_mode_imputer` | `fit_mean_median_mode_imputer` | Fill with mean, median, or mode.   |
|                         | `clean_hot_deck_imputer`         | `fit_hot_deck_imputer`         | Fill with a random sample.         |
|                         | `clean_regression_imputer`       | `fit_regression_imputer`       | Fill with regression prediction.   |
|                         | `clean_knn_imputer`              | `fit_knn_imputer`              | Fill with K-Nearest Neighbors.     |
| **Time-Series / Panel** | `clean_locf`                     | `fit_locf`                     | Last Observation Carried Forward.  |
|                         | `clean_nocb`                     | `fit_nocb`                     | Next Observation Carried Backward. |
|                         | `clean_interpolation`            | `fit_interpolation`            | Interpolation (e.g., linear).      |
|                         | `clean_seasonal_imputer`         | `fit_seasonal_imputer`         | Fill with seasonal average.        |
|                         | `clean_arima_imputer`            | `fit_arima_imputer`            | Fill with ARIMA prediction.        |
|                         | `clean_kalman_imputer`           | `fit_kalman_imputer`           | Fill with Kalman Filter.           |
|                         | `clean_moving_average_imputer`   | `fit_moving_average_imputer`   | Fill with a moving average.        |
|                         | `clean_cagr_imputer`             | `fit_cagr_imputer`             | Fill with geometric interpolation. |
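
For readers unfamiliar with the time-series strategies in the table, the following sketch shows what LOCF, NOCB, and linear interpolation do, using plain pandas rather than this library. The data and the column names `city` and `sales` are hypothetical, and the library's own methods may handle edge cases differently.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1.0, np.nan, np.nan, 4.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# LOCF: carry the last observed value forward.
locf = s.ffill()

# NOCB: carry the next observed value backward.
# The trailing NaN stays, because no later value exists.
nocb = s.bfill()

# Linear interpolation, restricted to gaps between two valid observations.
linear = s.interpolate(method="linear", limit_area="inside")

# Panel data: fill within each group so values never cross group boundaries.
panel = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "sales": [5.0, np.nan, np.nan, 7.0],
})
panel["sales_locf"] = panel.groupby("city")["sales"].ffill()
```

In the panel example, the first row of city `B` remains missing after LOCF, since the only earlier value belongs to city `A`.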

### B. Outlier (`.outlier`)

#### Step 1: Detection (Quick `apply_...` or Fit `fit_...`)

| Data Type               | Method (Quick)          | Method (Fit)      | Detection Description         |
| :---------------------- | :---------------------- | :---------------- | :---------------------------- |
| **Cross-Section**       | `apply_iqr_...`         | `fit_iqr`         | Interquartile Range (IQR).    |
|                         | `apply_zscore_...`      | `fit_zscore`      | Z-Score (Standard Deviation). |
| **Time-Series / Panel** | `apply_iqr_...`         | `fit_iqr`         | Interquartile Range (IQR).    |
|                         | `apply_zscore_...`      | `fit_zscore`      | Z-Score (Standard Deviation). |
|                         | `apply_rolling_iqr_...` | `fit_rolling_iqr` | Rolling IQR (adaptive).       |
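
The three detection rules in the table can be sketched in plain pandas as follows. This is an illustration of the general techniques, not the library's code: the 1.5 multiplier, the z-score threshold of 2.0, and the window of 4 are assumed values, and the library's defaults are not stated here.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, 60.0])

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

# Z-score rule: flag points far from the mean in standard-deviation units.
threshold = 2.0  # assumed value for this small sample
z = (s - s.mean()) / s.std()
z_mask = z.abs() > threshold

# Rolling IQR: recompute the bounds over a moving window (current point included),
# so the rule adapts to local level changes.
window = 4  # assumed window length
roll = s.rolling(window, min_periods=window)
rq1, rq3 = roll.quantile(0.25), roll.quantile(0.75)
riqr = rq3 - rq1
rolling_mask = (s < rq1 - 1.5 * riqr) | (s > rq3 + 1.5 * riqr)
```

All three masks flag the final value, 60. Positions without a full rolling window have NaN bounds and are therefore never flagged by the rolling rule.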

#### Step 2: Handling (Quick `apply_..._<handler>` or Transform `transform_<handler>`)

| Method (Quick)         | Method (Transform)     | Handling Description                        |
| :--------------------- | :--------------------- | :------------------------------------------ |
| `apply_..._capping`    | `transform_capping`    | Replace outliers with the boundary value.   |
| `apply_..._imputation` | `transform_imputation` | Replace outliers with the mean or median.   |
| `apply_..._locf`       | `transform_locf`       | Replace outliers with the last valid value. |
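
The three handling strategies can be compared side by side with a short pandas sketch. It is illustrative only: the bounds `lower` and `upper` are hypothetical outputs of a detection step, and computing the median from the non-outlier values is an assumption, not a documented behavior of this library.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 13.0, 60.0, 12.0, 11.0])
lower, upper = 8.0, 14.0           # hypothetical bounds from a detection step
mask = (s < lower) | (s > upper)   # True only for the value 60.0

# Capping: pull outliers back to the nearest boundary (60 -> 14).
capped = s.clip(lower=lower, upper=upper)

# Imputation: replace outliers with the median of the remaining values (60 -> 11.5).
imputed = s.mask(mask, s[~mask].median())

# LOCF: blank out outliers, then carry the previous valid value forward (60 -> 13).
locf = s.mask(mask).ffill()
```

Capping keeps the direction of the extreme value, while imputation and LOCF discard it entirely, which may matter when the outlier reflects a real but exaggerated movement.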
