Metadata-Version: 2.4
Name: AutomatedCleaning
Version: 0.1.9
Summary: Automated Data Cleaning Library
Home-page: https://github.com/DataSpoof/AutomatedCleaning
Author: Abhishek Kumar Singh
Author-email: dataspoof007@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: absl-py==2.1.0
Requires-Dist: annotated-types==0.7.0
Requires-Dist: anthropic==0.49.0
Requires-Dist: anyio==4.8.0
Requires-Dist: astunparse==1.6.3
Requires-Dist: backports.tarfile==1.2.0
Requires-Dist: certifi==2025.1.31
Requires-Dist: charset-normalizer==3.4.1
Requires-Dist: click==8.1.8
Requires-Dist: colorama==0.4.6
Requires-Dist: contourpy==1.3.1
Requires-Dist: cycler==0.12.1
Requires-Dist: distro==1.9.0
Requires-Dist: docutils==0.21.2
Requires-Dist: et_xmlfile==2.0.0
Requires-Dist: filelock==3.17.0
Requires-Dist: flatbuffers==25.2.10
Requires-Dist: fonttools==4.55.2
Requires-Dist: fsspec==2025.3.0
Requires-Dist: fuzzywuzzy==0.18.0
Requires-Dist: gast==0.6.0
Requires-Dist: google-pasta==0.2.0
Requires-Dist: grpcio==1.70.0
Requires-Dist: h11==0.14.0
Requires-Dist: h5py==3.13.0
Requires-Dist: httpcore==1.0.7
Requires-Dist: httpx==0.28.1
Requires-Dist: huggingface-hub==0.29.2
Requires-Dist: idna==3.10
Requires-Dist: imbalanced-learn==0.13.0
Requires-Dist: imblearn==0.0
Requires-Dist: importlib_metadata==8.6.1
Requires-Dist: inexactsearch==1.0.2
Requires-Dist: jaraco.classes==3.4.0
Requires-Dist: jaraco.context==6.0.1
Requires-Dist: jaraco.functools==4.1.0
Requires-Dist: jinja2==3.1.6
Requires-Dist: jiter==0.9.0
Requires-Dist: joblib==1.4.2
Requires-Dist: jsonpatch==1.33
Requires-Dist: jsonpointer==3.0.0
Requires-Dist: kaleido==0.2.1
Requires-Dist: keras==3.9.0
Requires-Dist: keyring==25.6.0
Requires-Dist: kiwisolver==1.4.7
Requires-Dist: langchain-anthropic==0.3.10
Requires-Dist: langchain-core==0.3.45
Requires-Dist: langsmith==0.3.15
Requires-Dist: libclang==18.1.1
Requires-Dist: Markdown==3.7
Requires-Dist: markdown-it-py==3.0.0
Requires-Dist: markupsafe==3.0.2
Requires-Dist: matplotlib==3.9.3
Requires-Dist: mdurl==0.1.2
Requires-Dist: missingno==0.5.2
Requires-Dist: ml-dtypes==0.4.1
Requires-Dist: more-itertools==10.6.0
Requires-Dist: namex==0.0.8
Requires-Dist: narwhals==1.29.1
Requires-Dist: nh3==0.2.20
Requires-Dist: nltk==3.9.1
Requires-Dist: numpy==2.0.2
Requires-Dist: openpyxl==3.1.5
Requires-Dist: opt_einsum==3.4.0
Requires-Dist: optree==0.14.1
Requires-Dist: orjson==3.10.15
Requires-Dist: packaging==24.2
Requires-Dist: pandas==2.2.3
Requires-Dist: pillow==11.0.0
Requires-Dist: plotly==6.0.0
Requires-Dist: polars==1.23.0
Requires-Dist: protobuf==5.29.3
Requires-Dist: pyarrow==19.0.1
Requires-Dist: pydantic==2.10.6
Requires-Dist: pydantic_core==2.27.2
Requires-Dist: pyenchant==3.2.2
Requires-Dist: pyfiglet==1.0.2
Requires-Dist: pygments==2.19.1
Requires-Dist: pyparsing==3.2.0
Requires-Dist: pyspellchecker==0.8.1
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pytz==2024.2
Requires-Dist: pywin32-ctypes==0.2.3
Requires-Dist: pyyaml==6.0.2
Requires-Dist: rapidfuzz==3.12.1
Requires-Dist: readme_renderer==44.0
Requires-Dist: regex==2024.11.6
Requires-Dist: requests==2.32.3
Requires-Dist: requests-toolbelt==1.0.0
Requires-Dist: rfc3986==2.0.0
Requires-Dist: rich==13.9.4
Requires-Dist: safetensors==0.5.3
Requires-Dist: scikit-learn==1.5.2
Requires-Dist: scipy==1.14.1
Requires-Dist: seaborn==0.13.2
Requires-Dist: silpa_common==0.3
Requires-Dist: six==1.17.0
Requires-Dist: sklearn-compat==0.1.3
Requires-Dist: sniffio==1.3.1
Requires-Dist: soundex==1.1.3
Requires-Dist: spellchecker==0.4
Requires-Dist: tenacity==9.0.0
Requires-Dist: termcolor==2.5.0
Requires-Dist: tf_keras==2.18.0
Requires-Dist: thefuzz==0.22.1
Requires-Dist: threadpoolctl==3.5.0
Requires-Dist: tokenizers==0.21.0
Requires-Dist: tqdm==4.67.1
Requires-Dist: twine==6.1.0
Requires-Dist: typing_extensions==4.12.2
Requires-Dist: tzdata==2024.2
Requires-Dist: urllib3==2.3.0
Requires-Dist: uv==0.6.5
Requires-Dist: werkzeug==3.1.3
Requires-Dist: wrapt==1.17.2
Requires-Dist: zipp==3.21.0
Requires-Dist: zstandard==0.23.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AutomatedCleaning

AutomatedCleaning is a Python library for automated data cleaning. It helps preprocess and analyze datasets by handling missing values, outliers, spelling errors, and more.

![Logo](https://github.com/DataSpoof/AutomatedCleaning/blob/main/images/logo2.png)


## Features
- Supports both large (100+ GB) and small datasets
- Detects and handles missing values and duplicate records
- Identifies and corrects spelling errors in categorical values
- Detects outliers
- Detects and fixes data imbalance
- Identifies and corrects skewness in numerical data
- Checks for correlation and detects multicollinearity
- Analyzes cardinality in categorical columns
- Identifies and cleans text columns
- Detects JSON-type columns
- Performs univariate, bivariate, and multivariate analysis
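
To make a few of these checks concrete, here is a minimal pandas sketch of what missing-value, duplicate, outlier, skewness, and cardinality checks can look like. It is not AutomatedCleaning's internal implementation: the `profile_frame` helper, the sample data, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for illustration.

```python
import pandas as pd


def profile_frame(df: pd.DataFrame) -> dict:
    """Hypothetical helper: illustrative checks only, not the library's own logic."""
    numeric = df.select_dtypes(include="number")

    # Flag values more than 1.5 * IQR outside the quartiles (Tukey's rule),
    # assumed here as one common way to detect outliers.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        # Count of NaN/None cells in each column
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Number of flagged outliers in each numeric column
        "outliers_per_column": outlier_mask.sum().to_dict(),
        # Sample skewness of each numeric column
        "skewness": numeric.skew().to_dict(),
        # Distinct non-null values in each non-numeric column
        "cardinality": df.select_dtypes(exclude="number").nunique().to_dict(),
    }


# Made-up sample data for the sketch
df = pd.DataFrame(
    {
        "city": ["Delhi", "Delhi", "Mumbai", "Pune", "Delhi", None],
        "price": [10.0, 10.0, 12.0, 11.0, 500.0, 9.0],
    }
)
report = profile_frame(df)
print(report)
```

In this sample, the `price` value of 500.0 is the only one flagged by the IQR rule, and the second row is counted as a duplicate of the first.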


## Installation
```bash
pip install AutomatedCleaning
```

## Usage
```python
import AutomatedCleaning as ac
df = ac.load_data("dataset.csv")
df_cleaned = ac.clean_data(df)
```
