Metadata-Version: 2.4
Name: aillmcleaner
Version: 0.2.1
Summary: An AI-powered Python library for context-aware data cleaning using local LLMs
Home-page: https://github.com/spanigrahidev/aillmcleaner
Author: Sujoy Panigrahi
Author-email: sujoypanigrahi4@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: requests>=2.28.0
Requires-Dist: google-genai>=0.1.1
Requires-Dist: openai>=1.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AILLMCleaner

An AI-powered Python library for intelligent, context-aware automated data cleaning using both Local LLMs (Ollama) and Cloud AI APIs (Google Gemini, OpenAI, Groq).

---

## What Does It Do?

Real-world datasets are messy: names in the wrong case, inconsistent spellings of the same city, missing values, invalid emails. `aillmcleaner` uses Large Language Models to interpret each value in context and correct it, rather than relying only on fixed rules.

---

## Requirements

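- Python 3.8 or higher
- pandas 1.5.0 or higher
- requests 2.28.0 or higher
- openai 1.0.0 or higher
- google-genai 0.1.1 or higher
- One AI provider (Ollama, Gemini, OpenAI, or Groq)

Installing the package with pip also installs the Python dependencies listed above: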

```
pip install aillmcleaner
```

---

## AI Provider Options

Ollama — Free, runs locally on your computer. No API key needed.
Download from: https://ollama.com

Google Gemini — Free tier available.
Get API key from: https://aistudio.google.com

OpenAI (ChatGPT) — Paid service.
Get API key from: https://platform.openai.com

Groq — Free tier available, very fast.
Get API key from: https://console.groq.com

---

## Installation

```
pip install aillmcleaner
```

---

## Quick Start

### Using Ollama (Free, No API Key Needed)

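First start the Ollama server and pull a model. `ollama serve` keeps running in the foreground, so run `ollama pull` from a second terminal: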

```
ollama serve
ollama pull llama3.2
```
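
Before calling the library, it can help to confirm that the local server is reachable. The following is a minimal sketch and not part of aillmcleaner: it assumes Ollama is listening on its default address `http://localhost:11434`, and `ollama_is_running` is a hypothetical helper name.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, check that `ollama serve` is still running before moving on.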

Then use in Python:

```python
import pandas as pd
from aillmcleaner import clean_column, standardize_column, fill_missing, detect_anomalies

data = {
    "name":    ["alice johnson", "BOB SMITH", "Charlie  Brown"],
    "city":    ["new york", "newyork", "LA"],
    "country": [None, "USA", "United States"]
}

df = pd.DataFrame(data)

# Fix name casing
df["name"] = clean_column(df, "name",
    instruction="Convert to proper title case. Return only the name.")

# Standardize city names
df["city"] = standardize_column(df, "city",
    categories=["New York", "Los Angeles", "Chicago"])

# Fill missing country values
df["country"] = fill_missing(df, "country", context_columns=["city"])

print(df)
```

---

### Using Google Gemini

This snippet and the OpenAI and Groq snippets that follow reuse the `df` DataFrame built in the Ollama example.

```python
from aillmcleaner import clean_column

df["name"] = clean_column(df, "name",
    provider="gemini",
    api_key="YOUR_GEMINI_API_KEY")
```

Alternatively, set the key as an environment variable instead of passing `api_key`:

```
export GEMINI_API_KEY="your_key_here"
```
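
If you prefer to read the key yourself and pass it through `api_key`, a small lookup helper keeps keys out of source code. This is a sketch under assumptions: only `GEMINI_API_KEY` appears in this README, the variable names for OpenAI and Groq are guesses based on common convention, and `resolve_api_key` is a hypothetical helper, not a library function.

```python
import os

# GEMINI_API_KEY comes from this README; the other two names are assumptions.
ENV_VAR_BY_PROVIDER = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",  # assumed name
    "groq": "GROQ_API_KEY",      # assumed name
}


def resolve_api_key(provider, explicit_key=None):
    """Prefer an explicitly passed key, then fall back to the environment."""
    if explicit_key:
        return explicit_key
    env_var = ENV_VAR_BY_PROVIDER.get(provider)
    if env_var is None:
        return None  # e.g. ollama, which needs no key
    return os.environ.get(env_var)
```

It could then be used as `clean_column(df, "name", provider="gemini", api_key=resolve_api_key("gemini"))`.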

---

### Using OpenAI

```python
from aillmcleaner import clean_column

df["name"] = clean_column(df, "name",
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_OPENAI_API_KEY")
```

---

### Using Groq

```python
from aillmcleaner import clean_column

df["name"] = clean_column(df, "name",
    provider="groq",
    model="llama3-8b-8192",
    api_key="YOUR_GROQ_API_KEY")
```

---

## All Available Functions

clean_text — Clean a single text value.

```python
from aillmcleaner import clean_text
result = clean_text("helo wrold", provider="gemini", api_key="YOUR_KEY")
```

clean_column — Clean an entire DataFrame column.

```python
df["name"] = clean_column(df, "name",
    instruction="Fix spelling and capitalize properly.")
```
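
Columns often repeat the same value many times, and each LLM call costs time or money. Whether `clean_column` already deduplicates internally is not stated here, so the sketch below shows one way to do it by hand. `clean_unique` is a hypothetical helper, and `clean_fn` stands in for any single-value cleaner such as `clean_text`.

```python
def clean_unique(values, clean_fn):
    """Apply clean_fn once per distinct non-missing value and reuse the result."""
    cache = {}
    cleaned = []
    for value in values:
        if value is None:
            cleaned.append(None)  # leave missing entries untouched
            continue
        if value not in cache:
            cache[value] = clean_fn(value)
        cleaned.append(cache[value])
    return cleaned


if __name__ == "__main__":
    calls = []

    def fake_clean(text):
        # Stand-in for an LLM call; records each invocation.
        calls.append(text)
        return text.title()

    print(clean_unique(["bob smith", "bob smith", None, "alice"], fake_clean))
    print("LLM calls made:", len(calls))
```

The sketch treats only `None` as missing, so drop or convert pandas `NaN` values before passing a column in.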

standardize_column — Map column values to a fixed list of categories.

```python
df["city"] = standardize_column(df, "city",
    categories=["New York", "Los Angeles", "Chicago"])
```
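
LLM output is not guaranteed to stay within the list you supply, so it may be worth checking the result afterwards. A minimal sketch, with `values_outside_categories` as a hypothetical helper that is not part of the library:

```python
def values_outside_categories(values, categories):
    """Return the distinct non-missing values that are not allowed categories."""
    allowed = set(categories)
    unexpected = []
    for value in values:
        if value is None or value in allowed or value in unexpected:
            continue
        unexpected.append(value)
    return unexpected


if __name__ == "__main__":
    cities = ["New York", "NYC", None, "NYC", "Chicago"]
    print(values_outside_categories(cities, ["New York", "Los Angeles", "Chicago"]))
```

For a DataFrame column, pass `df["city"].dropna()` so that `NaN` entries are not reported as unexpected values.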

fill_missing — Fill missing values using context from other columns.

```python
df["country"] = fill_missing(df, "country",
    context_columns=["city", "state"])
```

detect_anomalies — Find suspicious or invalid values in a column.

```python
bad_values = detect_anomalies(df, "email")
print(bad_values)
```
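
The return format of `detect_anomalies` is not documented here. For columns with a well-known shape, such as emails, a cheap rule-based check can be run alongside it to compare results. The sketch below is an independent baseline, not the library's method. The pattern is deliberately loose, and `invalid_emails` is a hypothetical helper.

```python
import re

# Loose pattern: something@something.something, with no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_emails(values):
    """Return non-missing values that fail the loose e-mail pattern."""
    return [
        value
        for value in values
        if value is not None and not EMAIL_PATTERN.match(str(value))
    ]


if __name__ == "__main__":
    print(invalid_emails(["a@b.com", "bad", None, "x@y"]))
```

Values flagged by only one of the two approaches are good candidates for manual review.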

clean_dataframe — Clean all text columns in a DataFrame at once.

```python
from aillmcleaner import clean_dataframe
cleaned_df = clean_dataframe(df, provider="groq", api_key="YOUR_KEY")
```

---

## Provider and Model Options

- `provider`: one of `ollama` (default), `gemini`, `openai`, `groq`.
- `model`: optional. If omitted, a default model for the chosen provider is used.
- `api_key`: required only for the cloud providers (`gemini`, `openai`, `groq`).
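
When several calls share the same settings, the options above can be validated once and reused. This is a sketch only: `build_llm_kwargs` is a hypothetical helper, and it enforces the rules as described in this section rather than reproducing the library's own checks.

```python
CLOUD_PROVIDERS = {"gemini", "openai", "groq"}
KNOWN_PROVIDERS = CLOUD_PROVIDERS | {"ollama"}


def build_llm_kwargs(provider="ollama", model=None, api_key=None):
    """Validate provider settings and return keyword arguments for reuse."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    # Drop this check if you rely on an environment variable for the key.
    if provider in CLOUD_PROVIDERS and not api_key:
        raise ValueError(f"Provider {provider!r} needs an api_key")
    kwargs = {"provider": provider}
    if model is not None:
        kwargs["model"] = model  # otherwise the provider default applies
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kwargs
```

The result can be unpacked into any call, for example `clean_column(df, "name", **build_llm_kwargs("groq", api_key="YOUR_KEY"))`.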

---

## License

MIT License — free to use, modify, and distribute.

---

## Author

- Sujoy Panigrahi (spanigrahidev)
- GitHub: https://github.com/spanigrahidev
- PyPI: https://pypi.org/project/aillmcleaner/

