Metadata-Version: 2.4
Name: tableai
Version: 0.2.2
Summary: AI toolkit for tabular data — auto EDA, data profiling, anomaly detection, and smart transformations on DataFrames.
Project-URL: Homepage, https://www.nrl.ai
Project-URL: Repository, https://github.com/vietanhdev/tableai
Project-URL: Issues, https://github.com/vietanhdev/tableai/issues
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: anomaly-detection,data-cleaning,data-profiling,eda,pandas,tabular-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: numpy
Requires-Dist: pandas
Provides-Extra: all
Requires-Dist: anyllm; extra == 'all'
Requires-Dist: matplotlib>=3.5; extra == 'all'
Requires-Dist: pytest-cov>=4.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: scikit-learn>=1.0; extra == 'all'
Requires-Dist: seaborn>=0.12; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: llm
Requires-Dist: anyllm; extra == 'llm'
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0; extra == 'ml'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5; extra == 'viz'
Requires-Dist: seaborn>=0.12; extra == 'viz'
Description-Content-Type: text/markdown

<h1 align="center">tableai</h1>
<p align="center"><em>AI toolkit for your DataFrames</em></p>

![PyPI](https://img.shields.io/pypi/v/tableai)
![Python](https://img.shields.io/pypi/pyversions/tableai)
![License](https://img.shields.io/pypi/l/tableai)

**AI toolkit for tabular data** -- auto EDA, data profiling, anomaly detection, and smart transformations on DataFrames. The pandas AI companion.

**Runs completely offline.** All processing runs locally on pandas and numpy; scikit-learn is used only when the optional `ml` extra is installed. No cloud APIs or internet connection required.

Built by [Viet-Anh Nguyen](https://www.nrl.ai) | [GitHub](https://github.com/vietanhdev/tableai)

---

## Installation

```bash
pip install tableai
```

With optional dependencies (quote the argument so that shells such as zsh do not expand the square brackets):

```bash
pip install "tableai[ml]"    # scikit-learn for Isolation Forest anomaly detection
pip install "tableai[viz]"   # matplotlib + seaborn for visualizations
pip install "tableai[all]"   # everything
```

## Quick Start

```python
import pandas as pd
import tableai

df = pd.read_csv("your_data.csv")

# Computes per-column statistics: dtypes, null counts, unique values,
# numeric stats (mean/median/std/quartiles/skewness), and correlations.
# Returns a DataProfile with rich notebook display.
report = tableai.profile(df)
print(report)
```

**Output:**

```
=== Data Profile ===
Shape: (1000, 8)
Memory usage: 62.6 KB

Column Profiles:
  age (float64)
    Non-null: 950 / 1000 (95.0%)  |  Unique: 60
    Mean: 35.2  |  Std: 12.1  |  Min: 18.0  |  Max: 75.0

  city (object)
    Non-null: 980 / 1000 (98.0%)  |  Unique: 25
    Top values: New York (120), London (98), Tokyo (85)
...
```
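To make the profile fields concrete, here is a minimal sketch of how comparable per-column statistics can be computed with plain pandas. `column_summary` is a hypothetical helper written for illustration; it is not tableai's implementation and does not reproduce the `DataProfile` object.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with basic profile statistics."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a value
    })
    # Numeric statistics align on column name; non-numeric rows stay NaN.
    numeric = df.select_dtypes(include="number")
    summary["mean"] = numeric.mean()
    summary["std"] = numeric.std()
    summary["skew"] = numeric.skew()
    return summary


sample = pd.DataFrame({
    "age": [25, 32, None, 47, 32],
    "city": ["London", "Tokyo", "London", None, "Tokyo"],
})
summary = column_summary(sample)
print(summary)
```

Note that a numeric column containing missing values is stored as `float64`, because NumPy integer dtypes cannot represent `NaN`.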

### Auto-Clean

```python
# Auto-cleaning: fills nulls (median/mode), removes duplicates,
# clips outliers via IQR method
cleaned_df = tableai.clean(df)
```
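The three steps named in the comment above can be sketched in plain pandas as follows. This is an illustration under assumptions, not tableai's code: the `clean_sketch` name, the step order, and the `k = 1.5` IQR multiplier are all choices made here.

```python
import pandas as pd
from pandas.api import types as ptypes


def clean_sketch(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, fill nulls, clip IQR outliers."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if ptypes.is_numeric_dtype(out[col]):
            # Numeric column: fill nulls with the median, then clip to the IQR fences.
            out[col] = out[col].fillna(out[col].median())
            q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
            iqr = q3 - q1
            out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
        else:
            # Non-numeric column: fill nulls with the most frequent value, if any.
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "salary": [50.0, 52.0, 51.0, None, 49.0, 500.0, 50.0],
    "city": ["Hanoi", "Hanoi", None, "Tokyo", "Hanoi", "Tokyo", "Hanoi"],
})
cleaned = clean_sketch(raw)
print(cleaned)
```

Clipping keeps every row and pulls extreme values back to the fence, whereas dropping outlier rows would shrink the dataset.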

### Anomaly Detection

```python
# Uses the IQR method by default (method="iqr")
result = tableai.anomalies(df)
print(result)
```

**Output:**

```
=== Anomaly Report ===
Method: iqr
Total rows flagged: 42 / 1000 (4.2%)

  salary: 18 anomalies (min_bound=20000, max_bound=150000)
  age: 7 anomalies (min_bound=5, max_bound=80)
...
```
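For readers unfamiliar with the IQR rule, the sketch below shows one common formulation: a value is flagged when it falls outside `[Q1 - k*IQR, Q3 + k*IQR]`. The helper names and the `k = 1.5` multiplier are assumptions made for this example; tableai's exact bounds may differ.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return the (lower, upper) IQR fences for a numeric series."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_flags(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: boolean mask of values outside the fences, per numeric column."""
    flags = {}
    for col in df.select_dtypes(include="number").columns:
        lower, upper = iqr_bounds(df[col].dropna(), k)
        # Comparisons with NaN are False, so missing values are never flagged.
        flags[col] = (df[col] < lower) | (df[col] > upper)
    return pd.DataFrame(flags, index=df.index)


sample = pd.DataFrame({"salary": [40_000, 42_000, 45_000, 47_000, 50_000, 400_000]})
flags = iqr_flags(sample)
print(flags.sum())               # anomalies per column
print(flags.any(axis=1).sum())   # rows flagged in at least one column
```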

### Auto Insights

```python
insights = tableai.insights(df)
for insight in insights:
    print(f"- {insight}")
```

**Output:**

```
- Column 'age' has 5.0% missing values
- Strong positive correlation (0.87) between 'hours_worked' and 'salary'
- Column 'income' is highly skewed (skewness=2.31)
- Column 'city' has high cardinality (250 unique values out of 1000 rows)
```
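Messages like these can be derived from a few pandas summaries. The sketch below covers three of the checks (missing values, skewness, pairwise correlation). The `insights_sketch` helper and every threshold in it are hypothetical, chosen only for illustration; they are not tableai's actual rules.

```python
import pandas as pd


def insights_sketch(df: pd.DataFrame, missing_pct: float = 1.0,
                    corr_min: float = 0.8, skew_min: float = 1.0) -> list:
    """Hypothetical helper: turn simple statistics into text notes."""
    notes = []
    # Share of missing values per column.
    for col in df.columns:
        pct = df[col].isna().mean() * 100
        if pct >= missing_pct:
            notes.append(f"Column '{col}' has {pct:.1f}% missing values")
    numeric = df.select_dtypes(include="number")
    # Skewness per numeric column (NaN never passes the threshold).
    for col in numeric.columns:
        skew = numeric[col].skew()
        if abs(skew) >= skew_min:
            notes.append(f"Column '{col}' is highly skewed (skewness={skew:.2f})")
    # Pearson correlation for each unordered pair of numeric columns.
    corr = numeric.corr()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= corr_min:
                sign = "positive" if r > 0 else "negative"
                notes.append(f"Strong {sign} correlation ({r:.2f}) between '{a}' and '{b}'")
    return notes


sample = pd.DataFrame({
    "hours_worked": [10, 20, 30, 40, 50],
    "salary": [100, 210, 290, 405, 500],
    "age": [25, None, 31, 40, 28],
})
notes = insights_sketch(sample)
for note in notes:
    print(f"- {note}")
```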

### DataFrame Comparison

```python
diff = tableai.compare(df_old, df_new)
print(diff)
```

### Smart Transformations

```python
# Auto-encodes categoricals, scales numerics, extracts datetime features
transformed_df = tableai.transform(df)
```
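As a rough picture of what such a transformation involves, the sketch below applies one possible choice for each step: one-hot encoding for categoricals, z-score scaling for numerics, and year/month/day-of-week extraction for datetimes. `transform_sketch` is a hypothetical helper; the encodings and scalers tableai actually uses are not documented here and may differ.

```python
import pandas as pd
from pandas.api import types as ptypes


def transform_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: datetime features, z-scored numerics, one-hot categoricals."""
    out = df.copy()
    categorical = []
    for col in df.columns:
        if ptypes.is_datetime64_any_dtype(df[col]):
            # Replace the datetime column with simple calendar features.
            out[f"{col}_year"] = df[col].dt.year
            out[f"{col}_month"] = df[col].dt.month
            out[f"{col}_dayofweek"] = df[col].dt.dayofweek
            out = out.drop(columns=col)
        elif ptypes.is_bool_dtype(df[col]):
            continue  # leave boolean flags untouched
        elif ptypes.is_numeric_dtype(df[col]):
            std = df[col].std()
            if std > 0:  # skip constant (or all-NaN) columns
                out[col] = (df[col] - df[col].mean()) / std
        else:
            categorical.append(col)
    if categorical:
        out = pd.get_dummies(out, columns=categorical)
    return out


sample = pd.DataFrame({
    "signup": pd.to_datetime(["2024-01-15", "2024-03-02", "2024-07-30"]),
    "salary": [40_000.0, 50_000.0, 60_000.0],
    "city": ["Hanoi", "Tokyo", "Hanoi"],
})
features = transform_sketch(sample)
print(features.columns.tolist())
```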

## Comparison with Alternatives

| Feature | tableai | pandas-profiling (now ydata-profiling) | sweetviz |
|---|---|---|---|
| Zero-config profiling | Yes | Yes | Yes |
| Auto-cleaning | Yes | No | No |
| Anomaly detection | Yes | No | No |
| Smart transforms | Yes | No | No |
| Auto insights (text) | Yes | No | No |
| DataFrame comparison | Yes | No | Yes |
| Lightweight (no heavy deps) | Yes | No | No |
| Notebook-friendly output | Yes | Yes | Yes |

## API Reference

- `tableai.profile(df)` -- Comprehensive data profile
- `tableai.clean(df, **kwargs)` -- Auto-clean DataFrame
- `tableai.anomalies(df, method="iqr")` -- Detect anomalies
- `tableai.transform(df, **kwargs)` -- Smart feature engineering
- `tableai.insights(df)` -- Auto-generate text insights
- `tableai.compare(df1, df2)` -- Compare two DataFrames

## Local-First / Edge AI

This package is designed to work completely offline. All data profiling,
cleaning, anomaly detection, and transformations run locally using
pandas and numpy, plus scikit-learn when the optional `ml` extra is
installed. No internet connection or cloud APIs are required.

## License

MIT License. See [LICENSE](LICENSE) for details.

## Links

- **Website:** [nrl.ai](https://www.nrl.ai)
- **Repository:** [github.com/vietanhdev/tableai](https://github.com/vietanhdev/tableai)
- **Author:** [Viet-Anh Nguyen](https://github.com/vietanhdev)
