Metadata-Version: 2.4
Name: anyml
Version: 0.2.0
Summary: Simple AutoML wrapper for tabular data. One-liner API for classification and regression.
Project-URL: Homepage, https://github.com/vietanhdev/anyml
Project-URL: Documentation, https://github.com/vietanhdev/anyml#readme
Project-URL: Repository, https://github.com/vietanhdev/anyml
Project-URL: Issues, https://github.com/vietanhdev/anyml/issues
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: automl,classification,machine-learning,regression,scikit-learn
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: click>=8.0
Requires-Dist: joblib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn
Provides-Extra: dev
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Provides-Extra: full
Requires-Dist: anyllm; extra == 'full'
Requires-Dist: lightgbm; extra == 'full'
Requires-Dist: xgboost; extra == 'full'
Provides-Extra: lightgbm
Requires-Dist: lightgbm; extra == 'lightgbm'
Provides-Extra: llm
Requires-Dist: anyllm; extra == 'llm'
Provides-Extra: progress
Requires-Dist: tqdm; extra == 'progress'
Provides-Extra: xgboost
Requires-Dist: xgboost; extra == 'xgboost'
Description-Content-Type: text/markdown

# anyml

<p align="center"><img src="logo.svg" alt="anyml logo" width="120"></p>

![PyPI](https://img.shields.io/pypi/v/anyml)
![Python](https://img.shields.io/pypi/pyversions/anyml)
![License](https://img.shields.io/pypi/l/anyml)

**Simple AutoML for tabular data.** One-liner API for classification and regression with automatic preprocessing, model selection, and evaluation.

**Runs completely offline.** All processing uses local scikit-learn models, plus XGBoost or LightGBM when those optional extras are installed. No cloud APIs or internet connection required.

Built by [Viet-Anh Nguyen](https://github.com/vietanhdev) | [nrl.ai](https://www.nrl.ai)

## Installation

```bash
pip install anyml
```

With XGBoost support:

```bash
pip install "anyml[xgboost]"
```

With all optional dependencies:

```bash
pip install "anyml[full]"
```

## Quick Start

### Classification

```python
import pandas as pd
import anyml

df = pd.read_csv("data.csv")
result = anyml.classify(df, target="label")

print(result.score)         # Best cross-validation accuracy
print(result.model_name)    # e.g. "random_forest"
print(result.report())      # Full classification report
```

### Regression

```python
result = anyml.regress(df, target="price")

print(result.score)         # Best cross-validation score: negative RMSE (closer to 0 is better)
print(result.report())      # RMSE, MAE, R2
```

### Predict on New Data

```python
predictions = result.predict(new_df)
```

### Feature Importance

```python
result.explain()
# Returns a DataFrame with feature names and importance scores
```

### Compare Models

```python
scores = anyml.compare(df, target="label")
# {'logistic_regression': 0.92, 'random_forest': 0.95, ...}
```

### Choose Specific Models

```python
result = anyml.classify(df, target="label", models=["xgboost", "random_forest"])
```

### Standalone Preprocessing

```python
processed_df = anyml.preprocess(df)
# or with target separation:
X, y = anyml.preprocess(df, target="label")
```

### Save and Load

```python
result.save("model.joblib")
loaded = anyml.load_model("model.joblib")
loaded.predict(new_df)
```

## How It Works

### Automatic Preprocessing

anyml automatically detects column types and applies appropriate transformations:

| Data Type | Handling |
|-----------|----------|
| Numeric | Median imputation + StandardScaler |
| Categorical (low cardinality) | Mode imputation + OneHotEncoding |
| Categorical (high cardinality) | Mode imputation + OneHotEncoding |
| Datetime | Extract year, month, day, day-of-week |
| Missing values | Automatic imputation (median for numeric, mode for categorical) |
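
The handling in the table above can be approximated with plain pandas and scikit-learn. The following is a minimal sketch, not anyml's actual implementation: `expand_datetimes` and `build_preprocessor` are hypothetical helper names, the example columns are invented, and the low/high cardinality split is not modeled because the table lists the same handling for both.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def expand_datetimes(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each datetime column with year, month, day, day-of-week columns."""
    out = df.copy()
    for col in out.select_dtypes(include=["datetime"]).columns:
        out[f"{col}_year"] = out[col].dt.year
        out[f"{col}_month"] = out[col].dt.month
        out[f"{col}_day"] = out[col].dt.day
        out[f"{col}_dayofweek"] = out[col].dt.dayofweek
        out = out.drop(columns=[col])
    return out


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Median + scaling for numeric columns, mode + one-hot for the rest."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Hypothetical toy data with one column of each kind and some missing values.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["hanoi", np.nan, "hue", "hanoi"],
    "signup": pd.to_datetime(["2024-01-15", "2024-02-20", "2024-03-05", "2024-04-10"]),
})
expanded = expand_datetimes(df)
X = build_preprocessor(expanded).fit_transform(expanded)
print(X.shape)
```

Datetime columns are expanded first so that the derived integer columns fall through the numeric branch along with the original numeric features.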

### Model Selection

anyml tries multiple models and selects the best one via cross-validation:

**Classification:**
- Logistic Regression
- Random Forest
- XGBoost (optional)
- LightGBM (optional)

**Regression:**
- Linear Regression
- Random Forest
- XGBoost (optional)
- LightGBM (optional)
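
Selection by cross-validation can be sketched with scikit-learn alone. This is an illustration under assumptions, not anyml's internal code: `select_best` is a hypothetical helper, the synthetic dataset stands in for real data, and only the two always-available classifiers are included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def select_best(candidates, X, y, cv=5, scoring="accuracy"):
    """Return (best_name, all_scores) based on the mean cross-validation score."""
    all_scores = {
        name: float(cross_val_score(est, X, y, cv=cv, scoring=scoring).mean())
        for name, est in candidates.items()
    }
    best_name = max(all_scores, key=all_scores.get)
    return best_name, all_scores


# Synthetic data used purely for demonstration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

best_name, all_scores = select_best(candidates, X, y)
best_model = candidates[best_name].fit(X, y)  # refit the winner on all data
print(best_name, all_scores)
```

For regression, the same loop works with scikit-learn's `scoring="neg_root_mean_squared_error"`, where scores are negative and the highest (closest to zero) wins.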

### Evaluation Metrics

**Classification:** Accuracy, F1 (weighted), F1 (macro), full classification report

**Regression:** RMSE, MAE, R2
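
These metrics all map onto functions in `sklearn.metrics`. The sketch below shows one way to compute them on small hand-made label and value lists (invented for illustration); how anyml assembles its own metrics dict is not shown here and may differ.

```python
import math

from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Classification: toy labels for three classes.
y_true_cls = [0, 0, 1, 1, 1, 2]
y_pred_cls = [0, 1, 1, 1, 1, 2]
classification_metrics = {
    "accuracy": accuracy_score(y_true_cls, y_pred_cls),
    # Weighted F1 averages per-class F1 by class support.
    "f1_weighted": f1_score(y_true_cls, y_pred_cls, average="weighted"),
    # Macro F1 gives every class equal weight.
    "f1_macro": f1_score(y_true_cls, y_pred_cls, average="macro"),
}

# Regression: toy targets and predictions.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.0, 5.0, 8.0, 9.0]
regression_metrics = {
    "rmse": math.sqrt(mean_squared_error(y_true_reg, y_pred_reg)),
    "mae": mean_absolute_error(y_true_reg, y_pred_reg),
    "r2": r2_score(y_true_reg, y_pred_reg),
}

print(classification_metrics)
print(regression_metrics)
```

Weighted and macro F1 diverge when classes are imbalanced, which is why reporting both is useful.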

## Comparison with Other AutoML Tools

| Feature | anyml | auto-sklearn | TPOT | H2O |
|---------|-------|--------------|------|-----|
| One-liner API | Yes | No | No | No |
| No config needed | Yes | Partial | Partial | No |
| Lightweight | Yes | No | No | No |
| Preprocessing included | Yes | Yes | Yes | Yes |
| Explainability | Yes | No | No | Partial |
| Pure Python | Yes | Yes | Yes | No (Java) |

anyml is designed for simplicity. If you need extensive hyperparameter tuning or neural architecture search, consider auto-sklearn or TPOT. If you want to go from data to predictions in one line, anyml is for you.

## API Reference

### `anyml.classify(df, target, models=None, cv=5, scoring=None)`

Auto-classify a tabular dataset. Returns an `AutoResult`.

### `anyml.regress(df, target, models=None, cv=5, scoring=None)`

Auto-regress a tabular dataset. Returns an `AutoResult`.

### `anyml.compare(df, target, task=None, models=None, cv=5, scoring=None)`

Compare multiple models. Returns a dict of `{model_name: cv_score}`.

### `anyml.preprocess(df, target=None)`

Standalone preprocessing. Returns the processed DataFrame, or an `(X, y)` tuple if `target` is given.

### `AutoResult`

| Attribute / Method | Description |
|---|---|
| `.model` | Fitted sklearn estimator |
| `.score` | Best mean CV score |
| `.model_name` | Name of the winning model |
| `.metrics` | Evaluation metrics dict |
| `.feature_importances` | Feature importance dict |
| `.all_scores` | CV scores for all models tried |
| `.predict(df)` | Predict on new data |
| `.explain()` | Feature importance DataFrame |
| `.report()` | Human-readable report string |
| `.save(path)` | Save to disk with joblib |

## Local-First / Edge AI

This package is designed to work completely offline. All model training and
inference uses local libraries (scikit-learn, XGBoost, LightGBM). No internet
connection or cloud APIs are required.

## Development

```bash
git clone https://github.com/vietanhdev/anyml.git
cd anyml
pip install -e ".[dev]"
pytest tests/ -v
```

## License

MIT License. See [LICENSE](LICENSE) for details.
