Metadata-Version: 2.4
Name: cartoboost
Version: 0.1.19
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: numpy>=1.23
Requires-Dist: scikit-learn>=1.2
Requires-Dist: shap>=0.49.1,<0.50 ; extra == 'explain'
Requires-Dist: onnx>=1.16 ; extra == 'onnx'
Requires-Dist: optuna>=4.0 ; extra == 'optuna'
Requires-Dist: polars>=1.0 ; extra == 'polars'
Provides-Extra: explain
Provides-Extra: onnx
Provides-Extra: optuna
Provides-Extra: polars
License-File: LICENSE
Summary: Clean-room CartoBoost-inspired regression package.
Keywords: boosting,gradient-boosting,machine-learning,python,rust
Author: Ryan Culligan
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/theculliganman/cartoboost
Project-URL: Issues, https://github.com/theculliganman/cartoboost/issues
Project-URL: Repository, https://github.com/theculliganman/cartoboost

# CartoBoost

[![PyPI](https://img.shields.io/pypi/v/cartoboost.svg)](https://pypi.org/project/cartoboost/)
[![Python](https://img.shields.io/pypi/pyversions/cartoboost.svg)](https://pypi.org/project/cartoboost/)
[![CI](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/ci.yml/badge.svg)](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/ci.yml)
[![Docs](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/pages.yml/badge.svg)](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/pages.yml)
[![Publish](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/publish-pypi.yml/badge.svg)](https://github.com/TheCulliganMan/CartoBoost/actions/workflows/publish-pypi.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

CartoBoost is a Python regression toolkit for temporal, spatial, geotemporal,
and graph-derived prediction problems. It keeps the estimator workflow familiar
to scikit-learn users while adding modeling primitives for place, time, sparse
route membership, source-target directionality, and learned graph context.

Use CartoBoost when a standard tabular booster is a strong baseline, but your
problem still requires hand-built features to represent:

- wraparound time such as hour-of-day, weekday, or seasonal cycles;
- 2D spatial boundaries, corridors, depots, hotspots, and service regions;
- list-valued memberships such as route cells, zones, markets, or H3 cells;
- source-target movement such as origin-to-destination flows;
- high-cardinality IDs that benefit from learned embeddings.

## Core Capabilities

CartoBoost supports:

- L2 and quantile regression objectives.
- Constant and linear residual leaves.
- Axis, histogram-axis, diagonal 2D, Gaussian/radial 2D, periodic, sparse-set,
  and fuzzy split behavior.
- Dense numeric arrays plus list-valued sparse-set features.
- Feature schemas covering numeric, periodic, and sparse-set features, with
  model-contract validation.
- JSON model artifacts and portable weights artifacts.
- Optional SHAP explanations, Optuna tuning, Polars input support, and ONNX
  export for the supported dense axis-tree subset.
- Neural embedding features for high-cardinality IDs.
- node2vec, GraphSAGE, heterogeneous GraphSAGE, and typed-schema HinSAGE graph
  feature encoders.
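For readers new to the quantile objective listed above, the sketch below shows the standard pinball (quantile) loss in plain Python. It illustrates the general technique only: `pinball_loss` is a hypothetical helper written for this example, not part of the CartoBoost API, and it makes no claim about how the library computes the objective internally.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball loss for a target quantile strictly between 0 and 1.

    Hypothetical helper for illustration; not a CartoBoost function.
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        error = target - pred
        # Under-prediction (error > 0) costs quantile * error;
        # over-prediction (error < 0) costs (1 - quantile) * |error|.
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


y_true = [10.0, 12.0, 20.0]
under = [8.0, 10.0, 18.0]   # every prediction is 2 too low
over = [12.0, 14.0, 22.0]   # every prediction is 2 too high

# At quantile 0.9, predicting too low is penalized more than predicting too high.
loss_under = pinball_loss(y_true, under, 0.9)
loss_over = pinball_loss(y_true, over, 0.9)
```

At `quantile=0.5` both directions are penalized equally, so the loss reduces to half the mean absolute error.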

## Install

Install the released package from PyPI:

```sh
uv add cartoboost
```

Optional integrations:

```sh
uv add "cartoboost[explain]"  # SHAP support
uv add "cartoboost[optuna]"   # Optuna tuning
uv add "cartoboost[polars]"   # Polars inputs
uv add "cartoboost[onnx]"     # ONNX export subset
```

Verify the install:

```sh
python -c "import cartoboost; print(cartoboost.__version__)"
cartoboost --help
```

## Basic Regression

```python
from cartoboost import CartoBoostRegressor

model = CartoBoostRegressor(
    n_estimators=100,
    learning_rate=0.05,
    max_depth=4,
    min_samples_leaf=20,
    splitters=["axis"],
)

# X_train and X_test are dense numeric arrays; y_train holds the regression targets.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

The estimator follows scikit-learn conventions: it implements `get_params` and
`set_params`, works with `clone`, `Pipeline`, and `GridSearchCV`, and returns
predictions as NumPy arrays.

## Temporal-Spatial Modeling

Use dense columns for numeric location and time features, and sparse-set columns
for memberships such as route cells, zones, markets, or encoded H3 cells.

```python
from cartoboost import CartoBoostRegressor

schema = {
    "dense": [
        {"name": "pickup_x", "kind": "numeric"},
        {"name": "pickup_y", "kind": "numeric"},
        {"name": "hour_of_day", "kind": "periodic", "period": 24},
        {"name": "trip_distance", "kind": "numeric"},
    ],
    "sparse_sets": [
        {"name": "route_cells", "kind": "sparse_set"},
    ],
}

model = CartoBoostRegressor(
    n_estimators=200,
    learning_rate=0.04,
    max_depth=5,
    min_samples_leaf=30,
    splitters=["axis", "diagonal_2d", "gaussian_2d", "periodic:24", "sparse_set"],
    fuzzy=True,
    fuzzy_bandwidth=0.05,
)

model.fit(
    X_train_dense,
    y_train,
    sparse_sets={"route_cells": route_cells_train},
    feature_schema=schema,
)

predictions = model.predict(
    X_test_dense,
    sparse_sets={"route_cells": route_cells_test},
)
```

Why this helps:

- `periodic:24` treats midnight-adjacent hours as neighbors.
- `diagonal_2d` learns oblique spatial boundaries more directly than axis-only
  trees.
- `gaussian_2d` isolates radial neighborhoods around local hotspots.
- `sparse_set` splits on list-valued route or cell membership without a wide
  one-hot matrix.
- `fuzzy=True` reduces hard jumps near spatial or temporal boundaries.
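The ideas behind the bullets above can be sketched as small standalone predicates. This is a conceptual illustration in plain Python, not CartoBoost's implementation: every function name is hypothetical, and the logistic form used for the fuzzy weight is an assumption about what a bandwidth-controlled soft split could look like.

```python
import math


def periodic_distance(a, b, period):
    """Shortest distance between two values on a circle of length `period`."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def in_periodic_window(value, start, end, period):
    """True if `value` lies on the arc running forward from `start` to `end`."""
    return (value - start) % period <= (end - start) % period


def diagonal_side(x, y, wx, wy, threshold):
    """Oblique 2D split: True on the side where wx*x + wy*y <= threshold."""
    return wx * x + wy * y <= threshold


def in_radial_region(x, y, cx, cy, radius):
    """Radial 2D split: True within `radius` of the center (cx, cy)."""
    return math.hypot(x - cx, y - cy) <= radius


def has_member(cells, member):
    """Sparse-set split: True if the row's list-valued feature contains `member`."""
    return member in cells


def fuzzy_left_weight(value, threshold, bandwidth):
    """Soft routing weight for the left branch (assumed logistic form)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    z = (value - threshold) / bandwidth
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))


# 23:00 and 01:00 are two hours apart on a 24-hour clock, not 22.
gap = periodic_distance(23, 1, 24)

# A late-night window from 22:00 to 02:00 contains 23:30 but not noon.
late_night = in_periodic_window(23.5, 22, 2, 24)
midday = in_periodic_window(12, 22, 2, 24)

# A row exactly on the threshold is shared evenly between both branches.
boundary_weight = fuzzy_left_weight(0.5, 0.5, 0.05)
```

A hard split sends each row entirely left or right; a soft weight like the one above instead blends the two child predictions near the threshold, which is the intuition behind smoother behavior at boundaries.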

## Graph Features

CartoBoost can precompute graph-derived columns before booster training.
Supported encoder families are node2vec, GraphSAGE, HeteroGraphSAGE, and
HinSAGE. Direction is a first-class contract: `A -> B` and `B -> A` can be
treated as separate facts, features, and embeddings.

See [Graph Features](docs/graph-features.md) for encoder configs, directional
features, OD-pair nodes, metapaths, artifacts, and benchmark guidance.
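To make the directionality point concrete, here is a minimal sketch in plain Python that keeps `A -> B` and `B -> A` as separate keys when aggregating flows. It does not use CartoBoost's graph encoders; `directed_flow_features` and the `(source, target, weight)` record layout are assumptions made for this example.

```python
from collections import defaultdict


def directed_flow_features(edges):
    """Aggregate (source, target, weight) records into direction-aware totals.

    Hypothetical helper for illustration; not a CartoBoost function.
    """
    pair_flow = defaultdict(float)  # keyed by the ordered pair (source, target)
    out_flow = defaultdict(float)   # total weight leaving each node
    in_flow = defaultdict(float)    # total weight arriving at each node
    for source, target, weight in edges:
        pair_flow[(source, target)] += weight
        out_flow[source] += weight
        in_flow[target] += weight
    return pair_flow, out_flow, in_flow


edges = [
    ("A", "B", 5.0),
    ("A", "B", 2.0),
    ("B", "A", 1.0),
    ("A", "C", 3.0),
]
pair_flow, out_flow, in_flow = directed_flow_features(edges)

# The ordered key keeps the two directions apart.
a_to_b = pair_flow[("A", "B")]
b_to_a = pair_flow[("B", "A")]
```

Collapsing the pair into an unordered key would merge the two totals and hide the asymmetry between outbound and return flows.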

## Neural Embedding Hybrid

Use `NeuralEmbeddingRegressor` when high-cardinality IDs carry stable signal and
you want learned dense embeddings appended to the tree input.

```python
from cartoboost import NeuralEmbeddingRegressor

model = NeuralEmbeddingRegressor(
    dim=16,
    use_residual=True,
    base_model_kwargs={"n_estimators": 80, "splitters": ["axis"]},
    final_model_kwargs={"n_estimators": 120, "splitters": ["axis", "periodic:24"]},
)

model.fit(X_train, y_train, ids=ids_train)
predictions = model.predict(X_test, ids=ids_test)
```

For a quick head-to-head comparison on one split:

```python
from cartoboost import benchmark_neural_vs_cartoboost

results = benchmark_neural_vs_cartoboost(X, y, ids, split_ratio=0.8)
```

Use this helper as an initial signal check, then validate with your real
temporal, spatial, grouped, or out-of-time split.
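As one example of such a split, the sketch below builds an out-of-time partition in plain Python: the model trains on the earliest rows and is scored on the latest ones. `out_of_time_split` is a hypothetical helper, not part of CartoBoost, and the 80/20 fraction is only an assumed default.

```python
def out_of_time_split(timestamps, train_fraction=0.8):
    """Return (train_idx, test_idx) with no test row earlier than any train row.

    Rows that share a timestamp at the cut point may land on either side.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort row indices by timestamp, then cut once along the time axis.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    cut = int(len(order) * train_fraction)
    return order[:cut], order[cut:]


timestamps = [5, 1, 4, 2, 3, 9, 8, 7, 6, 10]
train_idx, test_idx = out_of_time_split(timestamps, train_fraction=0.8)

latest_train = max(timestamps[i] for i in train_idx)
earliest_test = min(timestamps[i] for i in test_idx)
```

The returned index lists can then be used to slice the feature arrays, targets, and IDs consistently before fitting and scoring.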

## Save, Load, And Explain

```python
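from cartoboost import CartoBoostRegressor

# `model` is a fitted CartoBoostRegressor, such as the one from the
# Temporal-Spatial Modeling example.
model.save("model.cartoboost.json")
loaded = CartoBoostRegressor.load("model.cartoboost.json")

# SHAP explanations require the optional `cartoboost[explain]` extra.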
explanation = loaded.explain_shap(
    X_test_dense,
    background=X_train_dense,
    sparse_sets={"route_cells": route_cells_test},
    background_sparse_sets={"route_cells": route_cells_train},
)
```

Model artifacts are versioned JSON and include optional metadata, feature
schema, and training configuration fields. Graph and neural feature artifacts
should be persisted alongside the booster when features are precomputed offline.

## CLI

The CLI supports train, predict, eval, and inspect workflows on dense numeric CSV data.
Use the Python API for list-valued sparse route-cell features and graph-derived
feature pipelines.

```sh
cartoboost train --data train.csv --config configs/regression.toml --model-out model.json
cartoboost predict --model model.json --input test.csv --predictions-out predictions.csv
cartoboost eval --model model.json --data test_with_target.csv
```

## Documentation

- [Documentation Home](docs/index.md)
- [Installation](docs/installation.md)
- [Getting Started](docs/getting-started.md)
- [Python Estimator](docs/user-guide/python-estimator.md)
- [Parameters](docs/user-guide/parameters.md)
- [Spatial Modeling](docs/spatial_modeling.md)
- [Graph Features](docs/graph-features.md)
- [Neural Features](docs/neural-features.md)
- [Evaluation Protocol](docs/evaluation_protocol.md)
- [Feature Schema](docs/feature_schema.md)
- [Sparse Features](docs/sparse_features.md)
- [Model Artifacts](docs/model_artifact.md)
- [Python API Reference](docs/reference/python-api.md)
- [CLI Reference](docs/reference/cli.md)
- [Benchmarks](docs/benchmarks/index.md)
- [Limitations](docs/limitations.md)

