Metadata-Version: 2.4
Name: catfishml
Version: 0.4.0
Summary: NeuroSplit Boosting for tabular data with differentiable soft trees and neural gating.
Project-URL: Homepage, https://github.com/catfishml/catfishml
Project-URL: Repository, https://github.com/catfishml/catfishml
Project-URL: Issues, https://github.com/catfishml/catfishml/issues
Author: CatfishML Contributors
License: MIT
License-File: LICENSE
Keywords: gradient-boosting,machine-learning,pytorch,soft-decision-tree,tabular
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: scikit-learn>=1.3
Requires-Dist: scipy>=1.10
Requires-Dist: torch>=2.1
Requires-Dist: typing-extensions>=4.8
Provides-Extra: dev
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.8; extra == 'viz'
Description-Content-Type: text/markdown

# catfishml

`catfishml` is a Python library implementing **OmniBoost++**: generalized boosting with adaptive routing across heterogeneous weak learners (linear, spline/GAM-like, adaptive-depth MLP, and differentiable soft tree).
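
For background, a differentiable soft tree replaces hard axis-aligned splits with sigmoid gates, so every sample reaches every leaf with some probability and the whole tree trains end-to-end by gradient descent. Below is a minimal PyTorch sketch of the general technique; the `SoftTree` class is illustrative and not catfishml's internal implementation.

```python
import torch
import torch.nn as nn


class SoftTree(nn.Module):
    """Minimal differentiable soft decision tree (illustrative sketch).

    Each internal node applies a sigmoid gate to a linear projection of
    the input; a sample reaches a leaf with the product of the gate
    probabilities along its path, so the tree trains by plain SGD.
    """

    def __init__(self, n_features: int, depth: int = 3):
        super().__init__()
        self.depth = depth
        self.gates = nn.Linear(n_features, 2**depth - 1)  # one gate per internal node
        self.leaf_values = nn.Parameter(torch.zeros(2**depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_right = torch.sigmoid(self.gates(x))              # (batch, 2**depth - 1)
        reach = torch.ones(x.shape[0], 1, device=x.device)  # prob. of reaching the root
        node = 0
        for level in range(self.depth):
            width = 2**level
            g = p_right[:, node : node + width]              # gates on this level
            # Interleaved (left, right) children match breadth-first node order.
            reach = torch.stack([reach * (1 - g), reach * g], dim=-1)
            reach = reach.reshape(x.shape[0], 2 * width)
            node += width
        return reach @ self.leaf_values                      # probability-weighted leaves


tree = SoftTree(n_features=8, depth=3)
y_hat = tree(torch.randn(32, 8))  # (32,) soft predictions
```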

## Why catfishml

- Adaptive residual router with complexity/redundancy penalties.
- Additive boosting in natural predictor space with Newton-style targets (see the sketch after this list).
- Numeric + categorical support (categorical embeddings).
- Missing value handling with `SimpleImputer` or MICE-style `IterativeImputer`.
- Adaptive behavior:
  - automatic objective/distribution and metric selection,
  - linearity probe (auto linear vs nonlinear mode),
  - adaptive MLP depth + adaptive tree depth.
- CPU/GPU via PyTorch.
- Automatic dependency install for missing core libraries (can be disabled).
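
The Newton-style targets above follow the standard second-order boosting recipe: fit each weak learner to the ratio of the loss's first and second derivatives, with the second derivative as a sample weight. Here is a minimal sketch for binary log-loss; the function below is illustrative, not part of catfishml's API.

```python
import numpy as np


def newton_targets_logloss(y_true: np.ndarray, raw_score: np.ndarray):
    """Second-order (Newton) targets for binary log-loss.

    The next weak learner is fit to -g/h with sample weights h,
    which amounts to one Newton step in raw (logit) predictor space.
    """
    p = 1.0 / (1.0 + np.exp(-raw_score))  # current probability estimate
    g = p - y_true                        # first derivative of log-loss w.r.t. raw score
    h = p * (1.0 - p)                     # second derivative (Hessian diagonal)
    h = np.maximum(h, 1e-12)              # guard against division by zero
    return -g / h, h                      # fit learner to -g/h, weighted by h


# One boosting round, conceptually:
#   target, weight = newton_targets_logloss(y, raw)
#   learner.fit(X, target, sample_weight=weight)
#   raw += learning_rate * learner.predict(X)
```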

## Install

```bash
pip install catfishml
```

For development:

```bash
pip install -e ".[dev]"
```

## Quick start

```python
import pandas as pd
from catfishml import FishyCatClassifier

# Mixed numeric/categorical features with missing values in both.
X = pd.DataFrame(
    {
        "age": [25, 31, 45, None, 39, 22, 55],
        "income": [2200, 3400, 7600, 5100, None, 1900, 8800],
        "city": ["A", "B", "A", "C", "B", None, "A"],
    }
)
y = [0, 0, 1, 1, 0, 0, 1]

model = FishyCatClassifier(
    n_estimators=40,
    tree_depth=3,
    metrics="auto",
    auto_metric=True,
    impute_strategy="auto",
    candidate_families="auto",
    install_missing_libraries=True,
    n_jobs=4,
    verbose=1,
)

model.fit(X, y)
print(model.evaluate(X, y))                       # metric scores on the training data
print(model.predict_proba(X)[:3])                 # class probabilities, first 3 rows
fig = model.plot_visualization(kind="overview")   # diagnostic overview figure
print(model.get_statistics())                     # training + data summary
print(model.get_history(as_dataframe=True).head())
```

## Main API

- `FishyCatBooster`
- `FishyCatClassifier`

For regression, use `FishyCatBooster(task="regression", ...)`.
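
A minimal regression sketch on synthetic data. Only `task="regression"`, `n_estimators`, `metrics`, `fit`, and `evaluate` come from the API described here; the scikit-learn-style `predict()` call is an assumption:

```python
import numpy as np
import pandas as pd
from catfishml import FishyCatBooster

# Synthetic regression data: a noisy linear target.
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = 2.0 * X["x1"] - X["x2"] + rng.normal(scale=0.1, size=200)

reg = FishyCatBooster(task="regression", n_estimators=40, metrics="rmse")
reg.fit(X, y)
print(reg.evaluate(X, y))    # RMSE on the training data
print(reg.predict(X)[:3])    # assumes a scikit-learn-style predict()
```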

Common parameters:

- `metrics`: metric name (`"auto"`, `"accuracy"`, `"auc"`, `"logloss"`, `"rmse"`, `"mae"`, `"r2"`) or a custom callable (see the sketch below).
- `auto_metric`: if `True`, the metric and training-time validation feedback are selected automatically from the task and data.
- `impute_strategy`: `"auto"`, `"simple"`, `"iterative"`, or `"none"`.
- `structure_mode`: `"auto"`, `"linear"`, or `"nonlinear"`.
- `boosting_order`: `1` (gradient) or `2` (Newton-like weighted residuals).
- `candidate_families`: `"auto"` or a subset of `["linear", "spline", "adaptive_mlp", "soft_tree"]`.
- `auto_install_dependencies`: auto-installs missing libraries with pip at runtime.
- `install_plot_dependencies`: if `True`, auto-installs plotting dependencies as well.

Inspection and reporting methods:

- `plot_visualization(kind=...)`: loss/routing/depth/overview diagnostics.
- `get_statistics()`: full training and data summary.
- `get_history(as_dataframe=True)`: per-iteration history (loss, metric, ETA, routing).
- `view_data(X, transformed=...)`: inspect raw or transformed data.
- `full_report(X, y)`: one-shot report (statistics + history + evaluation).
- `available_components()`: lists all integrated learner families and features.
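
Because `metrics` accepts a callable, a custom metric can be passed in directly. The callable signature is not documented here; this sketch assumes the common `(y_true, y_pred) -> float` convention with probability predictions:

```python
from sklearn.metrics import f1_score

from catfishml import FishyCatClassifier


def f1_at_half(y_true, y_pred):
    # Assumed signature: y_pred holds positive-class probabilities.
    return f1_score(y_true, (y_pred >= 0.5).astype(int))


model = FishyCatClassifier(n_estimators=40, metrics=f1_at_half)
```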

## Notes

- This repository provides a practical implementation of OmniBoost++ ideas; it is not a strict reproduction of a specific paper.
- For larger datasets, run on a GPU with `device="cuda"` (see the snippet below).
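
For example, selecting the device at runtime. Only the `device` parameter comes from the note above; the rest is illustrative:

```python
import torch

from catfishml import FishyCatClassifier

# Fall back to CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = FishyCatClassifier(n_estimators=200, device=device)
```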

## License

MIT
