Metadata-Version: 2.4
Name: featsynergy
Version: 0.1.0
Summary: Automatic detection of second-order feature synergies for machine learning pipelines.
Author: Maximo
License: MIT
Keywords: machine learning,feature engineering,feature selection,synergy
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: scikit-learn>=1.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# featsynergy

**Automatic detection of second-order feature synergies for machine learning pipelines.**

`featsynergy` helps you discover which pairs of features interact meaningfully — and automatically generates the derived features (products, ratios, squares) that capture those interactions.

## Installation
```bash
pip install featsynergy
```

## Quick Start
```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from featsynergy import SynergyDetector

# Example data — substitute your own DataFrame and target
X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

detector = SynergyDetector(top_n=10, gain_thresh=0.002)
detector.fit(X_train, y_train)

# See which pairs have meaningful synergies
print(detector.top_pairs_)

# Add derived features to your DataFrame
X_train_enriched = detector.transform(X_train)
X_test_enriched = detector.transform(X_test)
```

## How It Works

1. **Feature selection** — selects the top-N most relevant features using a combined score of Pearson correlation and Ridge regression importance.
2. **Pair evaluation** — for each pair of top features, evaluates a Ridge model with and without derived features (product, squares, safe division) using cross-validation.
3. **Synergy detection** — pairs where the derived features meaningfully improve the score (gain > threshold) are flagged as synergistic.
4. **Transform** — adds only the derived features from synergistic pairs to your DataFrame.
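The pair-evaluation step (2) can be sketched with plain scikit-learn. This is an illustration of the idea, not the library's internal code — `pair_gain` is a hypothetical helper, and the exact scoring inside `featsynergy` may differ:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def pair_gain(X, y, a, b, cv=3):
    """Cross-validated R^2 gain from adding derived features for pair (a, b)."""
    base = cross_val_score(Ridge(), X[[a, b]], y, cv=cv).mean()
    derived = X[[a, b]].copy()
    derived[f"{a}*{b}"] = X[a] * X[b]
    derived[f"{a}^2"] = X[a] ** 2
    derived[f"{b}^2"] = X[b] ** 2
    if not (X[b] == 0).any():  # safe division: skip if the denominator has zeros
        derived[f"{a}/{b}"] = X[a] / X[b]
    enriched = cross_val_score(Ridge(), derived, y, cv=cv).mean()
    return enriched - base

X, y = make_regression(n_samples=300, n_features=5, noise=0.0, random_state=0)
X = pd.DataFrame(X, columns=list("abcde"))
y = y + (X["a"] * X["b"]).to_numpy() * 50  # inject a genuine a×b interaction
gain = pair_gain(X, y, "a", "b")
```

Here the injected `a*b` interaction is invisible to the linear baseline, so the enriched model scores higher and `gain` comes out positive — the pair would be flagged as synergistic.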

## Parameters

| Parameter | Default | Description |
|---|---|---|
| `top_n` | 10 | Number of candidate features to evaluate |
| `gain_thresh` | 0.002 | Minimum gain to consider a pair synergistic |
| `cv` | 3 | Cross-validation folds for pair evaluation |
| `task` | `'regression'` | `'regression'` or `'classification'` |
| `verbose` | `True` | Print progress and results |

## Attributes after `fit()`

| Attribute | Description |
|---|---|
| `top_features_` | Selected top-N features |
| `results_` | Full DataFrame with all evaluated pairs, sorted by gain |
| `top_pairs_` | Pairs that exceed `gain_thresh` |
| `synergy_features_` | Names of derived features added by `transform()` |

## Safe Division

When a feature contains zeros, division by that feature is skipped entirely to avoid infinite values. The reverse ratio (whose denominator contains no zeros) is still computed normally.
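A minimal sketch of this rule, assuming a pandas DataFrame of features (`safe_ratio_features` is an illustrative helper, not part of the library's API):

```python
import numpy as np
import pandas as pd

def safe_ratio_features(X, a, b):
    """Return ratio columns for pair (a, b), skipping any direction
    whose denominator column contains zeros (to avoid inf values)."""
    out = {}
    if not (X[b] == 0).any():
        out[f"{a}/{b}"] = X[a] / X[b]
    if not (X[a] == 0).any():
        out[f"{b}/{a}"] = X[b] / X[a]
    return pd.DataFrame(out, index=X.index)

X = pd.DataFrame({"u": [1.0, 2.0, 0.0], "v": [4.0, 5.0, 8.0]})
ratios = safe_ratio_features(X, "u", "v")
# "u/v" is computed (v has no zeros); "v/u" is skipped (u contains a zero)
```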

## sklearn Compatible

`SynergyDetector` implements the sklearn `fit` / `transform` / `fit_transform` interface and can be used in pipelines:
```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor
from featsynergy import SynergyDetector

pipe = Pipeline([
    ('synergy', SynergyDetector(top_n=10)),
    ('model',   GradientBoostingRegressor()),
])
pipe.fit(X_train, y_train)
```

## License

MIT
