Metadata-Version: 2.4
Name: ml4t-engineer
Version: 0.1.0a8
Summary: High-performance quantitative finance feature engineering library
Project-URL: Homepage, https://github.com/stefan-jansen/ml4t-engineer
Project-URL: Documentation, https://ml4t-engineer.readthedocs.io
Project-URL: Repository, https://github.com/stefan-jansen/ml4t-engineer
Project-URL: Issues, https://github.com/stefan-jansen/ml4t-engineer/issues
Project-URL: Changelog, https://github.com/stefan-jansen/ml4t-engineer/blob/main/CHANGELOG.md
Author-email: Stefan Jansen <stefan@ml4trading.io>
Maintainer-email: Stefan Jansen <stefan@ml4trading.io>
License: MIT
Keywords: backtesting,feature-engineering,features,finance,labeling,machine-learning,microstructure,polars,quantitative-finance,technical-analysis,trading
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: numba>=0.57.0
Requires-Dist: numpy<2.3,>=1.24.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: polars>=0.20.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: statsmodels>=0.14.0
Requires-Dist: structlog>=23.0.0
Provides-Extra: all
Requires-Dist: arch>=6.0.0; extra == 'all'
Requires-Dist: duckdb>=0.9.0; extra == 'all'
Requires-Dist: hypothesis>=6.80.0; extra == 'all'
Requires-Dist: ipdb>=0.13.0; extra == 'all'
Requires-Dist: ipython>=8.14.0; extra == 'all'
Requires-Dist: lightgbm>=4.0.0; extra == 'all'
Requires-Dist: matplotlib>=3.7.0; extra == 'all'
Requires-Dist: mypy>=1.5.0; extra == 'all'
Requires-Dist: myst-parser>=2.0.0; extra == 'all'
Requires-Dist: nbsphinx>=0.9.0; extra == 'all'
Requires-Dist: pandas-market-calendars>=4.0.0; extra == 'all'
Requires-Dist: plotly>=5.15.0; extra == 'all'
Requires-Dist: pre-commit>=3.3.0; extra == 'all'
Requires-Dist: pytest-benchmark>=4.0.0; extra == 'all'
Requires-Dist: pytest-cov>=4.1.0; extra == 'all'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'all'
Requires-Dist: pytest-xdist>=3.3.0; extra == 'all'
Requires-Dist: pytest>=7.4.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: seaborn>=0.12.0; extra == 'all'
Requires-Dist: shap>=0.44.0; extra == 'all'
Requires-Dist: sphinx-autodoc-typehints>=1.24.0; extra == 'all'
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == 'all'
Requires-Dist: sphinx>=7.0.0; extra == 'all'
Requires-Dist: ta-lib>=0.4.0; extra == 'all'
Provides-Extra: calendars
Requires-Dist: pandas-market-calendars>=4.0.0; extra == 'calendars'
Provides-Extra: dev
Requires-Dist: arch>=6.0.0; extra == 'dev'
Requires-Dist: duckdb>=0.9.0; extra == 'dev'
Requires-Dist: hypothesis>=6.80.0; extra == 'dev'
Requires-Dist: ipdb>=0.13.0; extra == 'dev'
Requires-Dist: ipython>=8.14.0; extra == 'dev'
Requires-Dist: lightgbm>=4.0.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pandas-market-calendars>=4.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.3.0; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.3.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: shap>=0.44.0; extra == 'dev'
Requires-Dist: ta-lib>=0.4.0; extra == 'dev'
Requires-Dist: yfinance>=0.2.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: myst-parser>=2.0.0; extra == 'docs'
Requires-Dist: nbsphinx>=0.9.0; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=1.24.0; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.3.0; extra == 'docs'
Requires-Dist: sphinx>=7.0.0; extra == 'docs'
Provides-Extra: ml
Requires-Dist: lightgbm>=4.0.0; extra == 'ml'
Requires-Dist: shap>=0.44.0; extra == 'ml'
Provides-Extra: store
Requires-Dist: duckdb>=0.9.0; extra == 'store'
Provides-Extra: ta
Requires-Dist: ta-lib>=0.4.0; extra == 'ta'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.7.0; extra == 'viz'
Requires-Dist: plotly>=5.15.0; extra == 'viz'
Requires-Dist: seaborn>=0.12.0; extra == 'viz'
Description-Content-Type: text/markdown

# ml4t-engineer

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/ml4t-engineer)](https://pypi.org/project/ml4t-engineer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Feature engineering for financial machine learning: technical indicators, labeling methods, and alternative bar sampling.

## Part of the ML4T Library Ecosystem

This library is one of five interconnected libraries supporting the machine learning for trading workflow described in [Machine Learning for Trading](https://ml4trading.io):

![ML4T Library Ecosystem](docs/images/ml4t_ecosystem_workflow_print.jpeg)

Each library addresses a distinct stage: data infrastructure, feature engineering, signal evaluation, strategy backtesting, and live deployment.

## What This Library Does

Transforming raw price data into predictive features is a core task in quantitative research. ml4t-engineer provides:

- 120 technical indicators across 11 categories (momentum, volatility, trend, microstructure, etc.)
- Triple-barrier labeling and other target construction methods from *Advances in Financial Machine Learning*
- Alternative bar sampling (volume bars, dollar bars, tick imbalance bars)
- A feature registry for discovery and configuration

The library is built on Polars with Numba JIT compilation for numerical operations. 59 indicators are validated against TA-Lib at 1e-6 tolerance.
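
To make the tolerance claim concrete, here is a minimal sketch of what an absolute-tolerance check at 1e-6 can look like. It does not use TA-Lib or this library's test suite: the two moving-average functions and the `matches_reference` helper are hypothetical names introduced for illustration, with a naive loop standing in for the reference implementation.

```python
import numpy as np


def sma_reference(values: np.ndarray, period: int) -> np.ndarray:
    """Naive simple moving average; the first period-1 entries are NaN."""
    out = np.full(values.shape, np.nan)
    for i in range(period - 1, len(values)):
        out[i] = values[i - period + 1 : i + 1].mean()
    return out


def sma_cumsum(values: np.ndarray, period: int) -> np.ndarray:
    """Vectorized simple moving average built from cumulative sums."""
    out = np.full(values.shape, np.nan)
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out[period - 1 :] = (csum[period:] - csum[:-period]) / period
    return out


def matches_reference(candidate, reference, atol: float = 1e-6) -> bool:
    """True if both arrays agree within atol; NaN warm-up values count as equal."""
    return bool(np.allclose(candidate, reference, rtol=0.0, atol=atol, equal_nan=True))


# Synthetic random-walk prices (illustrative data only)
rng = np.random.default_rng(42)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=500))

print(matches_reference(sma_cumsum(prices, 14), sma_reference(prices, 14)))
```

Setting `rtol=0.0` keeps the comparison purely absolute, and `equal_nan=True` stops the warm-up period from failing the check.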

![ml4t-engineer Architecture](docs/images/ml4t_engineer_architecture_print.jpeg)

## Installation

```bash
pip install ml4t-engineer
```

Optional dependencies:

```bash
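# Quote the extras so shells such as zsh do not expand the brackets
pip install "ml4t-engineer[ta]"        # TA-Lib backend
pip install "ml4t-engineer[viz]"       # Visualization
pip install "ml4t-engineer[calendars]" # Trading calendars
pip install "ml4t-engineer[ml]"        # LightGBM and SHAP
pip install "ml4t-engineer[store]"     # DuckDB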
```

## Quick Start

```python
import polars as pl
from ml4t.engineer import compute_features

df = pl.read_parquet("ohlcv.parquet")

# Compute features with default parameters
result = compute_features(df, ["rsi", "macd", "atr", "obv"])

# Or with custom parameters
result = compute_features(df, [
    {"name": "rsi", "params": {"period": 20}},
    {"name": "bollinger_bands", "params": {"period": 20, "std_dev": 2.0}},
])
```

## Feature Registry

```python
from ml4t.engineer.core.registry import get_registry

registry = get_registry()
print(registry.list_all())                    # All 120 features
print(registry.list_by_category("momentum"))  # 31 momentum indicators
print(registry.list_ta_lib_compatible())      # 59 TA-Lib validated
print(registry.list_normalized())             # 37 bounded (0-100, -1 to 1)
```

## Feature Categories

| Category | Count | Examples |
|----------|-------|----------|
| Momentum | 31 | RSI, MACD, Stochastic, CCI, ADX, MFI |
| Microstructure | 15 | Kyle Lambda, VPIN, Amihud, Roll spread |
| Volatility | 15 | ATR, Bollinger, Yang-Zhang, Parkinson |
| Statistics | 14 | Variance, Linear Regression, Correlation |
| ML | 14 | Fractional Diff, Entropy, Lag features |
| Trend | 10 | SMA, EMA, WMA, DEMA, TEMA, KAMA |
| Risk | 6 | Max Drawdown, Sortino, CVaR |
| Price Transform | 5 | Typical Price, Weighted Close |
| Regime | 4 | Hurst Exponent, Choppiness Index |
| Volume | 3 | OBV, AD, ADOSC |
| Math | 3 | MAX, MIN, SUM |

## Triple-Barrier Labeling

```python
from ml4t.engineer.config import LabelingConfig
from ml4t.engineer.labeling import triple_barrier_labels, atr_triple_barrier_labels

# Fixed barriers
tb_config = LabelingConfig.triple_barrier(
    upper_barrier=0.02,    # 2% profit target
    lower_barrier=0.01,    # 1% stop loss
    max_holding_period=20, # 20 bars
)
labels = triple_barrier_labels(
    df,
    config=tb_config,
)

# ATR-based dynamic barriers
atr_config = LabelingConfig.atr_barrier(
    atr_tp_multiple=2.0,
    atr_sl_multiple=1.0,
    atr_period=14,
    max_holding_period=20,
)
labels = atr_triple_barrier_labels(
    df,
    config=atr_config,
)

# Time-based horizons
tb_time_config = LabelingConfig.triple_barrier(
    upper_barrier=0.02,
    lower_barrier=0.01,
    max_holding_period="4h",  # 4 hours
)
labels = triple_barrier_labels(
    df,
    config=tb_time_config,
)
```
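
For readers new to the method, the following is a minimal NumPy sketch of the triple-barrier idea itself, not this library's implementation. The function name `triple_barrier_sketch`, its argument names, and the convention that both barriers are positive fractional returns are assumptions made for illustration. It also uses close prices only and a bar-count horizon.

```python
import numpy as np


def triple_barrier_sketch(prices, upper, lower, max_holding):
    """Label each bar +1 (upper barrier hit first), -1 (lower hit first), or 0 (time-out).

    upper and lower are positive fractional returns relative to the entry price.
    Bars near the end of the series are labeled using whatever bars remain.
    """
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    labels = np.zeros(n, dtype=np.int64)
    for i in range(n):
        entry = prices[i]
        end = min(i + max_holding, n - 1)  # vertical (time) barrier
        for j in range(i + 1, end + 1):
            ret = prices[j] / entry - 1.0
            if ret >= upper:  # profit target touched first
                labels[i] = 1
                break
            if ret <= -lower:  # stop loss touched first
                labels[i] = -1
                break
    return labels


# Illustrative prices: 2% profit target, 1% stop loss, 3-bar horizon
example_labels = triple_barrier_sketch(
    [100.0, 101.0, 103.0, 99.0, 98.0, 100.0], upper=0.02, lower=0.01, max_holding=3
)
print(example_labels)
```

The nested loop is quadratic in the worst case. It is written for readability rather than speed.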

## Alternative Bars

```python
from ml4t.engineer.bars import VolumeBarSampler, DollarBarSampler, TickImbalanceBarSampler

# tick_data is assumed to be a DataFrame of trades loaded elsewhere

# Volume bars (equal volume per bar)
vbars = VolumeBarSampler(volume_threshold=1000).sample(tick_data)

# Dollar bars (equal dollar volume per bar)
dbars = DollarBarSampler(dollar_threshold=1_000_000).sample(tick_data)

# Tick imbalance bars (information-driven)
ibars = TickImbalanceBarSampler(expected_imbalance=100).sample(tick_data)
```
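
As a rough illustration of how volume bars are formed, the sketch below aggregates ticks into OHLCV bars with plain NumPy. It is not the `VolumeBarSampler` implementation. The function name `volume_bars_sketch`, the array inputs, and the choice to drop a trailing partial bar are assumptions made for this example.

```python
import numpy as np


def volume_bars_sketch(prices, volumes, volume_threshold):
    """Aggregate ticks into OHLCV bars that close once cumulative volume reaches the threshold.

    A trailing partial bar (below the threshold) is dropped.
    """
    prices = np.asarray(prices, dtype=float)
    bars = []
    start = 0
    cum_volume = 0.0
    for i, vol in enumerate(volumes):
        cum_volume += float(vol)
        if cum_volume >= volume_threshold:
            chunk = prices[start : i + 1]
            bars.append(
                {
                    "open": float(chunk[0]),
                    "high": float(chunk.max()),
                    "low": float(chunk.min()),
                    "close": float(chunk[-1]),
                    "volume": cum_volume,
                }
            )
            start = i + 1
            cum_volume = 0.0
    return bars


# Illustrative ticks: six trades, bars close at 1000 units of volume
tick_prices = [10.0, 10.2, 10.1, 10.4, 10.3, 10.5]
tick_volumes = [300, 500, 400, 600, 500, 100]
example_bars = volume_bars_sketch(tick_prices, tick_volumes, volume_threshold=1000)
print(len(example_bars))
```

Because a bar closes on the tick that crosses the threshold, bar volumes overshoot it slightly rather than matching it exactly. Dollar bars follow the same pattern, accumulating price times volume instead of volume.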

## Technical Characteristics

- **Polars-native**: All computations use Polars expressions
- **Numba-accelerated**: JIT compilation for numerical kernels
- **TA-Lib validated**: 59 indicators validated at 1e-6 tolerance
- **AFML-compliant**: Labeling methods verified against *Advances in Financial Machine Learning*
- **ML-ready outputs**: 37 features produce bounded outputs (0-100, -1 to 1) for direct model input; remaining features work with standard preprocessing (returns, z-scores, robust scaling)
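
For the unbounded features, robust scaling is one of the preprocessing options named above. The sketch below shows the idea with NumPy. The helper names `fit_robust_scaler` and `apply_robust_scaler` are hypothetical and not part of this library. The statistics are fitted on a training window only, so later data does not leak into the scaling.

```python
import numpy as np


def fit_robust_scaler(train):
    """Return (median, IQR) from training data, ignoring NaNs; a zero IQR falls back to 1.0."""
    train = np.asarray(train, dtype=float)
    median = float(np.nanmedian(train))
    q25, q75 = np.nanpercentile(train, [25, 75])
    iqr = float(q75 - q25)
    if iqr <= 0.0:
        iqr = 1.0
    return median, iqr


def apply_robust_scaler(values, median, iqr):
    """Center by the training median and scale by the training IQR."""
    return (np.asarray(values, dtype=float) - median) / iqr


# Illustrative data: fit on the training window, apply to later observations
train_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
median, iqr = fit_robust_scaler(train_values)
scaled = apply_robust_scaler([3.0, 5.0, 100.0], median, iqr)
print(scaled)
```

Unlike a z-score, the median and interquartile range are barely affected by outliers in the training window, so a single extreme bar does not compress the rest of the feature.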

## Related Libraries

- **ml4t-data**: Market data acquisition and storage
- **ml4t-diagnostic**: Signal evaluation and statistical validation
- **ml4t-backtest**: Event-driven backtesting
- **ml4t-live**: Live trading with broker integration

## Development

```bash
git clone https://github.com/stefan-jansen/ml4t-engineer.git
cd ml4t-engineer
uv sync
uv run pytest tests/ -q
uv run ty check
```

## References

- Lopez de Prado, M. (2018). *Advances in Financial Machine Learning*. Wiley.
- Lopez de Prado, M. (2020). *Machine Learning for Asset Managers*. Cambridge.

## License

MIT License - see [LICENSE](LICENSE) for details.
