Metadata-Version: 2.4
Name: deflated-sharpe
Version: 0.1.0
Summary: Deflated Sharpe Ratio and statistical gates for quantitative strategy validation
Project-URL: Homepage, https://github.com/mnemox-ai/deflated-sharpe
Project-URL: Repository, https://github.com/mnemox-ai/deflated-sharpe
Author-email: Mnemox AI <dev@mnemox.ai>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial :: Investment
Requires-Python: >=3.10
Provides-Extra: scipy
Requires-Dist: scipy>=1.10; extra == 'scipy'
Description-Content-Type: text/markdown

# deflated-sharpe

![Is your backtest real?](assets/hero.png)

[![PyPI](https://img.shields.io/pypi/v/deflated-sharpe)](https://pypi.org/project/deflated-sharpe/)
![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue)
![License](https://img.shields.io/badge/license-Apache_2.0-green)
![Tests](https://img.shields.io/badge/tests-27_passing-brightgreen)

You tested 1,000 parameter combinations and found a Sharpe 2.0 strategy. Is it real — or did you just test enough times to get lucky?

`deflated-sharpe` implements the Deflated Sharpe Ratio (Bailey & Lopez de Prado, 2014) and related statistical gates. Pure Python, zero required dependencies, designed to be the last check before you deploy a strategy.

## Install

```bash
pip install deflated-sharpe
```

## Quick Start

### Deflated Sharpe Ratio

```python
from deflated_sharpe import deflated_sharpe_ratio

dsr, p_value = deflated_sharpe_ratio(observed_sr=2.0, num_trials=1000, num_obs=252)
print(f"DSR={dsr:.2f}, p={p_value:.4f}")
```

A positive DSR with p < 0.05 means your strategy likely has real alpha after accounting for the number of trials you ran. A negative DSR means the expected maximum Sharpe from random chance alone exceeds your observed Sharpe.

### Minimum Backtest Length

```python
from deflated_sharpe import min_backtest_length

min_obs = min_backtest_length(target_sr=1.5, num_trials=500)
print(f"Need at least {min_obs} observations")
```

Before running a large search, compute how many observations you need. If your backtest window is shorter than `min_obs`, a strategy at the target Sharpe cannot pass the DSR gate for that number of trials, so the search is not worth running on that window.

### Regime Decay Detection

```python
from deflated_sharpe import RegimeDecayDetector, StrategyBaseline, TradeResult

baseline = StrategyBaseline(win_rate=0.55, trade_count=200, max_drawdown_pct=12.0)
detector = RegimeDecayDetector(baseline=baseline)
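# training_features and live_trades are placeholders for data you supply.
detector.fit_market_baseline(training_features)  # list of (atr_ratio, trend_pct, atr_percentile)

for trade in live_trades:  # expected to be TradeResult objects
    detector.add_trade(trade)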
assessment = detector.assess()
print(f"Decay confirmed: {assessment.decay_confirmed} ({assessment.signals_fired}/3 signals)")
```

Triple-confirmation system: Bayesian win rate decay, drawdown exceedance (1.5x backtest MDD), and Mahalanobis out-of-distribution detection. Two of three signals must fire simultaneously to confirm decay.

## How DSR Saved Us

> In March 2026, we ran a grid search over 19,200 parameter combinations on BTCUSDT 1H
> walk-forward data (23 periods, 3-month IS + 3-month OOS). Multiple strategies showed
> Sharpe ratios above 1.5 in-sample. The DSR gate rejected every single one — correctly
> preventing deployment of overfitted strategies.
>
> The math was simple: with M=19,200 trials and only 30-50 trades per window, the expected
> maximum Sharpe from pure chance exceeded every observed value. We then tested LLM-guided
> search (M~30 per period, 640x fewer trials) and found the same result: trade count was
> the binding constraint, not search method. DSR saved us from deploying strategies that
> looked profitable but had zero statistical significance.
>
> Full analysis: [Phase 15 Case Study](examples/phase15_case_study.py)

![Before and After DSR](assets/before_after.png)

## Tools

### `deflated_sharpe_ratio(observed_sr, num_trials, num_obs, skewness, kurtosis)`

Computes the Deflated Sharpe Ratio per Bailey & Lopez de Prado (2014). Adjusts the observed
Sharpe for selection bias from multiple testing, accounting for return non-normality via
skewness and kurtosis corrections.

The key insight: when you test M strategies, the maximum Sharpe you expect from pure luck
grows as `O(sqrt(ln(M)))`. DSR subtracts this expected maximum from your observed Sharpe
and normalizes by the standard error.
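
To make that growth rate concrete, here is a minimal sketch of the usual approximation for the expected maximum of M independent standard normal draws, using only the standard library. The name `expected_max_z` is hypothetical and not part of this package's API, and the library's own implementation may differ in detail.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_z(num_trials: int) -> float:
    """Approximate E[max] of num_trials independent N(0, 1) draws.

    Hypothetical helper for illustration; requires num_trials >= 2.
    """
    if num_trials < 2:
        raise ValueError("num_trials must be at least 2")
    inv = NormalDist().inv_cdf
    return (1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / num_trials) + EULER_GAMMA * inv(
        1.0 - 1.0 / (num_trials * math.e)
    )


# Compare the approximation with the sqrt(2 ln M) asymptotic upper envelope.
for m in (10, 100, 1000, 19200):
    print(m, round(expected_max_z(m), 3), round(math.sqrt(2.0 * math.log(m)), 3))
```

Both columns rise slowly with M: going from 1,000 to 19,200 trials raises the bar by well under one standard deviation, yet a single observed Sharpe still has to clear it.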

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `observed_sr` | float | required | Observed Sharpe ratio |
| `num_trials` | int | required | Number of strategies tested (M) |
| `num_obs` | int | required | Number of observations (T) |
| `skewness` | float | 0.0 | Return skewness (0 = normal) |
| `kurtosis` | float | 3.0 | Return kurtosis (3 = normal) |

Returns `(dsr, p_value)`. DSR > 0 with p < 0.05 indicates statistical significance.
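
As a rough illustration of the normalization step, the sketch below computes a z-statistic and one-sided p-value for an observed Sharpe against a fixed benchmark, using the non-normality-adjusted standard error from the paper. It assumes both Sharpe values are expressed per observation (not annualized) and that the benchmark is given directly; the function name `sharpe_z_and_p` is hypothetical, and the library's scaling conventions may differ.

```python
import math
from statistics import NormalDist


def sharpe_z_and_p(observed_sr, benchmark_sr, num_obs, skewness=0.0, kurtosis=3.0):
    """Return (z, p) for H0: true Sharpe <= benchmark_sr. Illustrative only."""
    # Variance inflation for non-normal returns; equals 1 + SR^2 / 2 when normal.
    variance_term = 1.0 - skewness * observed_sr + (kurtosis - 1.0) / 4.0 * observed_sr**2
    if num_obs < 2 or variance_term <= 0.0:
        raise ValueError("need num_obs >= 2 and a positive variance term")
    z = (observed_sr - benchmark_sr) * math.sqrt(num_obs - 1) / math.sqrt(variance_term)
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# A per-observation Sharpe of 0.10 over 253 observations against a zero benchmark.
z, p = sharpe_z_and_p(observed_sr=0.10, benchmark_sr=0.0, num_obs=253)
print(f"z={z:.3f}, p={p:.4f}")
```

In the deflated version, the benchmark is the expected maximum Sharpe under the null for the given number of trials, which is why a large `num_trials` can push the statistic below zero.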

Reference: Bailey, D.H. & Lopez de Prado, M. (2014), Eq. 2-4.

### `min_backtest_length(target_sr, num_trials, alpha, skewness, kurtosis)`

Uses binary search to find the minimum number of observations T at which a strategy with
Sharpe `target_sr` passes the DSR test (DSR > 0) at significance level `alpha`. Use this to
check whether your backtest window is long enough before running a parameter search.

```python
from deflated_sharpe import min_backtest_length

# "I want Sharpe 1.5 after testing 200 strategies. How much data do I need?"
min_obs = min_backtest_length(target_sr=1.5, num_trials=200, alpha=0.05)
```
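
The search itself can be sketched as a standard binary search over T with a monotone pass/fail gate. The example below is a simplified stand-in, not the library's code: it holds the benchmark Sharpe fixed (in the real DSR it depends on the number of trials), treats Sharpe values as per-observation, and the names `smallest_passing` and `make_gate` are hypothetical.

```python
import math
from statistics import NormalDist


def smallest_passing(passes, lo=2, hi=10_000_000):
    """Smallest T in [lo, hi] with passes(T) true, assuming passes is monotone in T."""
    if not passes(hi):
        raise ValueError("no T up to hi passes the gate")
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(mid):
            hi = mid  # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1  # mid fails, so the answer is larger
    return lo


def make_gate(target_sr, benchmark_sr, alpha=0.05, skewness=0.0, kurtosis=3.0):
    """Build a gate: does target_sr beat benchmark_sr at level alpha with T observations?"""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    spread = math.sqrt(1.0 - skewness * target_sr + (kurtosis - 1.0) / 4.0 * target_sr**2)

    def passes(num_obs):
        z = (target_sr - benchmark_sr) * math.sqrt(num_obs - 1) / spread
        return z >= z_alpha

    return passes


gate = make_gate(target_sr=0.10, benchmark_sr=0.05)
min_obs = smallest_passing(gate)
print(f"Need at least {min_obs} observations")
```

Binary search is valid here because the z-statistic grows with `sqrt(T - 1)`: once a given T passes, every larger T passes as well.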

### `benjamini_hochberg(p_values, alpha)`

Benjamini-Hochberg FDR correction for evaluating multiple strategies simultaneously.
When you have N candidate strategies each with a DSR p-value, BH controls the false
discovery rate at level alpha.

```python
from deflated_sharpe import deflated_sharpe_ratio, benjamini_hochberg

p_values = [
    deflated_sharpe_ratio(sr, num_trials=50, num_obs=500)[1]
    for sr in [1.2, 0.8, 1.5, 0.3]
]
results = benjamini_hochberg(p_values, alpha=0.05)
for idx, p, sig in results:
    print(f"Strategy {idx}: p={p:.4f}, significant={sig}")
```
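
For readers unfamiliar with the procedure, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up rule. The function name `bh_step_up` and its return shape (a list of booleans in input order) are assumptions made for illustration and differ from the `(idx, p, sig)` tuples returned by the library.

```python
def bh_step_up(p_values, alpha=0.05):
    """Return a reject/keep flag per p-value using the BH step-up rule."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


print(bh_step_up([0.001, 0.5, 0.04, 0.02]))  # [True, False, False, True]
```

Note that 0.04 is below the raw 0.05 level but is not rejected: its rank-3 threshold is 3/4 × 0.05 = 0.0375.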

### `RegimeDecayDetector`

Live monitoring for strategy regime decay. Three independent signals with 2/3 majority vote:

- **S1 Win Rate Decay**: Bayesian Beta updating with backtest prior. Fires when P(win_rate < breakeven) exceeds threshold.
- **S2 Drawdown Exceedance**: Fires when current drawdown exceeds `dd_multiplier` (default 1.5x) times backtest maximum drawdown.
- **S3 Market OOD**: Mahalanobis distance on market features (ATR ratio, trend, ATR percentile). Fires when recent trades are beyond the training distribution's 95th percentile.
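
The S1 signal can be illustrated with a small Beta-Binomial sketch. It assumes a uniform Beta(1, 1) base prior plus integer pseudo-counts taken from the backtest, which is one plausible way to encode a "backtest prior" and not necessarily the library's; the names `beta_cdf_int` and `win_rate_decay_prob` are hypothetical. Restricting the parameters to integers lets the Beta CDF be computed exactly with the standard library.

```python
import math


def beta_cdf_int(x, a, b):
    """P(X <= x) for X ~ Beta(a, b) with positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def win_rate_decay_prob(prior_wins, prior_losses, live_wins, live_losses, breakeven):
    """Posterior probability that the true win rate is below breakeven."""
    a = 1 + prior_wins + live_wins      # uniform prior + backtest + live wins
    b = 1 + prior_losses + live_losses  # uniform prior + backtest + live losses
    return beta_cdf_int(breakeven, a, b)


# Backtest prior worth 20 pseudo-trades at 55% wins, then 8 wins in 30 live trades.
prob = win_rate_decay_prob(prior_wins=11, prior_losses=9,
                           live_wins=8, live_losses=22, breakeven=0.5)
print(f"P(win_rate < breakeven) = {prob:.3f}, fires = {prob > 0.80}")
```

How heavily the backtest counts are weighted matters: a prior built from all 200 backtest trades would take far longer to be overturned by live results than the 20 pseudo-trades used here.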

Anti-false-positive measures: minimum 20 trades before assessment, cooling period of 5 trades after trigger, Bonferroni correction for multiple strategies.

| Config Parameter | Default | Description |
|------------------|---------|-------------|
| `min_trades` | 20 | Minimum trades before assessment |
| `cooling_period` | 5 | Trades to skip after trigger |
| `win_rate_decay_prob_threshold` | 0.80 | P(wr < breakeven) threshold |
| `dd_multiplier` | 1.5 | Drawdown exceedance multiplier |
| `ood_percentile` | 95.0 | Mahalanobis percentile threshold |
| `num_strategies` | 1 | N for Bonferroni correction |
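
The drawdown check and the 2-of-3 vote are simple enough to sketch directly. The helper names below are hypothetical, the equity curve is made-up illustration data, and the sketch ignores the `min_trades` and `cooling_period` guards described above.

```python
def current_drawdown_pct(equity):
    """Drawdown of the latest equity value from its running peak, in percent."""
    peak = max(equity)
    return (peak - equity[-1]) / peak * 100.0


def drawdown_exceeded(equity, backtest_mdd_pct, dd_multiplier=1.5):
    """S2-style check: current drawdown above dd_multiplier x backtest max drawdown."""
    return current_drawdown_pct(equity) > dd_multiplier * backtest_mdd_pct


def decay_confirmed(signals, required=2):
    """Majority vote: at least `required` of the boolean signals fired."""
    return sum(bool(s) for s in signals) >= required


equity = [100.0, 120.0, 96.0]  # illustrative equity curve, about 20% off the peak
s2 = drawdown_exceeded(equity, backtest_mdd_pct=12.0)  # threshold is 18%
print(s2, decay_confirmed([True, s2, False]))
```

Requiring two signals means a deep drawdown alone does not confirm decay; it has to coincide with either a degraded win rate or an out-of-distribution market.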

## Paper Verification

The DSR implementation is verified against the original paper's mathematics:
Gumbel approximation for E[Z_max], standard error with non-normality correction,
and the full DSR test statistic. See [`tests/test_paper_verification.py`](tests/test_paper_verification.py)
for numerical checks against known values from Bailey & Lopez de Prado (2014).

## Zero Dependencies

The core library uses only the Python standard library (`math`, `dataclasses`).
The `_math.py` module implements `norm_cdf`, matrix inversion, and Mahalanobis distance
from scratch to avoid pulling in NumPy/SciPy for basic usage.
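
To give a sense of how little code this takes, here is a minimal standard-library sketch of a normal CDF and a Mahalanobis distance. It is not the contents of `_math.py`: the function names other than `norm_cdf` are hypothetical, and it solves a linear system by Gaussian elimination instead of forming an explicit inverse.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(point, mean, cov):
    """sqrt((p - mu)^T cov^{-1} (p - mu)) for a positive-definite cov."""
    diff = [p - mu for p, mu in zip(point, mean)]
    weighted = solve(cov, diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(mahalanobis([3.0, 4.0, 0.0], [0.0, 0.0, 0.0], identity))  # 5.0
```

With an identity covariance the distance reduces to the Euclidean one, which makes a convenient sanity check.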

For the regime detector's Bayesian win rate signal (S1), `scipy.stats.beta` is used if
available; otherwise a point-estimate fallback is used. Install the optional dependency:

```bash
pip install "deflated-sharpe[scipy]"
```

## References

Bailey, D. H., & Lopez de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for
Selection Bias, Backtest Overfitting, and Non-Normality." *Journal of Portfolio Management*,
40(5), 94-107. DOI: 10.3905/jpm.2014.40.5.094

## License

Apache-2.0
