Metadata-Version: 2.4
Name: financial-data-validation
Version: 0.1.0
Summary: Statistical validation for synthetic financial time series
Project-URL: Homepage, https://github.com/qpaths/financial-data-validation
Project-URL: Documentation, https://github.com/qpaths/financial-data-validation#readme
Project-URL: Repository, https://github.com/qpaths/financial-data-validation
Project-URL: Bug Tracker, https://github.com/qpaths/financial-data-validation/issues
Project-URL: QPaths Platform, https://qpaths.io
Author-email: QPaths <hello@qpaths.io>
License: MIT
License-File: LICENSE
Keywords: backtesting,finance,monte-carlo,quant,statistical-tests,synthetic-data,time-series,trading,validation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.12
Requires-Dist: numpy>=2.0
Requires-Dist: scipy>=1.14.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=9.0.0; extra == 'dev'
Requires-Dist: ruff>=0.14.0; extra == 'dev'
Description-Content-Type: text/markdown

# financial-data-validation

Lightweight statistical validation for financial time series data.

[![Tests](https://github.com/qpaths/financial-data-validation/actions/workflows/test.yml/badge.svg)](https://github.com/qpaths/financial-data-validation/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/financial-data-validation)](https://pypi.org/project/financial-data-validation/)
[![Python](https://img.shields.io/pypi/pyversions/financial-data-validation)](https://pypi.org/project/financial-data-validation/)
[![License](https://img.shields.io/github/license/qpaths/financial-data-validation)](LICENSE)

## Why This Exists

Need to validate synthetic market data, backtest results, or trading signals? `statsmodels` has everything, but it's 50+ MB with complex dependencies.

This package extracts only the diagnostic tests that matter for financial data:

- **Ljung-Box** - autocorrelation in returns
- **ARCH effects** - volatility clustering
- **Jarque-Bera** - return distribution normality
- **Kolmogorov-Smirnov** - distribution shape
- **Variance Ratio** - mean reversion vs momentum
- **Runs Test** - randomness of return signs

**2 MB install. numpy + scipy only. Purpose-built for finance.**

## Installation

### With uv:

```bash
uv add financial-data-validation
```

### With pip:

```bash
pip install financial-data-validation
```

## Quick Start

```python
import numpy as np

from financial_data_validation import validate_paths

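# Your price paths, shape (n_paths, n_timesteps):
# cumulative products of lognormal growth factors, so each row is a price series
paths = np.cumprod(np.random.lognormal(0, 0.02, size=(1000, 252)), axis=1)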

# Validate
report = validate_paths(paths, frequency="daily")

print(report)
# Financial Data Validation Report ✓ PASSED
# Overall Quality Score: 86.3/100
# ...

# Check individual scores
print(f"ARCH (volatility clustering): {report.arch_score:.2f}")
print(f"Passed: {report.passed}")
```

## What Gets Tested

| Test                   | What It Validates            | Good Data Should...                              |
| ---------------------- | ---------------------------- | ------------------------------------------------ |
| **Ljung-Box**          | Autocorrelation in returns   | Show no autocorrelation (p > 0.05)               |
| **ARCH**               | Volatility clustering        | Show clustering (p < 0.05)                       |
| **Jarque-Bera**        | Skewness and kurtosis        | Have reasonable moments (\|skew\| < 1, kurt < 5) |
| **Kolmogorov-Smirnov** | Distribution shape vs normal | Fit reasonably well (D < 0.08)                   |
| **Variance Ratio**     | Random walk behavior         | Have VR ≈ 1 at multiple horizons                 |
| **Runs Test**          | Sign randomness              | Show random +/- sequencing                       |
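
To make the Ljung-Box row concrete, here is a minimal sketch of the Q statistic and its chi-square p-value for a single return series, using only numpy and scipy. The helper name `ljung_box_pvalue` is hypothetical. It is not the package's `ljung_box_test`, which may aggregate across paths and convert the result to a score differently.

```python
import numpy as np
from scipy import stats


def ljung_box_pvalue(returns, lags=20):
    """Hypothetical helper: Ljung-Box Q statistic and p-value for one 1-D series."""
    x = np.asarray(returns, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.dot(x, x)
    ks = np.arange(1, lags + 1)
    # Sample autocorrelation at lags 1..lags
    acf = np.array([np.dot(x[k:], x[:-k]) / denom for k in ks])
    # Q = n (n + 2) * sum_k acf_k^2 / (n - k)
    q = n * (n + 2) * np.sum(acf**2 / (n - ks))
    # Under the null of no autocorrelation, Q is approximately chi-square(lags)
    return float(q), float(stats.chi2.sf(q, df=lags))


rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=2000)

# AR(1) series with coefficient 0.5: autocorrelated by construction
ar1 = np.zeros_like(noise)
for t in range(1, noise.size):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

q_noise, p_noise = ljung_box_pvalue(noise)
q_ar1, p_ar1 = ljung_box_pvalue(ar1)
print(f"white noise: Q={q_noise:.1f}, p={p_noise:.3f}")
print(f"AR(1):       Q={q_ar1:.1f}, p={p_ar1:.3g}")
```

The AR(1) series should be rejected decisively (p far below 0.05), while the white-noise p-value varies with the seed and will usually, but not always, exceed 0.05.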

## Individual Tests

```python
from financial_data_validation.utils import compute_returns
from financial_data_validation.diagnostics.arch import arch_test

# `paths` is the (n_paths, n_timesteps) price array from Quick Start
returns = compute_returns(paths)
score, details = arch_test(returns, lags=20)

print(f"ARCH score: {score:.3f}")
print(f"Volatility clustering: {'Yes' if details['passed'] else 'No'}")
```

Available tests:

- `ljung_box_test(returns, lags=20)` - autocorrelation
- `arch_test(returns, lags=20)` - volatility clustering
- `jarque_bera_test(returns)` - normality
- `ks_test(returns)` - distribution shape
- `variance_ratio_test(returns, lags=[2,5,10])` - random walk
- `runs_test(returns)` - sign randomness
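
As an illustration of the idea behind the variance ratio test, the sketch below computes VR(q) for one return series: the variance of q-period returns divided by q times the variance of one-period returns. The helper `variance_ratio` is hypothetical and omits the significance test and any scoring that `variance_ratio_test` may apply.

```python
import numpy as np


def variance_ratio(returns, q):
    """Hypothetical helper: VR(q) for one 1-D return series, overlapping windows."""
    r = np.asarray(returns, dtype=float)
    # q-period returns as rolling sums of one-period returns
    csum = np.concatenate(([0.0], np.cumsum(r)))
    r_q = csum[q:] - csum[:-q]
    return float(r_q.var(ddof=1) / (q * r.var(ddof=1)))


rng = np.random.default_rng(1)
shocks = rng.normal(0.0, 0.01, size=5001)

# Independent returns: consistent with a random walk in log prices
iid_returns = shocks[1:]
# MA(1) with a negative coefficient: each shock is partly reversed next period
reverting_returns = shocks[1:] - 0.5 * shocks[:-1]

for q in (2, 5, 10):
    vr_iid = variance_ratio(iid_returns, q)
    vr_rev = variance_ratio(reverting_returns, q)
    print(f"q={q:2d}  iid VR={vr_iid:.3f}  mean-reverting VR={vr_rev:.3f}")
```

Independent returns should give values close to 1 at every horizon, and the mean-reverting series should fall clearly below 1. Momentum (positive autocorrelation) would push the ratio above 1.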

## Custom Validation

```python
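from financial_data_validation import validate_paths

# `paths` is the (n_paths, n_timesteps) price array from Quick Start

# Stricter threshold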
report = validate_paths(paths, threshold=85.0)

# Custom weights (emphasize volatility clustering)
weights = {
    "ljung_box": 0.15,
    "arch": 0.40,        # Increased importance
    "jarque_bera": 0.15,
    "ks": 0.10,
    "variance_ratio": 0.10,
    "runs": 0.10
}
report = validate_paths(paths, weights=weights)

# Different data frequency
report = validate_paths(paths, frequency="hourly")  # Uses 24 lags
```
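
To show how weights and a threshold can interact, here is a sketch that assumes the overall score is a weighted mean of per-test scores in [0, 1], scaled to 0-100. That aggregation rule, the helper `composite_score`, and the per-test scores below are all assumptions made for illustration. The package's actual formula may differ.

```python
def composite_score(scores, weights):
    """Hypothetical: weighted mean of per-test scores in [0, 1], scaled to 0-100."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same tests")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return 100.0 * sum(weights[name] * scores[name] for name in weights) / total


weights = {
    "ljung_box": 0.15,
    "arch": 0.40,
    "jarque_bera": 0.15,
    "ks": 0.10,
    "variance_ratio": 0.10,
    "runs": 0.10,
}

# Made-up per-test scores, not output from the package
scores = {
    "ljung_box": 0.90,
    "arch": 0.80,
    "jarque_bera": 0.70,
    "ks": 0.95,
    "variance_ratio": 0.90,
    "runs": 0.85,
}

overall = composite_score(scores, weights)
passed = overall >= 85.0  # same stricter threshold as above
print(f"{overall:.1f}", passed)  # 83.0 False
```

Under this assumed rule, raising the ARCH weight makes a mediocre ARCH score dominate the result, which is why the same data can pass with default settings and fail with a stricter threshold.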

## Examples

See the [`examples/`](examples/) directory:

- [`basic_usage.py`](examples/basic_usage.py) - Complete validation workflow
- [`individual_tests.py`](examples/individual_tests.py) - Run tests independently
- [`custom_validation.py`](examples/custom_validation.py) - Custom settings
- [`comparing_models.py`](examples/comparing_models.py) - Compare GBM vs GARCH

## Quality Score Interpretation

- **90-100**: Excellent - closely matches the statistical properties of real markets covered by these tests
- **80-89**: Good - suitable for most applications
- **70-79**: Acceptable - passes minimum requirements
- **< 70**: Poor - may produce unreliable results
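
The bands above can be expressed as a small lookup. The helper `interpret_score` is hypothetical and not part of the package. It also assumes that fractional scores such as 89.5 belong to the lower band, which the list above does not specify.

```python
def interpret_score(score):
    """Hypothetical helper: map a 0-100 quality score to the bands listed above."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    if score >= 90.0:
        return "Excellent"
    if score >= 80.0:
        return "Good"
    if score >= 70.0:
        return "Acceptable"
    return "Poor"


for value in (95.0, 86.3, 72.0, 55.0):
    print(value, interpret_score(value))
```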

## When to Use This

**Use for:**

- Validating synthetic market data from Monte Carlo simulations
- Quality-checking GARCH, Heston, or other stochastic models
- Verifying backtest input data integrity
- Testing financial data generation pipelines

**Don't use for:**

- Time series forecasting (use `statsmodels` instead)
- Econometric modeling (use `statsmodels` instead)
- Non-financial time series (this is finance-specific)

## Performance

Vectorized operations make validation fast:

- 10,000 paths × 252 timesteps: **~0.5 seconds**
- 50,000 paths × 252 timesteps: **~2 seconds**
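
Timings like these depend on hardware and library versions, so it is worth measuring on your own machine. The sketch below is a small timing harness. `best_time` and `log_returns` are hypothetical helpers, and `log_returns` is only a stand-in workload. To time the real thing, pass `validate_paths` and your array to `best_time` instead.

```python
import time

import numpy as np


def best_time(fn, *args, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def log_returns(paths):
    """Stand-in workload: vectorized log returns along the time axis."""
    return np.diff(np.log(paths), axis=1)


rng = np.random.default_rng(0)
# 10,000 synthetic price paths with 252 timesteps each
paths = np.cumprod(rng.lognormal(0.0, 0.02, size=(10_000, 252)), axis=1)

elapsed = best_time(log_returns, paths)
print(f"log_returns on {paths.shape}: {elapsed:.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes and from one-time warm-up costs.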

## Requirements

- Python ≥ 3.12
- numpy ≥ 2.0.0
- scipy ≥ 1.14.0

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT License - see [LICENSE](LICENSE)

## Built By

[QPaths](https://qpaths.io) - We use this package to validate every synthetic dataset we generate.

## Citation

If you use this package in academic research:

```bibtex
@software{financial_data_validation,
  title = {financial-data-validation: Statistical validation for financial time series},
  author = {QPaths},
  year = {2026},
  url = {https://github.com/qpaths/financial-data-validation}
}
```
