Metadata-Version: 2.4
Name: pyscarcopula
Version: 0.1.0
Summary: Stochastic copula models with Ornstein-Uhlenbeck latent process
Author-email: Alexey Novokhatskiy <aanovokhatskiy@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/AANovokhatskiy/pyscarcopula
Project-URL: Repository, https://github.com/AANovokhatskiy/pyscarcopula
Project-URL: Issues, https://github.com/AANovokhatskiy/pyscarcopula/issues
Keywords: copula,stochastic,SCAR,vine copula,transfer matrix,risk management,VaR,CVaR,time-varying dependence
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Office/Business :: Financial
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: numpy>=1.22
Requires-Dist: numba>=0.56
Requires-Dist: scipy>=1.9
Requires-Dist: joblib>=1.0
Requires-Dist: tqdm>=4.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Dynamic: license-file

# pyscarcopula

Stochastic copula models with Ornstein-Uhlenbeck latent process in Python.

- [About](#about)
- [Install](#install)
- [Features](#features)
- [Mathematical background](#mathematical-background)
  * [Copula models](#copula-models)
  * [Stochastic copula (SCAR)](#stochastic-copula-scar)
  * [Transfer matrix method](#transfer-matrix-method)
  * [Goodness of fit](#goodness-of-fit)
- [Examples](#examples)
  * [1. Read dataset](#read-dataset)
  * [2. Initialize and fit a bivariate copula](#fit-bivariate)
  * [3. Sample from copula](#sample-copula)
  * [4. Fit a multivariate C-vine copula](#fit-vine)
  * [5. Risk metrics (VaR / CVaR)](#risk-metrics)
- [Performance tuning](#performance-tuning)
<!-- - [Citation](#citation) -->
- [License](#license)

## About

**pyscarcopula** fits multivariate distributions using the copula approach with time-varying dependence. The classical constant-parameter model is extended to a stochastic model where the copula parameter follows an Ornstein-Uhlenbeck process. The idea is based on the discrete SCAR model of Liesenfeld and Richard (2003) and Hafner and Manner (2012); here we develop it in the continuous-time setting.

For parameter estimation we provide five methods:
| Method | Key | Description |
|---|---|---|
| Maximum likelihood | `mle` | Classical fit with constant copula parameter |
| MC p-sampler | `scar-p-ou` | Monte Carlo without importance sampling |
| MC m-sampler | `scar-m-ou` | Monte Carlo with efficient importance sampling (EIS) |
| **Transfer matrix** | **`scar-tm-ou`** | **Deterministic quadrature on a grid — numerically exact, no MC bias** |
| GAS | `gas` | Generalized autoregressive score (observation-driven, deterministic) |

The transfer matrix method exploits the Markov structure and known Gaussian transition density of the OU process to evaluate the likelihood function as a sequence of matrix-vector products with complexity $O(TK^2)$. The implementation automatically selects between dense and sparse transfer matrices depending on the kernel bandwidth, and adaptively refines the grid to resolve the transition kernel.

<!-- For details see our paper:

> A. A. Novokhatskiy, M. E. Semenov, *Robust numerical scheme for stochastic copula models* (2026). -->

## Install

```bash
pip install pyscarcopula
```

For development (includes data files and tests):

```bash
git clone https://github.com/AANovokhatskiy/pyscarcopula
cd pyscarcopula
pip install -e ".[test]"
pytest tests/
```

**Dependencies:** numpy, numba, scipy, joblib, tqdm.

## Features

**Copula families**
- Archimedean: Gumbel, Frank, Clayton, Joe (with rotations 0°/90°/180°/270°)
- Elliptical: Gaussian, Student-t (MLE only)
- Independence copula (null model for automatic vine pruning)
- C-vine pair copula construction

**Estimation methods**
- MLE — constant copula parameter
- SCAR-TM-OU — transfer matrix with analytical gradient (recommended)
- GAS — observation-driven score model
- SCAR-P-OU / SCAR-M-OU — Monte Carlo alternatives

**Vine copulas**
- Automatic copula family and rotation selection per edge (AIC/BIC)
- Automatic pruning of weak edges via independence copula baseline
- Tree-level and edge-level truncation for scalability to large d

**Diagnostics and risk**
- Goodness-of-fit via Rosenblatt transform + Cramér–von Mises test
- Mixture Rosenblatt transform for stochastic models
- Smoothed time-varying copula parameter $\bar{\theta}_k = \mathbb{E}[\Psi(x_k) \mid u_{1:k-1}]$
- VaR / CVaR via `pyscarcopula.contrib` (rolling window, marginals, portfolio optimization, parallel)

## Mathematical background

### Copula models

By Sklar's theorem, any joint distribution can be decomposed as

```math
F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d))
```

We focus on single-parameter Archimedean copulas defined via a generator $\phi(t; \theta)$:

```math
C(u_1, \ldots, u_d) = \phi^{-1}(\phi(u_1; \theta) + \cdots + \phi(u_d; \theta))
```

| Copula  | $\phi(t; \theta)$ | $\phi^{-1}(t; \theta)$ | $\theta \in$ |
|---------|-------------------|------------------------|--------------|
| Gumbel  | $(-\log t)^\theta$ | $\exp(-t^{1/\theta})$ | $[1, \infty)$ |
| Frank   | $-\log\left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right)$ | $-\frac{1}{\theta}\log(1 + e^{-t}(e^{-\theta} - 1))$ | $(0, \infty)$ |
| Joe     | $-\log(1 - (1-t)^\theta)$ | $1 - (1 - e^{-t})^{1/\theta}$ | $[1, \infty)$ |
| Clayton | $\frac{1}{\theta}(t^{-\theta} - 1)$ | $(1 + t\theta)^{-1/\theta}$ | $(0, \infty)$ |

### Stochastic copula (SCAR)

In the stochastic model the copula parameter is driven by a latent Ornstein-Uhlenbeck process:

```math
\theta_t = \Psi(x_t), \qquad dx_t = \theta_{\text{OU}}(\mu - x_t)\,dt + \nu\,dW_t
```

where $\Psi$ maps the OU state to the copula parameter domain. The likelihood function is an integral over all latent paths:

```math
L = \int \prod_{t} c(u_{1t}, u_{2t}; \Psi(x_t))\;p(x_t | x_{t-1})\;dx_0 \cdots dx_T
```

### Transfer matrix method

The Markov property allows this high-dimensional integral to be factored into a chain of one-dimensional integrals, each computed as a matrix-vector product on a discretized grid. The backward recursion is:

```math
\mathbf{m}_t = \widetilde{\mathbf{T}}(\mathbf{f}_t \odot \mathbf{m}_{t+1}), \qquad t = T-1, \ldots, 1
```

where $\widetilde{\mathbf{T}}$ is the transition kernel with trapezoidal weights baked in, $\mathbf{f}_t$ is the copula density evaluated at observation $t$, and $\odot$ is the element-wise product. Total complexity: $O(TK^2)$ (dense) or $O(TKb)$ (sparse, where $b$ is the kernel bandwidth).

### Goodness of fit

Model quality is assessed via the Rosenblatt transform. For the stochastic model we use the **mixture Rosenblatt transform**:

```math
u'_{2,t} = \mathbb{E}\left[h(u_{2t}, u_{1t}; \Psi(x_t)) \mid u_{1:t-1}\right]
```

which integrates the h-function over the predictive distribution of the latent state, avoiding the Jensen bias from plugging in a point estimate. Uniformity of the transformed sample is tested with the Cramér–von Mises statistic.


## Examples

<a name="read-dataset"></a>
### 1. Read dataset

```python
import pandas as pd
import numpy as np
from pyscarcopula.utils import pobs
from pyscarcopula import (GumbelCopula, FrankCopula, JoeCopula, ClaytonCopula,
                          IndependentCopula, GaussianCopula, StudentCopula,
                          CVineCopula)
from pyscarcopula.stattests import gof_test

crypto_prices = pd.read_csv("data/crypto_prices.csv", index_col=0, sep=';')
tickers = ['BTC-USD', 'ETH-USD']

returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:]
returns = returns_pd.values
u = pobs(returns)
```

<a name="fit-bivariate"></a>
### 2. Initialize and fit a bivariate copula

```python
copula_mle = GumbelCopula(rotate=180)
copula_tm = GumbelCopula(rotate=180)
copula_gas = GumbelCopula(rotate=180)

# Static model (constant parameter)
fit_result_mle = copula_mle.fit(data=returns, method='mle', to_pobs=True)
gof_result_mle = gof_test(copula_mle, returns, to_pobs=True)

# Stochastic model (transfer matrix)
fit_result_tm = copula_tm.fit(data=returns, method='scar-tm-ou', to_pobs=True)
gof_result_tm = gof_test(copula_tm, returns, to_pobs=True)

# GAS model
fit_result_gas = copula_gas.fit(data=returns, method='gas', to_pobs=True)
gof_result_gas = gof_test(copula_gas, returns, to_pobs=True)
```

Results on daily BTC-ETH data (T = 1460):

| Model | logL | GoF p-value |
|-------|------|-------------|
| MLE | 955.63 | 0.0087 |
| GAS | 1031.42 | 0.5282 |
| **SCAR-TM** | **1042.47** | **0.6201** |

SCAR-TM achieves the highest log-likelihood and the best GoF calibration. MLE is rejected at the 1% level; both dynamic models pass comfortably.

Available rotations: `[0, 90, 180, 270]`. Available methods: `['mle', 'scar-p-ou', 'scar-m-ou', 'scar-tm-ou', 'gas']`.

<a name="sample-copula"></a>
### 3. Sample from copula

```python
# Sample from constant-parameter copula
samples_mle = copula_mle.sample(n=1000, r=fit_result_mle.copula_param)

# Sample next state from stochastic copula
samples_tm = copula_tm.predict(n=1000)
```

<a name="fit-vine"></a>
### 4. Fit a multivariate C-vine copula

```python
tickers = ['BTC-USD', 'ETH-USD', 'BNB-USD', 'ADA-USD', 'XRP-USD', 'DOGE-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:251]
returns = returns_pd.values
u = pobs(returns)

# C-vine with SCAR-TM (truncated for speed)
vine = CVineCopula()
vine.fit(u, method='scar-tm-ou',
         truncation_level=2, min_edge_logL=10)
vine.summary()

# Comparison models
copula_s = StudentCopula()
copula_g = GaussianCopula()
copula_s.fit(u)
copula_g.fit(u)

gof_result_vine = gof_test(vine, u, to_pobs=False)
gof_result_s = gof_test(copula_s, u, to_pobs=False)
gof_result_g = gof_test(copula_g, u, to_pobs=False)
```

Results on 6-dimensional crypto data (T = 250):

| Model | logL | GoF p-value | Fit time |
|-------|------|-------------|----------|
| **C-vine SCAR-TM** | **921.9** | **0.8971** | 13.4s |
| C-vine MLE | 869.2 | 0.2072 | 0.6s |
| Student-t | 764.4 | 0.0001 | — |
| Gaussian | 761.0 | 0.0000 | — |

The C-vine with SCAR-TM substantially outperforms all alternatives. With truncation (`truncation_level=2, min_edge_logL=10`), only 9 of 15 edges use SCAR — the remaining 6 fall back to MLE — achieving a good speed/accuracy tradeoff.

<a name="risk-metrics"></a>
### 5. Risk metrics (VaR / CVaR)

Risk metrics are provided in `pyscarcopula.contrib` — optional modules for rolling VaR/CVaR estimation, marginal distribution fitting, and portfolio optimization.

```python
from pyscarcopula.contrib.risk_metrics import risk_metrics

d = len(tickers)
result = risk_metrics(
    vine, returns, 
    window_len=100,
    gamma=[0.95],
    N_mc=[100_000],
    marginals_method='johnsonsu',
    method='mle',
    optimize_portfolio=False,
    portfolio_weight=np.ones(d) / d,
    n_jobs=-1,  # parallel over rolling windows
)

var = result[0.95][100_000]['var']
cvar = result[0.95][100_000]['cvar']
```

See `example.ipynb` for a complete walkthrough with plots.

## Performance tuning

The SCAR-TM-OU method has several parameters that control the speed/accuracy tradeoff. Here are recommended configurations:

### Bivariate copula

```python
copula = GumbelCopula(rotate=180)

# Default (analytical gradient + smart init, both enabled by default)
copula.fit(u, method='scar-tm-ou')

# Disable optimizations for debugging / backward compatibility
copula.fit(u, method='scar-tm-ou', analytical_grad=False, smart_init=False)

# Relaxed tolerance (faster, slight logL loss)
copula.fit(u, method='scar-tm-ou', tol=5e-2)
```

| Parameter | Default | Effect |
|---|---|---|
| `analytical_grad` | `True` | Analytical gradient. ~3-4x fewer function evaluations. |
| `smart_init` | `True` | Heuristic initial point. Up to 5x speedup on long series. |
| `tol` | `1e-2` | Gradient tolerance. `5e-2` is ~2x faster with negligible logL loss. |
| `K` | `300` | Minimum grid size. The adaptive rule may increase it to ensure adequate resolution of the transition kernel. |
| `pts_per_sigma` | `2` | Grid density: number of points per conditional standard deviation $\sigma_c$. The adaptive rule computes $K_\text{eff} = \max(K,\, \lceil 2R\sigma / (\sigma_c / \texttt{pts\_per\_sigma}) \rceil)$, where $R$ is the grid half-range in $\sigma$ units. Higher values improve quadrature accuracy at the cost of larger $K$ and longer runtime. The default of 2 is sufficient for most cases; increasing to 4 may help for very peaked transition kernels (large $\theta$, small $\nu$). |

### Vine copula

```python
vine = CVineCopula()

# Recommended: skip SCAR on weak edges and upper trees
vine.fit(u, method='scar-tm-ou', truncation_level=2, min_edge_logL=10)
```

| Parameter | Default | Effect |
|---|---|---|
| `truncation_level` | `None` | Trees ≥ this level stay MLE. Recommended: 2-3 for d > 10. |
| `min_edge_logL` | `None` | Edges with MLE logL below threshold stay MLE. Recommended: 5-10. |

Truncated edges use MLE (constant parameter). The GoF test handles mixed SCAR/MLE models correctly.

**d=6 crypto, T=250:**

| Configuration | Time | logL | GoF p-value |
|---|---|---|---|
| `truncation_level=2, min_edge_logL=10` | ~13s | ~922 | ~0.90 |
| Full SCAR (15 edges) | ~30s | ~891 | ~0.90 |
| MLE only | ~0.6s | ~869 | ~0.21 |

<!-- ## Citation

If you use this package in your research, please cite:

```bibtex
@article{novokhatskiy2026scar,
  title={Robust numerical scheme for stochastic copula models},
  author={Novokhatskiy, A. A. and Semenov, M. E.},
  year={2026}
}
``` -->

## License

MIT License. See [LICENSE.txt](LICENSE.txt).
