1""" 2insurance_frequency_severity.dependent 3======================================= 4Dependent frequency-severity neural two-part model for insurance pricing. 5 6The core idea is multi-task learning: a single shared encoder trunk processes 7covariates and produces a latent representation that feeds both a Poisson 8frequency head and a Gamma severity head. Gradients from both losses flow 9through the trunk simultaneously, so it learns features that are jointly 10informative for frequency *and* severity. That shared information is where the 11implicit frequency-severity dependence lives. 12 13On top of the latent dependence you can add the explicit Garrido-Genest-Schulz 14conditional covariate (log μ += γ·N), which gives a semi-analytical pure 15premium correction and a directly interpretable dependence parameter. 16 17This subpackage is distinct from the Sarmanov copula approach in the parent 18package. Use Sarmanov when you need an analytical bivariate density and a 19simple omega parameter for a regulator; use this subpackage when you have a 20large dataset, suspect nonlinear interactions, and want a single model that 21learns both tasks jointly. 22 23Requires torch for neural network classes. 24Install with: pip install insurance-frequency-severity[neural] 25 26Public API 27---------- 28Model 29~~~~~ 30``DependentFreqSevNet`` – PyTorch nn.Module (shared trunk + heads) 31``SharedTrunkConfig`` – dataclass for trunk hyperparameters (no torch needed) 32``FrequencyHead`` – Poisson head module 33``SeverityHead`` – Gamma head module 34 35Training 36~~~~~~~~ 37``JointLoss`` – Poisson + Gamma NLL with configurable balancing 38``TrainingConfig`` – dataclass for training hyperparameters (no torch needed) 39``DependentFSTrainer`` – training loop with early stopping and LR scheduling 40 41Wrapper 42~~~~~~~ 43``DependentFSModel`` – sklearn-compatible estimator (fit/predict/score) 44 45Premium 46~~~~~~~ 47``PurePremiumEstimator`` – Monte Carlo + optional MGF analytical correction 48 49Diagnostics 50~~~~~~~~~~~ 51``DependentFSDiagnostics`` – Lorenz, calibration, dependence tests, latent corr 52 53Data 54~~~~ 55``FreqSevDataset`` – PyTorch Dataset with exposure handling 56``prepare_features`` – numeric encoding helper (no torch needed) 57 58Benchmarks 59~~~~~~~~~~ 60``make_dependent_claims`` – synthetic claims with known γ dependence 61``make_independent_claims`` – synthetic independent baseline 62""" 63 64# SharedTrunkConfig and TrainingConfig are safe to import eagerly — pure dataclasses, no torch. 65from insurance_frequency_severity.dependent.model import SharedTrunkConfig 66from insurance_frequency_severity.dependent.training import TrainingConfig 67# prepare_features is also torch-free. 68from insurance_frequency_severity.dependent.data import prepare_features 69# Benchmark generators use only numpy/scipy. 70from insurance_frequency_severity.dependent.benchmarks import make_dependent_claims, make_independent_claims 71 72# All remaining names require torch and are loaded lazily. 
73_NEURAL_NAMES = { 74 "DependentFreqSevNet": ("insurance_frequency_severity.dependent.model", "DependentFreqSevNet"), 75 "FrequencyHead": ("insurance_frequency_severity.dependent.model", "FrequencyHead"), 76 "SeverityHead": ("insurance_frequency_severity.dependent.model", "SeverityHead"), 77 "JointLoss": ("insurance_frequency_severity.dependent.training", "JointLoss"), 78 "DependentFSTrainer": ("insurance_frequency_severity.dependent.training", "DependentFSTrainer"), 79 "DependentFSModel": ("insurance_frequency_severity.dependent.wrapper", "DependentFSModel"), 80 "PurePremiumEstimator": ("insurance_frequency_severity.dependent.premium", "PurePremiumEstimator"), 81 "DependentFSDiagnostics": ("insurance_frequency_severity.dependent.diagnostics", "DependentFSDiagnostics"), 82 "FreqSevDataset": ("insurance_frequency_severity.dependent.data", "FreqSevDataset"), 83 "make_train_val_loaders": ("insurance_frequency_severity.dependent.data", "make_train_val_loaders"), 84} 85 86 87def __getattr__(name: str): 88 if name in _NEURAL_NAMES: 89 module_path, attr = _NEURAL_NAMES[name] 90 import importlib 91 mod = importlib.import_module(module_path) 92 return getattr(mod, attr) 93 raise AttributeError(f"module 'insurance_frequency_severity.dependent' has no attribute {name!r}") 94 95 96__all__ = [ 97 "SharedTrunkConfig", 98 "TrainingConfig", 99 "prepare_features", 100 "make_dependent_claims", 101 "make_independent_claims", 102 # Neural (torch required): 103 "DependentFreqSevNet", 104 "FrequencyHead", 105 "SeverityHead", 106 "JointLoss", 107 "DependentFSTrainer", 108 "DependentFSModel", 109 "PurePremiumEstimator", 110 "DependentFSDiagnostics", 111 "FreqSevDataset", 112 "make_train_val_loaders", 113]
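# Example: the shared-trunk idea from the module docstring made concrete.
# This is an illustrative sketch only -- it is NOT the actual
# DependentFreqSevNet (which lives in insurance_frequency_severity.dependent
# .model); the hidden sizes and attribute names below are assumptions.
import torch
from torch import nn


class TwoPartSketch(nn.Module):
    """Minimal shared trunk + Poisson/Gamma heads (sketch)."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # Shared encoder: gradients from BOTH task losses flow through these
        # layers, so the latent z becomes jointly informative -- this is where
        # the implicit frequency-severity dependence lives.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.freq_head = nn.Linear(hidden, 1)  # predicts log lambda (Poisson)
        self.sev_head = nn.Linear(hidden, 1)   # predicts log mu (Gamma mean)

    def forward(self, x: torch.Tensor):
        z = self.trunk(x)
        return self.freq_head(z).squeeze(-1), self.sev_head(z).squeeze(-1)


# Note the import split in __init__.py above: torch-free names
# (SharedTrunkConfig, TrainingConfig, prepare_features, the benchmark
# generators) load eagerly, while anything in _NEURAL_NAMES resolves on first
# attribute access via the module-level __getattr__ (PEP 562), so importing
# the subpackage itself never requires torch.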
# insurance_frequency_severity/dependent/training.py (excerpt)
@dataclass
class TrainingConfig:
    """Hyperparameters for the training loop.

    Parameters
    ----------
    max_epochs:
        Maximum number of training epochs.
    batch_size:
        Mini-batch size. Larger batches give more stable gradient estimates
        but use more memory. For most UK motor datasets 512-2048 works well.
    lr:
        Learning rate for the Adam optimiser.
    trunk_lr_multiplier:
        If not 1.0, the trunk uses ``lr * trunk_lr_multiplier`` while the heads
        use ``lr``. A value < 1.0 (e.g. 0.3) slows trunk updates, which can
        help when the heads need to adapt faster.
    weight_decay:
        L2 regularisation applied to all parameters.
    loss_weight_sev:
        Fixed severity loss weight (used when ``auto_balance=False``).
    auto_balance:
        Equalise Poisson and Gamma loss magnitudes automatically each step.
    patience:
        Early stopping patience (epochs without improvement on validation loss).
        Set to None to disable early stopping.
    min_delta:
        Minimum improvement in validation loss to count as an improvement.
    lr_reduce_factor:
        Factor by which the LR scheduler reduces the learning rate on plateau.
    lr_patience:
        Epochs without improvement before the LR scheduler fires.
    verbose:
        Whether to print epoch-level training summaries.
    device:
        PyTorch device string. ``"auto"`` selects CUDA if available, else CPU.
    """

    max_epochs: int = 100
    batch_size: int = 512
    lr: float = 1e-3
    trunk_lr_multiplier: float = 1.0
    weight_decay: float = 1e-4
    loss_weight_sev: float = 1.0
    auto_balance: bool = True
    patience: Optional[int] = 15
    min_delta: float = 1e-4
    lr_reduce_factor: float = 0.5
    lr_patience: int = 5
    verbose: bool = True
    device: str = "auto"
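# Example usage of TrainingConfig. The optimiser wiring shown in the trailing
# comment is an assumption about how DependentFSTrainer consumes these fields
# (its source is not shown here), with hypothetical net.trunk / head
# attribute names.
import torch

from insurance_frequency_severity.dependent import TrainingConfig

# Slow the shared trunk to 30% of the head learning rate and allow a longer,
# more patient run.
cfg = TrainingConfig(max_epochs=200, trunk_lr_multiplier=0.3, patience=20)

# "auto" is documented as: CUDA if available, else CPU.
if cfg.device == "auto":
    device = "cuda" if torch.cuda.is_available() else "cpu"
else:
    device = cfg.device

# trunk_lr_multiplier corresponds to a two-group optimiser, roughly:
#   torch.optim.Adam(
#       [{"params": net.trunk.parameters(), "lr": cfg.lr * cfg.trunk_lr_multiplier},
#        {"params": head_params, "lr": cfg.lr}],
#       weight_decay=cfg.weight_decay,
#   )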
# insurance_frequency_severity/dependent/data.py (excerpt)
def prepare_features(
    df: pd.DataFrame,
    numeric_cols: List[str],
    categorical_cols: Optional[List[str]] = None,
    transformer: Optional[ColumnTransformer] = None,
) -> Tuple[np.ndarray, ColumnTransformer]:
    """Encode a DataFrame into a numeric matrix.

    Numeric columns are standardised (zero mean, unit variance). Categorical
    columns are one-hot encoded (unknown categories at inference are set to
    all-zero rows, not an error).

    Does not require torch.

    Parameters
    ----------
    df:
        Input data. Must contain all columns in ``numeric_cols`` and
        ``categorical_cols``.
    numeric_cols:
        Names of numeric/continuous columns.
    categorical_cols:
        Names of categorical columns. Pass ``None`` or ``[]`` if there are no
        categoricals.
    transformer:
        A fitted ``ColumnTransformer`` from a previous call to this function.
        Pass this when encoding held-out or test data to ensure the same
        encoding is applied. When ``None``, a new transformer is fitted on
        ``df``.

    Returns
    -------
    X: np.ndarray of shape (n, n_features_out)
        Encoded feature matrix.
    transformer: ColumnTransformer
        Fitted transformer (reuse for test data).

    Examples
    --------
    >>> X_train, ct = prepare_features(df_train, numeric_cols=["age", "value"],
    ...                                categorical_cols=["vehicle_class"])
    >>> X_test, _ = prepare_features(df_test, numeric_cols=["age", "value"],
    ...                              categorical_cols=["vehicle_class"],
    ...                              transformer=ct)
    """
    categorical_cols = categorical_cols or []

    if transformer is None:
        transformers = []
        if numeric_cols:
            transformers.append(("num", StandardScaler(), numeric_cols))
        if categorical_cols:
            transformers.append(
                (
                    "cat",
                    OneHotEncoder(handle_unknown="ignore", sparse_output=False),
                    categorical_cols,
                )
            )
        transformer = ColumnTransformer(transformers, remainder="drop")
        transformer.fit(df)

    X = transformer.transform(df).astype(np.float32)
    return X, transformer
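# Example: the unknown-category behaviour described above, checked end to end.
# The toy frame and column names here are made up for illustration.
import pandas as pd

from insurance_frequency_severity.dependent import prepare_features

df_train = pd.DataFrame({"age": [25, 40, 63], "vehicle_class": ["A", "B", "A"]})
df_test = pd.DataFrame({"age": [33], "vehicle_class": ["C"]})  # "C" unseen in training

X_train, ct = prepare_features(df_train, numeric_cols=["age"],
                               categorical_cols=["vehicle_class"])
X_test, _ = prepare_features(df_test, numeric_cols=["age"],
                             categorical_cols=["vehicle_class"], transformer=ct)

# Column order is numeric first, then one-hot. The unseen class "C" encodes to
# all-zero indicator columns (handle_unknown="ignore") rather than raising.
print(X_test)  # standardised age followed by [0., 0.]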
# insurance_frequency_severity/dependent/benchmarks.py (excerpt)
def make_dependent_claims(
    n_policies: int = 10_000,
    gamma: float = -0.15,
    base_freq: float = 0.08,
    base_sev: float = 3_000.0,
    phi: float = 1.5,
    n_features: int = 5,
    seed: int = 42,
    test_fraction: float = 0.2,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Generate synthetic motor insurance claims with known frequency-severity dependence.

    Data generating process
    -----------------------
    For each policy i with covariates xᵢ and exposure tᵢ:

    1. log λᵢ = β₀ + Xᵢ · β_freq                 (Poisson rate per unit exposure)
    2. Nᵢ ~ Poisson(λᵢ · tᵢ)
    3. log μᵢ = α₀ + Xᵢ · β_sev + γ · Nᵢ         (GGS conditional severity)
    4. If Nᵢ > 0: Ȳᵢ ~ Gamma(shape=Nᵢ/φ, mean=μᵢ); else Ȳᵢ = 0

    The parameter γ is the true dependence parameter the model should recover.
    Negative γ (default −0.15) means higher-claim-count policies have lower
    average severity — the pattern typically found in UK motor.

    Parameters
    ----------
    n_policies:
        Number of policies in the full dataset (train + test combined).
    gamma:
        True dependence parameter. γ=0 gives independence; γ<0 gives negative
        frequency-severity correlation (typical for motor).
    base_freq:
        Baseline claim frequency (λ at x=0, t=1).
    base_sev:
        Baseline average severity (μ at x=0, N=0), in pounds.
    phi:
        Gamma dispersion parameter φ. Higher values give more severity
        variability.
    n_features:
        Number of synthetic covariates. First half affect frequency, second
        half affect severity; they all share the same feature matrix so the
        trunk has genuine heterogeneity to exploit.
    seed:
        Random seed.
    test_fraction:
        Fraction of data to hold out as test set.

    Returns
    -------
    df_train, df_test : pd.DataFrame
        DataFrames with columns:
        ``feature_0``, …, ``feature_{n_features-1}``,
        ``exposure``, ``n_claims``, ``avg_severity``, ``total_loss``,
        ``true_lambda``, ``true_mu``.

    Examples
    --------
    >>> df_train, df_test = make_dependent_claims(n_policies=20_000, gamma=-0.15)
    >>> df_train.head()
    """
    rng = np.random.default_rng(seed)

    # --- Covariates ---
    X = rng.standard_normal((n_policies, n_features)).astype(np.float32)

    # --- Exposure: mix of short- and full-year policies ---
    exposure = rng.choice([0.25, 0.5, 0.75, 1.0, 1.0, 1.0], size=n_policies).astype(np.float32)

    # --- Regression coefficients ---
    n_freq_feats = max(1, n_features // 2)
    n_sev_feats = n_features - n_freq_feats
    beta_freq = rng.uniform(-0.3, 0.3, size=n_freq_feats).astype(np.float32)
    beta_sev = rng.uniform(-0.2, 0.2, size=n_sev_feats).astype(np.float32)

    # --- Frequency ---
    log_lambda_base = np.log(base_freq) + X[:, :n_freq_feats] @ beta_freq
    lambda_ = np.exp(log_lambda_base) * exposure
    n_claims = rng.poisson(lambda_).astype(np.float32)

    # --- Conditional severity ---
    log_mu = np.log(base_sev) + X[:, n_freq_feats:] @ beta_sev + gamma * n_claims
    mu = np.exp(log_mu)

    avg_severity = np.zeros(n_policies, dtype=np.float32)
    pos = n_claims > 0
    if pos.sum() > 0:
        alpha = n_claims[pos] / phi
        rate = n_claims[pos] / (phi * mu[pos] + 1e-10)
        avg_severity[pos] = rng.gamma(shape=alpha, scale=1.0 / rate).astype(np.float32)

    total_loss = n_claims * avg_severity

    feature_cols = [f"feature_{i}" for i in range(n_features)]
    df = pd.DataFrame(X, columns=feature_cols)
    df["exposure"] = exposure
    df["n_claims"] = n_claims
    df["avg_severity"] = avg_severity
    df["total_loss"] = total_loss
    df["true_lambda"] = lambda_ / exposure  # per unit exposure
    df["true_mu"] = mu

    # --- Split ---
    n_test = max(1, int(n_policies * test_fraction))
    df_test = df.iloc[-n_test:].copy().reset_index(drop=True)
    df_train = df.iloc[:-n_test].copy().reset_index(drop=True)

    return df_train, df_test
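# Example: recovering gamma from the generated data without fitting a model.
# In this DGP the severity covariates are independent of the claim count, so
# E[avg_severity | n_claims = n] is proportional to exp(gamma * n); the log
# ratio of group means between n = 2 and n = 1 therefore estimates gamma.
import numpy as np

from insurance_frequency_severity.dependent import make_dependent_claims

df_train, _ = make_dependent_claims(n_policies=200_000, gamma=-0.15)

pos = df_train[df_train["n_claims"] > 0]
m1 = pos.loc[pos["n_claims"] == 1, "avg_severity"].mean()
m2 = pos.loc[pos["n_claims"] == 2, "avg_severity"].mean()
print(f"log(m2 / m1) = {np.log(m2 / m1):.3f}")  # ≈ -0.15 up to sampling noise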
# insurance_frequency_severity/dependent/benchmarks.py (continued)
def make_independent_claims(
    n_policies: int = 10_000,
    base_freq: float = 0.08,
    base_sev: float = 3_000.0,
    phi: float = 1.5,
    n_features: int = 5,
    seed: int = 42,
    test_fraction: float = 0.2,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Generate synthetic claims with γ=0 (frequency-severity independence).

    Identical to ``make_dependent_claims`` with ``gamma=0``. Use this as the
    null comparison to verify the model does not overfit spurious dependence.

    Parameters
    ----------
    n_policies, base_freq, base_sev, phi, n_features, seed, test_fraction:
        Same as ``make_dependent_claims``.

    Returns
    -------
    df_train, df_test : pd.DataFrame
    """
    return make_dependent_claims(
        n_policies=n_policies,
        gamma=0.0,
        base_freq=base_freq,
        base_sev=base_sev,
        phi=phi,
        n_features=n_features,
        seed=seed,
        test_fraction=test_fraction,
    )
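# Example: the same group-mean check applied to the independent baseline
# should come out flat, which is the point of the null comparison.
import numpy as np

from insurance_frequency_severity.dependent import make_independent_claims

df_train, _ = make_independent_claims(n_policies=200_000)

pos = df_train[df_train["n_claims"] > 0]
m1 = pos.loc[pos["n_claims"] == 1, "avg_severity"].mean()
m2 = pos.loc[pos["n_claims"] == 2, "avg_severity"].mean()
print(f"log(m2 / m1) = {np.log(m2 / m1):.3f}")  # ≈ 0.0, since gamma = 0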