1""" 2Classical Bühlmann-Straub credibility models for non-life insurance pricing. 3 4Implements the Bühlmann-Straub (1970) credibility model and its hierarchical 5extension (Jewell, 1975). The standard tool for blending group-level loss 6experience with the portfolio mean, weighted by exposure. 7 8Quick start:: 9 10 import polars as pl 11 from insurance_credibility.classical import BuhlmannStraub 12 13 bs = BuhlmannStraub() 14 bs.fit(df, group_col="scheme", period_col="year", 15 loss_col="loss_rate", weight_col="exposure") 16 bs.summary() 17 bs.z_ # credibility factors by scheme 18 bs.premiums_ # full results DataFrame 19 20For hierarchical multi-level structures:: 21 22 from insurance_credibility.classical import HierarchicalBuhlmannStraub 23 24 model = HierarchicalBuhlmannStraub(level_cols=["region", "district", "sector"]) 25 model.fit(df, period_col="year", loss_col="loss_rate", weight_col="exposure") 26 model.premiums_at("sector") 27""" 28 29from .buhlmann_straub import BuhlmannStraub 30from .hierarchical import HierarchicalBuhlmannStraub, LevelResult 31 32__all__ = [ 33 "BuhlmannStraub", 34 "HierarchicalBuhlmannStraub", 35 "LevelResult", 36]
class BuhlmannStraub:
    """
    Bühlmann-Straub credibility model with non-parametric structural parameter
    estimation.

    Fit the model to a panel dataset - one row per (group, period) - where
    each row has a loss rate and an exposure weight. The model estimates the
    structural parameters v (within-group variance) and a (between-group
    variance) from the data itself, then computes a credibility factor Z_i
    for each group.

    The result is a credibility premium for each group that blends its own
    weighted-average loss rate with the portfolio mean, with the blend governed
    entirely by the ratio K = v/a and the group's total exposure.

    Parameters
    ----------
    truncate_a : bool, default True
        If True (the default), truncate a_hat at zero when the between-group
        variance estimate is negative. This sets K → ∞ and Z_i → 0 for all
        groups, so every group gets the collective mean. If False, a negative
        a_hat raises an error instead.

    Examples
    --------
    >>> import polars as pl
    >>> from insurance_credibility.classical import BuhlmannStraub
    >>>
    >>> df = pl.DataFrame({
    ...     "scheme": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    ...     "year": [2021, 2022, 2023] * 3,
    ...     "loss_rate": [0.55, 0.60, 0.58, 0.80, 0.75, 0.82, 0.40, 0.42, 0.38],
    ...     "exposure": [1000, 1200, 1100, 300, 350, 320, 5000, 4800, 5200],
    ... })
    >>> bs = BuhlmannStraub()
    >>> bs.fit(df, group_col="scheme", period_col="year",
    ...        loss_col="loss_rate", weight_col="exposure")
    >>> bs.summary()
    """

    def __init__(self, truncate_a: bool = True) -> None:
        self.truncate_a = truncate_a

        # Fitted attributes - set by .fit()
        self._mu_hat: Optional[float] = None
        self._v_hat: Optional[float] = None
        self._a_hat: Optional[float] = None
        self._k: Optional[float] = None
        self._z: Optional[pl.DataFrame] = None
        self._premiums: Optional[pl.DataFrame] = None
        self._fitted = False

    # ------------------------------------------------------------------
    # Scikit-learn style fit
    # ------------------------------------------------------------------

    def fit(
        self,
        data: Union[pl.DataFrame, "pd.DataFrame"],  # type: ignore[name-defined]
        group_col: str = "group",
        period_col: str = "period",
        loss_col: str = "loss",
        weight_col: str = "weight",
    ) -> "BuhlmannStraub":
        """
        Estimate structural parameters and credibility factors from panel data.

        Parameters
        ----------
        data:
            A Polars DataFrame (preferred) or pandas DataFrame (converted
            internally) with one row per (group, period). Columns need not be
            in any particular order.
        group_col:
            Column identifying the group (e.g. scheme ID, territory, NCD class).
        period_col:
            Column identifying the time period (e.g. accident year). Used only
            to count periods per group; the ordering does not matter.
        loss_col:
            Column containing the loss rate or loss ratio. This should be losses
            per unit of exposure - not total losses. If you have total losses,
            divide by the exposure column before fitting.
        weight_col:
            Column containing the exposure weight (e.g. earned car years,
            earned premium, policy count). Must be strictly positive.

        Returns
        -------
        self
            Returns the fitted estimator so calls can be chained.
        """
        data = _to_polars(data)

        validate_panel_data(data, group_col, period_col, loss_col, weight_col)
        check_duplicate_periods(data, group_col, period_col)

        self._group_col = group_col
        self._period_col = period_col
        self._loss_col = loss_col
        self._weight_col = weight_col

        # Build group-level summary: w_i, X_bar_i, T_i
        groups = self._build_group_summary(data, group_col, period_col, loss_col, weight_col)
        self._groups = groups

        # Estimate structural parameters
        mu_hat, v_hat, a_hat_raw = self._estimate_structural_params(
            data, groups, group_col, period_col, loss_col, weight_col
        )

        # Handle negative a_hat
        if a_hat_raw <= 0:
            if self.truncate_a:
                warnings.warn(
                    f"Between-group variance estimate a_hat = {a_hat_raw:.6g} ≤ 0. "
                    "Truncating to zero. This means the model finds no detectable "
                    "heterogeneity between groups - all groups will receive the "
                    "collective mean as their credibility premium (Z_i = 0). "
                    "Consider whether your data genuinely lacks between-group "
                    "variation, or whether you have too few groups to estimate a reliably.",
                    stacklevel=2,
                )
                a_hat = 0.0
                k = np.inf
            else:
                raise ValueError(
                    f"Between-group variance estimate a_hat = {a_hat_raw:.6g} ≤ 0. "
                    "Set truncate_a=True to handle this automatically."
                )
        else:
            a_hat = a_hat_raw
            k = v_hat / a_hat

        # Credibility factors
        w = groups["w_i"].to_numpy()
        if np.isinf(k):
            z_values = np.zeros(len(groups))
        else:
            z_values = w / (w + k)

        group_ids = groups["group"].to_list()
        x_bar = groups["x_bar_i"].to_numpy()

        # z_ is a Polars DataFrame with columns ["group", "Z"]
        z_df = pl.DataFrame({
            "group": group_ids,
            "Z": z_values,
        })

        # premiums_ is a Polars DataFrame with a "group" column (no index)
        premiums = pl.DataFrame({
            "group": group_ids,
            "exposure": w,
            "observed_mean": x_bar,
            "Z": z_values,
            "credibility_premium": z_values * x_bar + (1 - z_values) * mu_hat,
            "complement": np.full(len(groups), mu_hat),
        })

        self._mu_hat = float(mu_hat)
        self._v_hat = float(v_hat)
        self._a_hat = float(a_hat)
        self._a_hat_raw = float(a_hat_raw)
        self._k = float(k) if not np.isinf(k) else np.inf
        self._z = z_df
        self._premiums = premiums
        self._fitted = True

        return self

    # ------------------------------------------------------------------
    # Structural parameter properties
    # ------------------------------------------------------------------

    @property
    def mu_hat_(self) -> float:
        """
        Collective mean - the grand weighted average loss rate across all groups.

        This is the complement: what every group's credibility premium regresses
        toward as exposure shrinks.
        """
        self._check_fitted()
        return self._mu_hat  # type: ignore[return-value]

    @property
    def v_hat_(self) -> float:
        """
        EPV - expected value of the process variance.

        Measures within-group year-to-year volatility, averaged across groups.
        High v_hat means individual group experience is noisy - lean on the
        collective mean.
        """
        self._check_fitted()
        return self._v_hat  # type: ignore[return-value]

    @property
    def a_hat_(self) -> float:
        """
        VHM - variance of the hypothetical means.

        Measures how much the true underlying loss rates differ between groups.
        High a_hat means groups are genuinely heterogeneous - trust their own
        experience. Truncated at zero if truncate_a=True.
        """
        self._check_fitted()
        return self._a_hat  # type: ignore[return-value]

    @property
    def k_(self) -> float:
        """
        Bühlmann's k - the credibility parameter.

        k = v / a. This is the exposure a group needs to achieve Z = 0.5 (equal
        weight on own experience and collective mean). A group with exposure w_i
        has Z_i = w_i / (w_i + k).

        Returns np.inf when a_hat is zero (all groups get the collective mean).
        """
        self._check_fitted()
        return self._k  # type: ignore[return-value]

    @property
    def z_(self) -> pl.DataFrame:
        """
        Credibility factors - a Polars DataFrame with columns ["group", "Z"].

        Z_i = w_i / (w_i + k), ranging from 0 (no credibility, use collective
        mean) to 1 (full credibility, use group's own experience).

        To look up a specific group::

            bs.z_.filter(pl.col("group") == "Motor Guild")["Z"][0]

        To find the group with the highest Z::

            bs.z_.sort("Z", descending=True)["group"][0]
        """
        self._check_fitted()
        return self._z  # type: ignore[return-value]

    @property
    def premiums_(self) -> pl.DataFrame:
        """
        Credibility premiums - one row per group, as a Polars DataFrame.

        Columns:

        - ``group``: group identifier
        - ``exposure``: total exposure weight for the group
        - ``observed_mean``: exposure-weighted average loss rate
        - ``Z``: credibility factor
        - ``credibility_premium``: Z * observed_mean + (1 - Z) * mu_hat
        - ``complement``: mu_hat (the collective mean complement)
        """
        self._check_fitted()
        return self._premiums  # type: ignore[return-value]

    # ------------------------------------------------------------------
    # Output
    # ------------------------------------------------------------------

    def summary(self) -> pl.DataFrame:
        """
        Return a formatted summary of the fit.

        The table shows structural parameters at the top and per-group credibility
        results below, formatted for an actuarial audience.

        Returns
        -------
        pl.DataFrame
            Per-group results. Structural parameters are printed to stdout.
        """
        self._check_fitted()

        print("Bühlmann-Straub Credibility Model")
        print("=" * 42)
        print(f"  Collective mean   mu = {self._mu_hat:.6g}")
        print(f"  Process variance  v  = {self._v_hat:.6g} (EPV, within-group)")
        print(f"  Between-group var a  = {self._a_hat:.6g} (VHM, between-group)")
        if np.isinf(self._k):
            print("  Credibility param k  = inf (a ≤ 0, all Z = 0)")
        else:
            print(f"  Credibility param k  = {self._k:.6g} (v / a)")
        print()
        print("  Interpretation: a group needs exposure = k to achieve Z = 0.50")
        if self._a_hat_raw <= 0:
            print(f"  (raw a_hat before truncation: {self._a_hat_raw:.6g})")
        print()

        tbl = self._premiums.rename({
            "exposure": "Exposure",
            "observed_mean": "Obs. Mean",
            "Z": "Z",
            "credibility_premium": "Cred. Premium",
            "complement": "Complement",
        })
        return tbl

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------

    @staticmethod
    def _build_group_summary(
        data: pl.DataFrame,
        group_col: str,
        period_col: str,
        loss_col: str,
        weight_col: str,
    ) -> pl.DataFrame:
        """
        Compute per-group summary statistics needed for parameter estimation.

        Returns a Polars DataFrame with columns:
            group   : group identifier (same type as group_col)
            w_i     : total exposure (sum of weights)
            x_bar_i : exposure-weighted mean loss rate
            T_i     : number of periods
        """
        groups = (
            data.group_by(group_col)
            .agg([
                pl.col(weight_col).sum().alias("w_i"),
                (
                    (pl.col(weight_col) * pl.col(loss_col)).sum()
                    / pl.col(weight_col).sum()
                ).alias("x_bar_i"),
                pl.len().alias("T_i"),
            ])
            .rename({group_col: "group"})
            .sort("group")
        )
        return groups

    @staticmethod
    def _estimate_structural_params(
        data: pl.DataFrame,
        groups: pl.DataFrame,
        group_col: str,
        period_col: str,
        loss_col: str,
        weight_col: str,
    ) -> tuple[float, float, float]:
        """
        Compute unbiased estimates of mu, v, and a.

        These are the standard Bühlmann-Straub (1970) non-parametric estimators,
        as presented in Bühlmann & Gisler (2005), Chapter 4.

        Parameters
        ----------
        data:
            Full panel data (Polars DataFrame).
        groups:
            Per-group summary from _build_group_summary.

        Returns
        -------
        mu_hat, v_hat, a_hat_raw
            a_hat_raw may be negative - the caller decides how to handle it.
        """
        w = groups["w_i"].to_numpy()
        x_bar = groups["x_bar_i"].to_numpy()
        T = groups["T_i"].to_numpy()
        group_ids = groups["group"].to_list()
        r = len(groups)

        # Collective mean: grand weighted average
        w_total = w.sum()
        mu_hat = (w * x_bar).sum() / w_total

        # v_hat: within-group weighted mean squared deviation
        # v_hat = (1 / sum_i(T_i - 1)) * sum_i sum_j [ w_{ij} * (X_{ij} - X_bar_i)^2 ]
        denom_v = (T - 1).sum()  # degrees of freedom for v

        # Join x_bar_i into the full data, then compute the weighted squared
        # deviation. We use a join rather than a Python loop for efficiency.
        x_bar_lf = pl.DataFrame({
            group_col: group_ids,
            "_x_bar_i": x_bar,
        }).lazy()

        numerator_v = (
            data.lazy()
            .join(x_bar_lf, on=group_col, how="left")
            .with_columns(
                (
                    pl.col(weight_col) * (pl.col(loss_col) - pl.col("_x_bar_i")) ** 2
                ).alias("_sq_dev")
            )
            .select(pl.col("_sq_dev").sum())
            .collect()["_sq_dev"][0]
        )

        v_hat = float(numerator_v) / denom_v if denom_v > 0 else 0.0

        # a_hat: between-group variance
        #   c  = w_total - sum_i(w_i^2) / w_total
        #   s2 = sum_i [ w_i * (x_bar_i - mu_hat)^2 ]
        #   a_hat = (s2 - (r - 1) * v_hat) / c
        c = w_total - (w ** 2).sum() / w_total
        s2 = (w * (x_bar - mu_hat) ** 2).sum()
        a_hat_raw = (s2 - (r - 1) * v_hat) / c

        return float(mu_hat), float(v_hat), float(a_hat_raw)

    def _check_fitted(self) -> None:
        if not self._fitted:
            raise RuntimeError(
                "Model has not been fitted. Call .fit() first."
            )

    def __repr__(self) -> str:
        if not self._fitted:
            return "BuhlmannStraub(not fitted)"
        return (
            f"BuhlmannStraub("
            f"mu={self._mu_hat:.4g}, "
            f"v={self._v_hat:.4g}, "
            f"a={self._a_hat:.4g}, "
            f"k={self._k:.4g})"
        )
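The hierarchical model below requires a strict hierarchy: each child node must sit under exactly one parent. That check can be sketched in a few lines of plain Python (the function and sample ids are illustrative, not the class's Polars implementation):

```python
# Sketch of a strict-hierarchy check: a child appearing under more than one
# parent makes the nesting ambiguous. Illustrative names, not the package API.

def check_strict(pairs: list[tuple[str, str]]) -> list[str]:
    """Return the child ids that appear under more than one parent."""
    parents: dict[str, set[str]] = {}
    for parent, child in pairs:
        parents.setdefault(child, set()).add(parent)
    return sorted(c for c, ps in parents.items() if len(ps) > 1)

ok = [("north", "d1"), ("north", "d2"), ("south", "d3")]
bad = ok + [("south", "d1")]  # d1 now sits under two regions
# check_strict(ok) -> []   check_strict(bad) -> ["d1"]
```

A model fit would refuse the `bad` layout, which is exactly what the class's `_validate_hierarchy` does with a `ValueError` listing example offenders.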
class HierarchicalBuhlmannStraub:
    """
    Multi-level Bühlmann-Straub credibility model for nested group structures.

    This model is appropriate when groups are arranged in a strict hierarchy -
    every observation belongs to exactly one parent at each level. Typical uses:

    - Geographic: postcode sector → district → area
    - Organisational: risk → scheme → portfolio
    - Time: accident quarter → accident year → underwriting year

    Parameters
    ----------
    level_cols:
        List of column names defining the hierarchy from outermost (top) to
        innermost (bottom). For a three-level model, provide three names.
        Example: ``["area", "district", "sector"]``.
    truncate_a:
        Whether to truncate negative between-group variance estimates at zero.
        Passed through to the BuhlmannStraub model at each level.

    Examples
    --------
    Geographic three-level model::

        >>> model = HierarchicalBuhlmannStraub(
        ...     level_cols=["region", "district", "sector"]
        ... )
        >>> model.fit(
        ...     data=df,
        ...     period_col="year",
        ...     loss_col="loss_rate",
        ...     weight_col="exposure",
        ... )
        >>> model.premiums_at("sector")          # blended sector-level premiums
        >>> model.level_results_["district"].k   # k at district level

    Two-level scheme model (equivalent to BuhlmannStraub with a book-level prior)::

        >>> model = HierarchicalBuhlmannStraub(level_cols=["book", "scheme"])
        >>> model.fit(df, period_col="year", loss_col="loss_rate", weight_col="exposure")
    """

    def __init__(
        self,
        level_cols: List[str],
        truncate_a: bool = True,
    ) -> None:
        if len(level_cols) < 2:
            raise ValueError(
                "At least two levels are required for a hierarchical model. "
                "For a single-level model, use BuhlmannStraub."
            )
        self.level_cols = level_cols
        self.truncate_a = truncate_a
        self._fitted = False
        self._level_results: Optional[Dict[str, LevelResult]] = None
        self._bottom_premiums: Optional[pl.DataFrame] = None

    def fit(
        self,
        data: Union[pl.DataFrame, "pd.DataFrame"],  # type: ignore[name-defined]
        period_col: str = "period",
        loss_col: str = "loss",
        weight_col: str = "weight",
    ) -> "HierarchicalBuhlmannStraub":
        """
        Fit the hierarchical model to panel data.

        Parameters
        ----------
        data:
            Panel DataFrame (Polars preferred, pandas accepted) with one row
            per (leaf node, period). Must contain all columns in ``level_cols``
            plus ``period_col``, ``loss_col``, and ``weight_col``.
        period_col:
            Column identifying the time period.
        loss_col:
            Column containing the per-unit loss rate.
        weight_col:
            Column containing exposure weights (must be strictly positive).

        Returns
        -------
        self
        """
        data = _to_polars(data)

        required_cols = set(self.level_cols) | {period_col, loss_col, weight_col}
        missing = required_cols - set(data.columns)
        if missing:
            raise ValueError(f"Columns not found in data: {missing}")

        self._period_col = period_col
        self._loss_col = loss_col
        self._weight_col = weight_col

        # Validate that the hierarchy is strict (no cross-level ambiguity)
        self._validate_hierarchy(data)

        # Bottom-up: estimate variance components at each level
        level_results: Dict[str, LevelResult] = {}
        bottom_level = self.level_cols[-1]

        # At the innermost level, fit BuhlmannStraub with the leaf node as group
        # and each period as an observation
        innermost_fit = self._fit_innermost_level(data, bottom_level, period_col, loss_col, weight_col)
        level_results[bottom_level] = innermost_fit

        # For each higher level, aggregate upward and fit B-S treating each node
        # at that level as a "group" with its children's credibility premiums as
        # "observations" (but we use the raw aggregated data for cleaner estimation)
        for depth in range(len(self.level_cols) - 2, -1, -1):
            parent_col = self.level_cols[depth]
            child_col = self.level_cols[depth + 1]
            level_result = self._fit_upper_level(
                data,
                parent_col,
                child_col,
                loss_col,
                weight_col,
                lower_v=level_results[child_col].v_hat,
            )
            level_results[parent_col] = level_result

        self._level_results = level_results

        # Top-down: compute final blended premiums at each level
        self._bottom_premiums = self._compute_top_down_premiums(data, level_results)
        self._fitted = True
        return self

    # ------------------------------------------------------------------
    # Output
    # ------------------------------------------------------------------

    @property
    def level_results_(self) -> Dict[str, LevelResult]:
        """Dictionary mapping level name to LevelResult with fitted parameters."""
        self._check_fitted()
        return self._level_results  # type: ignore[return-value]

    @property
    def premiums_(self) -> pl.DataFrame:
        """
        Credibility premiums at the bottom (leaf) level of the hierarchy.

        Equivalent to ``premiums_at(self.level_cols[-1])``.
        """
        self._check_fitted()
        return self._bottom_premiums  # type: ignore[return-value]

    def premiums_at(self, level: str) -> pl.DataFrame:
        """
        Return the credibility premiums at a specified level of the hierarchy.

        Parameters
        ----------
        level:
            One of the column names from ``level_cols``.

        Returns
        -------
        pl.DataFrame
            With a ``group`` column plus ``exposure``, ``observed_mean``,
            ``Z``, ``credibility_premium``, and ``complement`` columns.
        """
        self._check_fitted()
        if level not in self.level_cols:
            raise ValueError(
                f"'{level}' is not one of the fitted levels: {self.level_cols}"
            )
        return self._level_results[level].premiums

    def summary(self) -> None:
        """Print structural parameters at each level."""
        self._check_fitted()
        print("Hierarchical Bühlmann-Straub Credibility Model")
        print("=" * 50)
        print(f"Levels (outer → inner): {' → '.join(self.level_cols)}")
        print()
        for level in self.level_cols:
            lr = self._level_results[level]
            print(f"  Level: {level}")
            print(f"    mu = {lr.mu_hat:.6g}")
            print(f"    v  = {lr.v_hat:.6g} (EPV within-node)")
            print(f"    a  = {lr.a_hat:.6g} (VHM between-node)")
            k_str = "inf" if np.isinf(lr.k) else f"{lr.k:.6g}"
            print(f"    k  = {k_str}")
            print()

    # ------------------------------------------------------------------
    # Internal helpers
    # ------------------------------------------------------------------

    def _validate_hierarchy(self, data: pl.DataFrame) -> None:
        """Check that the hierarchy is strict - each child belongs to exactly one parent."""
        for depth in range(len(self.level_cols) - 1):
            parent = self.level_cols[depth]
            child = self.level_cols[depth + 1]
            parents_per_child = (
                data.select([parent, child])
                .unique()
                .group_by(child)
                .agg(pl.col(parent).n_unique().alias("n_parents"))
            )
            ambiguous = parents_per_child.filter(pl.col("n_parents") > 1)
            if ambiguous.height > 0:
                examples = ambiguous[child].to_list()[:3]
                raise ValueError(
                    f"Hierarchy is not strict at level '{child}' → '{parent}': "
                    f"{ambiguous.height} child node(s) appear under multiple parents. "
                    f"Examples: {examples}"
                )

    def _fit_innermost_level(
        self,
        data: pl.DataFrame,
        leaf_col: str,
        period_col: str,
        loss_col: str,
        weight_col: str,
    ) -> LevelResult:
        """Fit B-S at the innermost level using raw period data."""
        bs = BuhlmannStraub(truncate_a=self.truncate_a)
        bs.fit(
            data,
            group_col=leaf_col,
            period_col=period_col,
            loss_col=loss_col,
            weight_col=weight_col,
        )
        return LevelResult(
            level_name=leaf_col,
            mu_hat=bs.mu_hat_,
            v_hat=bs.v_hat_,
            a_hat=bs.a_hat_,
            k=bs.k_,
            z=bs.z_,
            premiums=bs.premiums_,
        )

    def _fit_upper_level(
        self,
        data: pl.DataFrame,
        parent_col: str,
        child_col: str,
        loss_col: str,
        weight_col: str,
        lower_v: float,
    ) -> LevelResult:
        """
        Fit B-S at a higher level by aggregating child-level data.

        We treat each child node as a "period" within the parent node,
        using the child's total exposure as weight and its weighted mean loss
        rate as the observation. The within-parent variance v at this level
        is the between-child variance a from the level below.

        This is the Jewell (1975) / actuar-compatible approach: v at level L
        equals a at level L+1, propagating variance components up the hierarchy.
365 """ 366 # Aggregate data to child level (summing over periods) 367 child_summary = ( 368 data.select([parent_col, child_col, weight_col, loss_col]) 369 .group_by([parent_col, child_col]) 370 .agg([ 371 pl.col(weight_col).sum().alias("exposure"), 372 ( 373 (pl.col(weight_col) * pl.col(loss_col)).sum() 374 / pl.col(weight_col).sum() 375 ).alias("loss_rate"), 376 ]) 377 ) 378 379 # Fit B-S at parent level, treating each child as one "observation" 380 # The period col here is child_col, loss col is loss_rate, weight is exposure 381 bs = BuhlmannStraub(truncate_a=self.truncate_a) 382 bs.fit( 383 child_summary, 384 group_col=parent_col, 385 period_col=child_col, 386 loss_col="loss_rate", 387 weight_col="exposure", 388 ) 389 390 return LevelResult( 391 level_name=parent_col, 392 mu_hat=bs.mu_hat_, 393 v_hat=bs.v_hat_, 394 a_hat=bs.a_hat_, 395 k=bs.k_, 396 z=bs.z_, 397 premiums=bs.premiums_, 398 ) 399 400 def _compute_top_down_premiums( 401 self, 402 data: pl.DataFrame, 403 level_results: Dict[str, LevelResult], 404 ) -> pl.DataFrame: 405 """ 406 Compute the final blended premiums by propagating the hierarchy top-down. 407 408 At the topmost level, the complement is the grand mean. At each level 409 below, the complement for a node is the credibility premium of its parent. 410 The final output is a premium for each leaf node. 
411 412 Blending formula at each level: 413 P_node = Z_node * X̄_node + (1 - Z_node) * P_parent 414 """ 415 top_level = self.level_cols[0] 416 top_lr = level_results[top_level] 417 418 # Start: parent premiums map from group id → credibility_premium at top level 419 # Represented as a pl.DataFrame with [group, parent_premium] for joining 420 parent_premiums_df = top_lr.premiums.select([ 421 pl.col("group"), 422 pl.col("credibility_premium").alias("parent_premium"), 423 ]) 424 425 # Work downward, level by level 426 for depth in range(1, len(self.level_cols)): 427 current_level = self.level_cols[depth] 428 parent_level = self.level_cols[depth - 1] 429 current_lr = level_results[current_level] 430 431 # Map: child node id → parent node id (unique pairs) 432 child_to_parent = ( 433 data.select([current_level, parent_level]) 434 .unique() 435 .rename({current_level: "group", parent_level: "parent_id"}) 436 ) 437 438 # Join child→parent map with parent premiums to get P_parent for each child 439 child_with_parent_premium = ( 440 child_to_parent 441 .join( 442 parent_premiums_df.rename({"group": "parent_id"}), 443 on="parent_id", 444 how="left", 445 ) 446 ) 447 448 # Join current level premiums (Z, observed_mean) with parent premium 449 blended = ( 450 current_lr.premiums.select(["group", "Z", "observed_mean"]) 451 .join(child_with_parent_premium.select(["group", "parent_premium"]), on="group", how="left") 452 .with_columns( 453 ( 454 pl.col("Z") * pl.col("observed_mean") 455 + (1 - pl.col("Z")) * pl.col("parent_premium") 456 ).alias("blended_premium") 457 ) 458 .select(["group", "blended_premium"]) 459 .rename({"blended_premium": "parent_premium"}) 460 ) 461 462 parent_premiums_df = blended 463 464 # Build final DataFrame: start from bottom-level premiums, replace credibility_premium 465 bottom_level = self.level_cols[-1] 466 bottom_lr = level_results[bottom_level] 467 result = ( 468 bottom_lr.premiums 469 .drop("credibility_premium") 470 .join( 471 
parent_premiums_df.rename({"parent_premium": "credibility_premium"}), 472 on="group", 473 how="left", 474 ) 475 ) 476 return result 477 478 def _check_fitted(self) -> None: 479 if not self._fitted: 480 raise RuntimeError( 481 "Model has not been fitted. Call .fit() first." 482 ) 483 484 def __repr__(self) -> str: 485 levels_str = " → ".join(self.level_cols) 486 if not self._fitted: 487 return f"HierarchicalBuhlmannStraub(levels=[{levels_str}], not fitted)" 488 return f"HierarchicalBuhlmannStraub(levels=[{levels_str}])"
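The top-down pass in ``_compute_top_down_premiums`` is just the one-line blending formula applied recursively, with each node's complement being its parent's premium. A minimal sketch on hand-made numbers (every value below is hypothetical, chosen only to illustrate the arithmetic; ``blend`` is an illustrative helper, not part of the package):

```python
# Sketch of the top-down blending recursion: P_node = Z*X̄_node + (1-Z)*P_parent.
# All Z values, means, and node names below are made up for illustration.

def blend(z: float, x_bar: float, p_parent: float) -> float:
    """One Bühlmann-Straub blending step: P = Z * X̄ + (1 - Z) * P_parent."""
    return z * x_bar + (1.0 - z) * p_parent

grand_mean = 0.050                         # collective mean at the top level
# Region level: high exposure → high credibility (Z = 0.8)
p_region = blend(0.8, 0.060, grand_mean)   # 0.8*0.060 + 0.2*0.050 = 0.058
# District level: the complement is now the parent region's premium
p_district = blend(0.5, 0.040, p_region)   # 0.5*0.040 + 0.5*0.058 = 0.049
# Sector level: a thin sector (Z = 0.1) leans heavily on its district
p_sector = blend(0.1, 0.120, p_district)

print(round(p_region, 6), round(p_district, 6), round(p_sector, 6))
```

Note how ``Z = 0`` collapses to the parent premium and ``Z = 1`` to the node's own experience, which is exactly why the recursion terminates sensibly at both extremes.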
class HierarchicalBuhlmannStraub:
    """
    Multi-level Bühlmann-Straub credibility model for nested group structures.

    This model is appropriate when groups are arranged in a strict
    hierarchy: every observation belongs to exactly one parent at each
    level. Typical uses:

    - Geographic: postcode sector → district → area
    - Organisational: risk → scheme → portfolio
    - Time: accident quarter → accident year → underwriting year

    Parameters
    ----------
    level_cols:
        List of column names defining the hierarchy from outermost (top) to
        innermost (bottom). For a three-level model, provide three names.
        Example: ["area", "district", "sector"].
    truncate_a:
        Whether to truncate negative between-group variance estimates at
        zero. Passed through to the BuhlmannStraub model at each level.

    Examples
    --------
    Geographic three-level model::

        >>> model = HierarchicalBuhlmannStraub(
        ...     level_cols=["region", "district", "sector"]
        ... )
        >>> model.fit(
        ...     data=df,
        ...     period_col="year",
        ...     loss_col="loss_rate",
        ...     weight_col="exposure",
        ... )
        >>> model.premiums_at("sector")            # blended sector-level premiums
        >>> model.level_results_["district"].k     # k at district level

    Two-level scheme model (equivalent to BuhlmannStraub with a book-level
    prior)::

        >>> model = HierarchicalBuhlmannStraub(level_cols=["book", "scheme"])
        >>> model.fit(df, period_col="year", loss_col="loss_rate", weight_col="exposure")
    """
    def __init__(
        self,
        level_cols: List[str],
        truncate_a: bool = True,
    ) -> None:
        if len(level_cols) < 2:
            raise ValueError(
                "At least two levels are required for a hierarchical model. "
                "For a single-level model, use BuhlmannStraub."
            )
        self.level_cols = level_cols
        self.truncate_a = truncate_a
        self._fitted = False
        self._level_results: Optional[Dict[str, LevelResult]] = None
        self._bottom_premiums: Optional[pl.DataFrame] = None
class LevelResult:
    """
    Fitted parameters and premiums for one level of the hierarchy.

    Attributes
    ----------
    level_name:
        The column name that identifies this level (e.g. "district").
    mu_hat:
        Collective mean at this level.
    v_hat:
        Within-node variance at this level (EPV).
    a_hat:
        Between-node variance at this level (VHM). Zero if truncated.
    k:
        Bühlmann's k = v / a at this level.
    z:
        Credibility factors as a pl.DataFrame with columns ["group", "Z"].
    premiums:
        Credibility premiums as a pl.DataFrame with a "group" column.
    """

    def __init__(
        self,
        level_name: str,
        mu_hat: float,
        v_hat: float,
        a_hat: float,
        k: float,
        z: pl.DataFrame,
        premiums: pl.DataFrame,
    ) -> None:
        self.level_name = level_name
        self.mu_hat = mu_hat
        self.v_hat = v_hat
        self.a_hat = a_hat
        self.k = k
        self.z = z
        self.premiums = premiums

    def __repr__(self) -> str:
        return (
            f"LevelResult(level='{self.level_name}', "
            f"mu={self.mu_hat:.4g}, v={self.v_hat:.4g}, "
            f"a={self.a_hat:.4g}, k={self.k:.4g})"
        )
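The child-level aggregation inside ``_fit_upper_level`` (total exposure plus exposure-weighted mean loss rate per (parent, child) pair) can be cross-checked against a dependency-free version. The rows and node names below are hypothetical:

```python
# Plain-Python counterpart of the Polars aggregation in _fit_upper_level:
# for each (parent, child) pair, sum the exposure and form the
# exposure-weighted mean loss rate. Row values are made up for illustration.
from collections import defaultdict

rows = [
    # (parent, child, exposure, loss_rate)
    ("north", "d1", 100.0, 0.05),
    ("north", "d1", 300.0, 0.09),
    ("north", "d2", 200.0, 0.02),
]

totals = defaultdict(lambda: [0.0, 0.0])   # key → [sum(w), sum(w * x)]
for parent, child, w, x in rows:
    totals[(parent, child)][0] += w
    totals[(parent, child)][1] += w * x

# key → (total exposure, exposure-weighted mean loss rate)
child_summary = {key: (sw, swx / sw) for key, (sw, swx) in totals.items()}
print(child_summary[("north", "d1")])   # total exposure and weighted mean
```

This is the same quantity the Polars expression `(w * x).sum() / w.sum()` computes per group, so a small fixture like this makes a convenient unit test for the aggregation step.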
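``_validate_hierarchy`` enforces the strictness assumption with Polars; the same invariant, restated in plain Python for clarity (``check_strict`` is a hypothetical helper, not part of the package):

```python
# Dependency-free restatement of the strictness check performed by
# _validate_hierarchy: every child node must appear under exactly one parent.

def check_strict(pairs):
    """pairs: iterable of (parent_id, child_id) tuples from the panel data.

    Returns the set of ambiguous child ids — empty when the hierarchy is
    strict, i.e. each child maps to exactly one parent.
    """
    parents_per_child = {}
    for parent, child in pairs:
        parents_per_child.setdefault(child, set()).add(parent)
    return {c for c, ps in parents_per_child.items() if len(ps) > 1}

ok = [("north", "d1"), ("north", "d2"), ("south", "d3")]
bad = ok + [("south", "d1")]      # d1 now sits under two regions
print(check_strict(ok))           # empty set → strict
print(check_strict(bad))          # {'d1'} → not strict, fit() would raise
```

Duplicated (parent, child) rows across periods are harmless here, mirroring the ``.unique()`` step in the Polars implementation.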