A Structural Calibration Framework for Tweedie Gradient Boosting in Zero-Inflated, Heavy-Tailed Regression

Authors

  • Jia He Lee, School of Mathematics, Actuarial & Quantitative Studies, Asia Pacific University of Technology & Innovation, 57000, Kuala Lumpur, Malaysia
  • Srividhya Gunalan, School of Computing Science, KPR College of Arts Science and Research, 641407, Coimbatore, Tamil Nadu, India
  • Chee Nian Lee, School of Mathematics, Actuarial & Quantitative Studies, Asia Pacific University of Technology & Innovation, 57000, Kuala Lumpur, Malaysia

Keywords:

Gradient boosting, regression calibration, pure premium prediction, Tweedie distribution

Abstract

Accurate insurance pricing requires models that both rank risks effectively and produce well-calibrated loss estimates. Gradient boosting models trained with the Tweedie objective are widely used for pure premium modelling of insurance claim costs, particularly in health and other non-life portfolios. On highly skewed data with concentrated tail losses, however, these models often exhibit systematic miscalibration of aggregate and tail-level predictions despite strong discriminatory performance. This study investigates the structural sources of this behaviour, specifically the rigidity of the Tweedie mean-variance assumption under extreme skewness, and proposes a practical calibration framework for heavy-tailed insurance cost data. The empirical analysis is set in U.S. medical insurance pricing and uses person-level healthcare expenditure data from the Medical Expenditure Panel Survey (MEPS). The response variable is annual aggregated insurer payments, so the task corresponds to pure premium estimation. We introduce PRISM, an additive correction architecture that combines a variance-stable regression model with a classifier-based certainty signal and a regularised meta-learner. Unlike global rescaling or monotonic post-processing, PRISM applies a localised residual adjustment that preserves ranking performance while improving absolute calibration. To evaluate calibration quality, we formalise the Root Mean Squared Calibration Error (RMSCE) and Mean Absolute Calibration Error (MACE) as bin-wise regression calibration diagnostics that summarise monetary miscalibration across the prediction range. Results show that PRISM consistently reduces exposure-weighted calibration errors and bias relative to standard Tweedie boosting and common post-hoc corrections, while maintaining comparable risk discrimination. Bootstrap confidence intervals confirm these improvements, supporting the conclusion that the observed miscalibration under extreme skewness is driven primarily by structural modelling constraints.
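For readers less familiar with the objective, the standard Tweedie relations are reproduced below. These are textbook results, not equations taken from the article. With variance power 1 < p < 2 the model is compound Poisson-gamma and places positive probability on zero, so a single power p must describe both the zero mass and the tail. Under the log link used by common boosting libraries, the per-observation loss (negative log-likelihood up to the factor 1/φ and terms not depending on the raw score F) and its gradient are:

```latex
\begin{aligned}
\operatorname{Var}(Y \mid x) &= \phi\,\mu(x)^{p}, \qquad 1 < p < 2,\\
\Pr(Y = 0 \mid x) &= \exp\!\left(-\frac{\mu(x)^{2-p}}{\phi\,(2-p)}\right),\\
\ell(y, F) &= -\frac{y\,e^{(1-p)F}}{1-p} + \frac{e^{(2-p)F}}{2-p}, \qquad \mu = e^{F},\\
\frac{\partial \ell}{\partial F} &= \mu^{1-p}\,(\mu - y).
\end{aligned}
```

Because the gradient scales every residual by the same power of the mean, one fixed p jointly determines how strongly zero, moderate and extreme observations pull the fit. This is one way to read the "rigidity" of the mean-variance assumption referred to above.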
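The abstract describes RMSCE and MACE only as bin-wise, exposure-weighted diagnostics of monetary miscalibration. The sketch below is one plausible reading, not the authors' definition. It assumes equal-count bins over sorted predictions, compares exposure-weighted mean observed and predicted cost within each bin, and weights each bin's gap by its total exposure. The function name `binwise_calibration_errors`, the default of 10 bins and this weighting scheme are all hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def binwise_calibration_errors(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    exposure: Optional[Sequence[float]] = None,
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Return (RMSCE, MACE) under an ASSUMED bin-wise definition.

    Hypothetical definition: sort rows by prediction, cut them into n_bins
    equal-count bins, compare exposure-weighted mean observed and predicted
    cost per bin, then weight each bin's gap by its total exposure.
    """
    n = len(y_true)
    if n == 0 or len(y_pred) != n:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    w = list(exposure) if exposure is not None else [1.0] * n
    if len(w) != n:
        raise ValueError("exposure must match y_true in length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    order = sorted(range(n), key=lambda i: y_pred[i])  # ascending predictions
    n_bins = min(n_bins, n)  # guarantees every bin holds at least one row
    sq_sum = abs_sum = total_w = 0.0
    for k in range(n_bins):
        idx = order[k * n // n_bins:(k + 1) * n // n_bins]
        bin_w = sum(w[i] for i in idx)
        if bin_w <= 0:
            continue  # skip bins that carry no exposure
        observed = sum(w[i] * y_true[i] for i in idx) / bin_w
        predicted = sum(w[i] * y_pred[i] for i in idx) / bin_w
        gap = observed - predicted  # monetary calibration gap for this bin
        sq_sum += bin_w * gap * gap
        abs_sum += bin_w * abs(gap)
        total_w += bin_w
    if total_w <= 0:
        raise ValueError("total exposure must be positive")
    return math.sqrt(sq_sum / total_w), abs_sum / total_w


if __name__ == "__main__":
    # Aggregate bias is zero (both means are 2.5), yet each half is off by 1.5.
    print(binwise_calibration_errors([0, 0, 5, 5], [1, 2, 3, 4], n_bins=2))
    # → (1.5, 1.5)
```

The usage example shows why a bin-wise diagnostic is informative: total predicted and observed cost agree, so a global rescaling would change nothing, yet low-risk rows are over-predicted and high-risk rows under-predicted by 1.5 each.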

Author Biographies

Jia He Lee, School of Mathematics, Actuarial & Quantitative Studies, Asia Pacific University of Technology & Innovation, 57000, Kuala Lumpur, Malaysia

lee.jia.he.contact@gmail.com

Srividhya Gunalan, School of Computing Science, KPR College of Arts Science and Research, 641407, Coimbatore, Tamil Nadu, India

sathyasenthil01@gmail.com

Chee Nian Lee, School of Mathematics, Actuarial & Quantitative Studies, Asia Pacific University of Technology & Innovation, 57000, Kuala Lumpur, Malaysia

lee.cheenian@apu.edu.my

Published

2026-02-04

Section

Articles