Interval Estimation of Coefficients in Penalized Regression Models of Insurance Data

📅 2024-10-01

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Standard post-selection inference for penalized regression (e.g., the lasso) on insurance loss data, which are zero-inflated, semicontinuous, and commonly modeled with a Tweedie response, suffers from substantial bias in large coefficients, confidence intervals whose actual coverage falls short of the nominal level, and over-optimistic inference after variable selection. Method: We propose a bias-corrected selective inference framework for the generalized linear model (GLM) family. The approach constructs a corrected estimator from a likelihood conditioned on the selection event and derives conditionally valid confidence intervals with theoretical guarantees. Contribution/Results: Compared with conventional lasso-based post-selection inference, the method improves coverage accuracy and statistical power for key risk-factor coefficients in high-dimensional sparse settings, which reduces spurious attribution. Empirical evaluation on real insurance datasets indicates robustness and interpretability, supporting its practical use in actuarial risk modeling.

📝 Abstract

The Tweedie exponential dispersion family is a popular choice for modeling insurance losses that consist of zero-inflated semicontinuous data. In such data, it is often important to obtain credibility (inference) for the most important features that describe the endogenous variables. Post-selection inference is the standard procedure in statistics for obtaining confidence intervals for model parameters after performing a feature selection procedure. For a linear model, the lasso estimate often has non-negligible estimation bias for large coefficients corresponding to exogenous variables. To have valid inference on those coefficients, it is necessary to correct the bias of the lasso estimate. Traditional statistical methods, such as hypothesis testing or standard confidence interval construction, might lead to incorrect conclusions during post-selection, as they are generally too optimistic. Here we discuss a few methodologies for constructing confidence intervals for the coefficients after feature selection in the Generalized Linear Model (GLM) family, with application to insurance data.
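
The lasso bias for large coefficients mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not taken from the paper: an orthonormal design and the objective (1/2)||y - Xb||^2 + lam * ||b||_1, for which the lasso solution is the soft-thresholded least-squares estimate. The values of `beta_true` and `lam` are hypothetical.

```python
import random
import statistics


def soft_threshold(z, lam):
    """Lasso solution for one coordinate under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


random.seed(0)
beta_true = 5.0  # a "large" coefficient (hypothetical value)
lam = 1.0        # lasso penalty level (hypothetical value)

# Under an orthonormal design the least-squares estimate is the true
# coefficient plus unit-variance Gaussian noise.
ols = [beta_true + random.gauss(0.0, 1.0) for _ in range(100_000)]
lasso = [soft_threshold(z, lam) for z in ols]

bias_ols = statistics.fmean(ols) - beta_true
bias_lasso = statistics.fmean(lasso) - beta_true
print(f"OLS bias:   {bias_ols:+.3f}")
print(f"lasso bias: {bias_lasso:+.3f}")
```

In this toy setting the least-squares estimate is essentially unbiased, while the lasso estimate of a large coefficient is pulled toward zero by roughly the penalty level, which is the bias that a corrected estimator has to undo.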

Problem

Research questions and friction points this paper is trying to address.

Bias correction for lasso estimates in insurance data

Valid inference on coefficients post-feature selection

Confidence interval construction in GLM for insurance
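
Why naive intervals are too optimistic after selection can be seen in a small simulation. This is a sketch under simplifying assumptions that are not from the paper: all `p` true effects are zero, each estimate is a standard normal draw, and "selection" just keeps the estimate with the largest absolute value (a stand-in for a data-driven selection step, not the lasso itself). The function name `naive_coverage` and its defaults are illustrative.

```python
import random


def naive_coverage(p=20, reps=5000, z_crit=1.96, seed=0):
    """Fraction of replications in which the naive 95% interval
    (estimate +/- z_crit) around the selected estimate covers the truth (0)."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        selected = max(z, key=abs)  # keep the most extreme estimate
        if abs(selected) <= z_crit:  # true value 0 lies inside the interval
            covered += 1
    return covered / reps


coverage = naive_coverage()
print(f"naive coverage after selection: {coverage:.3f} (nominal 0.95)")
```

Because the interval ignores that the coefficient was chosen for being extreme, its coverage in this setup is close to 0.95^20, far below the nominal 95%. Conditioning on the selection event is one way to restore validity.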

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bias correction for lasso estimates in GLM

Confidence intervals post-feature selection in GLM

Tweedie model for zero-inflated insurance data
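
To make the zero-inflated, semicontinuous shape of Tweedie data concrete, the following sketch simulates losses using the standard compound Poisson-gamma representation of a Tweedie variable with power parameter between 1 and 2. The parameter values (`mu`, `phi`, `power`) and helper names are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def tweedie_sample(rng, mu, phi, power):
    """One Tweedie draw (1 < power < 2) as a Poisson sum of gamma severities."""
    lam = mu ** (2 - power) / (phi * (2 - power))  # claim frequency
    shape = (2 - power) / (power - 1)              # gamma shape
    scale = phi * (power - 1) * mu ** (power - 1)  # gamma scale
    n_claims = poisson(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


rng = random.Random(0)
mu, phi, power = 2.0, 1.5, 1.5  # hypothetical mean, dispersion, power
losses = [tweedie_sample(rng, mu, phi, power) for _ in range(50_000)]

zero_frac = sum(1 for y in losses if y == 0.0) / len(losses)
mean_loss = sum(losses) / len(losses)
print(f"share of exact zeros: {zero_frac:.3f}")
print(f"sample mean:          {mean_loss:.3f} (target {mu})")
```

The simulated losses have a point mass at exactly zero (policies with no claims) and a continuous positive part, with mean `mu` and variance `phi * mu ** power`.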

🔎 Similar Papers

Conformal prediction for frequency-severity modeling