Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff

📅 2023-10-19
🏛️ arXiv.org
🤖 AI Summary
This benchmark study addresses frequency-severity modeling in non-life insurance pricing. On four insurance data sets, it compares a generalized linear model (GLM) on binned input data, a gradient-boosted tree model (GBM), a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN), which adds a neural network correction to a baseline prediction from a GLM or GBM. The preprocessing handles the mixed feature types found in tabular insurance data, including postal codes and numeric and categorical covariates, with autoencoders embedding the categorical variables. Models are evaluated on out-of-sample deviance as well as on statistical and calibration criteria and managerial tools. Finally, global surrogate models translate the insights captured by the FFNNs or CANNs into GLMs, yielding a technical tariff table that can be deployed in practice.
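The CANN idea summarized above, a baseline prediction adjusted by a neural network, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a log link (so the correction is added to the log of the baseline), and the names `cann_predict` and `poisson_deviance` are hypothetical. The neural network output is passed in as a plain array rather than computed by a trained model.

```python
import numpy as np

def cann_predict(baseline_pred, nn_adjustment):
    """Combine a baseline prediction (e.g. from a GLM or GBM) with a
    neural network correction on the log scale (assumed log link)."""
    baseline_pred = np.asarray(baseline_pred, dtype=float)
    nn_adjustment = np.asarray(nn_adjustment, dtype=float)
    # log(baseline) acts as an offset; the network output shifts it.
    return np.exp(np.log(baseline_pred) + nn_adjustment)

def poisson_deviance(y, mu):
    """Mean Poisson deviance, a common loss for claim frequency."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # y * log(y / mu) is taken to be 0 when y == 0.
    safe_y = np.where(y > 0, y, 1.0)
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.mean(log_term - (y - mu))

# Illustrative numbers only: a zero adjustment leaves the baseline unchanged.
baseline = np.array([0.10, 0.20, 0.05])
adjusted = cann_predict(baseline, np.array([0.0, 0.1, -0.2]))
```

With a zero adjustment the CANN reproduces the baseline exactly, which is why the baseline model is a natural starting point for training the correction.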
📝 Abstract
Insurers usually turn to generalized linear models for modeling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). The CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network, and we explore their potential advantages in a frequency-severity setting. Model performance is evaluated not only on out-of-sample deviance but also using statistical and calibration performance criteria and managerial tools to get more nuanced insights. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
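The surrogate step described at the end of the abstract can be illustrated with a small sketch. It is a simplification under stated assumptions: instead of fitting a GLM, it fits a log-linear model by least squares to the logarithm of the network's predictions on binned features, which yields a base rate and multiplicative relativities per bin. The function name `fit_log_linear_surrogate`, the binning, and all numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_log_linear_surrogate(bins, predictions, n_levels):
    """Fit log(prediction) ~ intercept + bin dummies by least squares.

    bins:        (n, p) integer array, bin index of each feature per row
    predictions: positive predictions from the black-box model
    n_levels:    number of bins per feature; bin 0 is the reference level
    Returns the base rate followed by the multiplicative relativities.
    """
    bins = np.asarray(bins, dtype=int)
    log_pred = np.log(np.asarray(predictions, dtype=float))
    n, p = bins.shape
    columns = [np.ones(n)]  # intercept column
    for j in range(p):
        for level in range(1, n_levels[j]):
            columns.append((bins[:, j] == level).astype(float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, log_pred, rcond=None)
    return np.exp(coef)

# Toy data: predictions that are exactly multiplicative in two binned features.
grid = np.array([[a, b] for a in range(2) for b in range(3)])
rel_a = np.array([1.0, 1.5])
rel_b = np.array([1.0, 0.8, 2.0])
preds = 0.1 * rel_a[grid[:, 0]] * rel_b[grid[:, 1]]
tariff = fit_log_linear_surrogate(grid, preds, n_levels=[2, 3])
```

Here `tariff` holds the base rate and one relativity per non-reference bin, which is the shape of a simple multiplicative tariff table. Real network predictions are not exactly multiplicative, so the surrogate would only approximate them.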
Problem

Research questions and friction points this paper is trying to address.

Deep Learning
Insurance Pricing
Neural Networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Learning
Combined Actuarial Neural Networks (CANN)
Autoencoders for Categorical Information
Freek Holvoet
Faculty of Economics and Business, KU Leuven, Belgium; LRisk, Leuven Research Center on Insurance and Financial Risk Analysis, KU Leuven, Belgium
Katrien Antonio
Faculty of Economics and Business, KU Leuven, Belgium; Faculty of Economics and Business, University of Amsterdam, The Netherlands; LRisk, Leuven Research Center on Insurance and Financial Risk Analysis, KU Leuven, Belgium
Roel Henckaerts
Faculty of Economics and Business, KU Leuven, Belgium; LRisk, Leuven Research Center on Insurance and Financial Risk Analysis, KU Leuven, Belgium