Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Insurance fraud detection suffers from severe class imbalance, network heterogeneity, and dynamic evolution, all of which impede effective graph representation learning. To address these challenges, this paper proposes G-GBM, an inductive learning model for heterogeneous dynamic graphs built upon gradient boosting machines (GBMs). G-GBM is presented as the first extension of the GBM framework to inductive learning on such graphs. It integrates graph-structured sampling with node-level feature engineering to encode topological information, which yields both high predictive accuracy and model interpretability, the latter supported by SHAP-based analysis. Evaluated on simulated random graphs, an open-source benchmark, and a real-world proprietary insurance dataset, G-GBM is reported to match or outperform state-of-the-art graph neural networks, improving precision while maintaining high recall. These results indicate its effectiveness and practical utility for fraud detection in complex, evolving insurance networks.
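The idea of encoding topology through node-level feature engineering can be illustrated with a small sketch. This is not the authors' G-GBM procedure, which the summary does not specify in detail; it is a generic one-hop neighbourhood aggregation under stated assumptions. The function name `neighbourhood_features`, the edge types `filed_by` and `repaired_at`, and the attribute `claim_amount` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def neighbourhood_features(node, edges, node_attrs, edge_types, attr="claim_amount"):
    """Build a flat feature dict for one node from its 1-hop neighbourhood.

    edges:      iterable of (source, target, edge_type) triples, treated as undirected
    node_attrs: dict mapping node id -> dict of numeric attributes
    edge_types: fixed list of edge types, so every node gets the same columns
    """
    # Keep one neighbour set per edge type so heterogeneous relations stay separate.
    neighbours = {etype: set() for etype in edge_types}
    for src, dst, etype in edges:
        if etype not in neighbours:
            continue
        if src == node:
            neighbours[etype].add(dst)
        elif dst == node:
            neighbours[etype].add(src)

    # The node's own attribute, defaulting to 0.0 when it is absent.
    features = {attr: node_attrs.get(node, {}).get(attr, 0.0)}
    for etype in edge_types:
        nbrs = neighbours[etype]
        values = [node_attrs[n][attr] for n in nbrs if attr in node_attrs.get(n, {})]
        features[f"deg_{etype}"] = len(nbrs)
        features[f"mean_{attr}_{etype}"] = mean(values) if values else 0.0
    return features


# Hypothetical toy graph: two claims filed by one person, one claim tied to a garage.
edge_types = ["filed_by", "repaired_at"]
edges = [
    ("claim_1", "person_a", "filed_by"),
    ("claim_2", "person_a", "filed_by"),
    ("claim_1", "garage_x", "repaired_at"),
]
node_attrs = {
    "claim_1": {"claim_amount": 1000.0},
    "claim_2": {"claim_amount": 3000.0},
    "person_a": {},
    "garage_x": {},
}

person_row = neighbourhood_features("person_a", edges, node_attrs, edge_types)
claim_row = neighbourhood_features("claim_1", edges, node_attrs, edge_types)
```

Because each row depends only on a node's local neighbourhood, a node that appears after training can be featurised the same way without refitting, which is the sense in which such a pipeline is inductive. The resulting fixed-width rows could then be passed to any gradient boosting library as ordinary tabular input.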

📝 Abstract

Graph-based methods are becoming increasingly popular in machine learning due to their ability to model complex data and relations. Insurance fraud is a prime use case, since false claims are often the work of organised criminals who stage accidents or of the same persons filing erroneous claims on multiple policies. One challenge is that graph-based approaches struggle to find meaningful representations of the data because of the high class imbalance present in fraud data. Another is that insurance networks are heterogeneous and dynamic, given the changing relations among people, companies and policies. That is why gradient-boosted tree approaches on tabular data still dominate the field. Therefore, we present a novel inductive graph gradient boosting machine (G-GBM) for supervised learning on heterogeneous and dynamic graphs. We show that our estimator competes with popular graph neural network approaches in an experiment using a variety of simulated random graphs. We demonstrate the power of G-GBM for insurance fraud detection using an open-source dataset and a real-world, proprietary dataset. Given that the backbone model is a gradient boosting forest, we apply established explainability methods to gain better insights into the predictions made by G-GBM.

Problem

Research questions and friction points this paper is trying to address.

Detecting insurance fraud in heterogeneous dynamic graph networks

Addressing class imbalance challenges in graph-based fraud detection

Improving interpretability of graph machine learning for fraud prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inductive graph gradient boosting machine (G-GBM) for supervised learning on graphs

Handles heterogeneous and dynamic graph data

Applies explainability methods to gradient boosting predictions
