🤖 AI Summary
Insurance fraud detection faces severe class imbalance, network heterogeneity, and dynamic evolution, all of which impede effective graph representation learning. To address these challenges, this paper proposes G-GBM, an inductive learning model for heterogeneous dynamic graphs built on gradient boosting machines (GBMs). G-GBM is the first to extend the GBM framework to inductive learning on heterogeneous dynamic graphs. It integrates graph-structured sampling with node-level feature engineering to encode topological information, achieving high predictive accuracy while remaining interpretable through SHAP-based analysis. Evaluated on synthetic random graphs, an open-source dataset, and a real-world proprietary insurance dataset, G-GBM is competitive with popular graph neural networks, improving precision while maintaining high recall. These results demonstrate its effectiveness and practical utility for fraud detection in complex, evolving insurance networks.
📝 Abstract
Graph-based methods are becoming increasingly popular in machine learning due to their ability to model complex data and relations. Insurance fraud is a prime use case, since false claims are often the result of organised criminals who stage accidents or of the same persons filing erroneous claims on multiple policies. One challenge is that graph-based approaches struggle to find meaningful representations of the data because of the high class imbalance present in fraud data. Another is that insurance networks are heterogeneous and dynamic, given the changing relations among people, companies and policies. That is why gradient boosted tree approaches on tabular data still dominate the field. We therefore present a novel inductive graph gradient boosting machine (G-GBM) for supervised learning on heterogeneous and dynamic graphs. We show that our estimator competes with popular graph neural network approaches in an experiment using a variety of simulated random graphs. We demonstrate the power of G-GBM for insurance fraud detection using an open-source dataset and a real-world, proprietary dataset. Given that the backbone model is a gradient boosting forest, we apply established explainability methods to gain better insights into the predictions made by G-GBM.
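To make the general idea of pairing graph structure with a tabular, tree-based learner more concrete, the following is a minimal sketch of neighbourhood feature engineering on a toy heterogeneous graph. It is not the G-GBM algorithm itself: the node names, the `amount` attribute, and the helper functions `build_adjacency` and `claim_features` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy graph: claims are linked to persons and policies.
# All ids, node types, and amounts below are invented for illustration.
node_type = {
    "c1": "claim", "c2": "claim", "c3": "claim",
    "p1": "person", "p2": "person",
    "pol1": "policy", "pol2": "policy",
}
claim_amount = {"c1": 1200.0, "c2": 800.0, "c3": 5000.0}
edges = [
    ("c1", "p1"), ("c2", "p1"), ("c3", "p2"),
    ("c1", "pol1"), ("c2", "pol2"), ("c3", "pol2"),
]


def build_adjacency(edge_list):
    """Undirected adjacency sets built from an edge list."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def claim_features(claim, adj):
    """One tabular row per claim: own amount plus per-type neighbourhood statistics."""
    row = {"amount": claim_amount[claim]}
    for ntype in ("person", "policy"):
        neighbours = [n for n in adj[claim] if node_type[n] == ntype]
        # Two-hop neighbourhood: other claims reachable through this node type.
        sibling_claims = {
            c for n in neighbours for c in adj[n]
            if node_type[c] == "claim" and c != claim
        }
        row[f"n_{ntype}"] = len(neighbours)
        row[f"n_claims_via_{ntype}"] = len(sibling_claims)
        row[f"mean_amount_via_{ntype}"] = (
            mean(claim_amount[c] for c in sibling_claims) if sibling_claims else 0.0
        )
    return row


adj = build_adjacency(edges)
table = {c: claim_features(c, adj) for c in claim_amount}
for claim, row in table.items():
    print(claim, row)
```

Each row could then be passed to any gradient boosting library as ordinary tabular input. Because the features depend only on a claim's local neighbourhood, a newly arriving claim can be featurised without retraining, which is one simple way to obtain an inductive setup under these assumptions.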