🤖 AI Summary
Most existing risk scores are built with linear regression, which struggles to capture nonlinear effects and can produce long rule sets that are harder to apply by hand. This work builds risk scores with gradient boosting instead, combining nonlinear modeling capacity with a compact, human-computable structure that applies to classification, regression, and time-to-event tasks. The method is implemented in C++ with Python and R bindings and evaluated on twelve tabular datasets. Compared to AutoScore, it uses on average 60% fewer rules on classification tasks and 16% fewer rules on time-to-event tasks, while achieving competitive predictive performance.
📝 Abstract
Risk scores are an interpretable and actionable class of machine learning models with applications in medicine, insurance, and risk management. Unlike most computational methods, risk scores are designed to be computed by hand: a human assigns points to a data sample based on a limited set of criteria. The most common approaches for generating risk scores use linear regression to estimate the effect of selected variables. We propose a simple and effective approach to building compact and predictive risk scores. We provide an algorithm based on gradient boosting that is capable of modeling nonlinear effects, along with a C++ implementation with Python and R bindings. Through extensive empirical evaluation on twelve tabular datasets spanning regression, classification, and time-to-event tasks, we show that our method achieves competitive predictive performance while producing substantially more compact scores than regression-based alternatives: on average, 60% fewer rules for classification tasks and 16% fewer rules for time-to-event tasks compared to AutoScore.
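To make the idea of a human-computable score concrete, the following is a minimal Python sketch of how a points-based risk score is evaluated: each rule that applies to a sample contributes its points, and the points are summed. The rules, thresholds, point values, and names (`Rule`, `RULES`, `risk_score`) are hypothetical illustrations. They are not taken from the paper, and they do not show how the proposed gradient-boosting algorithm learns its rules.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Rule:
    """One criterion of a risk score and the points it contributes."""
    description: str
    applies: Callable[[Mapping[str, float]], bool]
    points: int


# Hypothetical rules for illustration only (not from the paper).
RULES = [
    Rule("age >= 65", lambda s: s["age"] >= 65, 2),
    Rule("systolic_bp >= 140", lambda s: s["systolic_bp"] >= 140, 1),
    Rule("smoker", lambda s: s["smoker"] == 1, 1),
]


def risk_score(sample: Mapping[str, float], rules: Sequence[Rule] = RULES) -> int:
    """Sum the points of every rule that applies to the sample."""
    return sum(rule.points for rule in rules if rule.applies(sample))


# Hypothetical patient: meets the age and smoker criteria only.
patient = {"age": 70, "systolic_bp": 130, "smoker": 1}
print(risk_score(patient))  # 2 (age) + 0 (blood pressure) + 1 (smoker) = 3
```

In this sketch, a score with fewer rules means fewer criteria to check and fewer additions to perform by hand, which is the sense in which a more compact score is easier to use.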