Interpretable Representation Learning for Additive Rule Ensembles

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional symbolic rule ensembles rely on axis-aligned threshold propositions, offering high interpretability but limited expressiveness: when features lack independence or discriminative power, maintaining accuracy necessitates numerous complex rules, which degrades interpretability. This paper introduces SparseRule, the first method to embed learnable sparse linear projections into rule conditions, generalizing univariate thresholds to sparse linear-combination propositions. This yields oblique polyhedral decision regions, overcoming the axis-aligned constraint. SparseRule employs iteratively reweighted logistic regression coupled with sequential greedy optimization to learn sparse-weighted rules one by one and combine them additively. Evaluated on ten benchmark datasets, SparseRule achieves test risk competitive with state-of-the-art models while reducing the number of rules by 30–65%, markedly improving the accuracy–interpretability trade-off.

📝 Abstract
Small additive ensembles of symbolic rules offer interpretable prediction models. Traditionally, these ensembles use rule conditions based on conjunctions of simple threshold propositions $x \geq t$ on a single input variable $x$ and threshold $t$, resulting geometrically in axis-parallel polytopes as decision regions. While this form ensures a high degree of interpretability for individual rules and can be learned efficiently using the gradient boosting approach, it relies on having access to a curated set of expressive and ideally independent input features so that a small ensemble of axis-parallel regions can describe the target variable well. Absent such features, reaching sufficient accuracy requires increasing the number and complexity of individual rules, which diminishes the interpretability of the model. Here, we extend classical rule ensembles by introducing logical propositions with learnable sparse linear transformations of input variables, i.e., propositions of the form $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$, where $\mathbf{w}$ is a learnable sparse weight vector, enabling decision regions as general polytopes with oblique faces. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression. Experimental results demonstrate that the proposed method efficiently constructs rule ensembles with the same test risk as state-of-the-art methods while significantly reducing model complexity across ten benchmark datasets.
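To make the geometry concrete, here is a minimal sketch of how an additive rule ensemble with oblique propositions $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$ could be evaluated. The `Rule` class, its attributes, and the example weights are illustrative assumptions, not the paper's implementation; note how a classical axis-parallel threshold is just the special case of a one-hot weight vector.

```python
import numpy as np

class Rule:
    """One rule: a conjunction of propositions x^T w >= t and an output weight."""

    def __init__(self, propositions, weight):
        # propositions: list of (w, t) pairs, each w a (sparse) weight vector
        self.propositions = propositions
        self.weight = weight

    def fires(self, X):
        # The rule fires where every half-space condition x^T w >= t holds,
        # i.e. where the point lies in the intersection of oblique half-spaces.
        mask = np.ones(X.shape[0], dtype=bool)
        for w, t in self.propositions:
            mask &= (X @ w >= t)
        return mask

def ensemble_score(rules, X):
    # Additive ensemble: sum the weights of all rules that fire on each point.
    score = np.zeros(X.shape[0])
    for r in rules:
        score += r.weight * r.fires(X)
    return score

# Axis-parallel proposition x_0 >= 1 as a one-hot weight vector,
# versus an oblique proposition mixing two features.
axis_rule = Rule([(np.array([1.0, 0.0]), 1.0)], weight=2.0)
oblique_rule = Rule([(np.array([0.6, -0.8]), 0.0)], weight=-1.0)

X = np.array([[2.0, 0.5], [0.0, 1.0]])
print(ensemble_score([axis_rule, oblique_rule], X))  # → [1. 0.]
```

On the first point both rules fire (2 - 1 = 1); on the second neither does, so the score is 0.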
Problem

Research questions and friction points this paper is trying to address.

Improves interpretability of additive rule ensembles
Enhances accuracy without increasing rule complexity
Introduces learnable sparse linear transformations for better decision regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable sparse linear transformations for rules
Sequential greedy optimization for training
Oblique decision regions enhancing interpretability
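One greedy step of the approach might look roughly like the following sketch: an L1-regularized logistic regression drives most coordinates of $\mathbf{w}$ to zero, yielding a sparse oblique direction for the next rule's proposition. The toy data, the sample-weighting formula, and the hyperparameters are placeholder assumptions for illustration, not the paper's exact objective or algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy binary target

# Current ensemble margin (zero before any rule is added). In a sequential
# greedy scheme, per-sample weights would be re-derived from this margin at
# each iteration (the "iteratively reweighted" part).
margin = np.zeros(len(y))
sample_weight = 1.0 / (1.0 + np.exp((2 * y - 1) * margin))

# L1 penalty zeroes out most coordinates of w -> a sparse oblique direction.
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
clf.fit(X, y, sample_weight=sample_weight)

w = clf.coef_.ravel()
t = -clf.intercept_[0]
print("nonzero coords:", np.nonzero(w)[0], "-> proposition x^T w >=", round(t, 3))
```

The resulting half-space $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$ would then be thresholded into a rule condition and the ensemble updated before the next greedy step.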
Shahrzad Behzadimanesh
Department of Data Science and Artificial Intelligence, Monash University
Pierre Le Bodic
Senior Lecturer in the Faculty of IT, Monash University
Mixed Integer Programming · Complexity · Algorithms and Approximation
Geoffrey I. Webb
Department of Data Science and Artificial Intelligence, Monash University
Mario Boley
University of Haifa, Monash University
Interpretable Machine Learning · Materials Informatics · Branch-and-Bound