Interpretable Representation Learning for Additive Rule Ensembles

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional symbolic rule ensembles rely on axis-aligned threshold propositions, offering high interpretability but limited expressiveness: when features lack independence or discriminative power, maintaining accuracy necessitates numerous complex rules, which degrades interpretability. This paper introduces SparseRule, the first method to embed learnable sparse linear projections into rule conditions, generalizing univariate thresholds to sparse linear-combination propositions. This yields oblique polyhedral decision regions, overcoming the axis-aligned constraint. SparseRule employs iteratively reweighted logistic regression coupled with sequential greedy optimization to learn sparse-weighted rules one by one and combine them additively. Evaluated on ten benchmark datasets, SparseRule achieves test risk competitive with state-of-the-art models while reducing the number of rules by 30–65%, markedly improving the accuracy–interpretability trade-off.

📝 Abstract
Small additive ensembles of symbolic rules offer interpretable prediction models. Traditionally, these ensembles use rule conditions based on conjunctions of simple threshold propositions $x \geq t$ on a single input variable $x$ and threshold $t$, resulting geometrically in axis-parallel polytopes as decision regions. While this form ensures a high degree of interpretability for individual rules and can be learned efficiently using the gradient boosting approach, it relies on having access to a curated set of expressive and ideally independent input features so that a small ensemble of axis-parallel regions can describe the target variable well. Absent such features, reaching sufficient accuracy requires increasing the number and complexity of individual rules, which diminishes the interpretability of the model. Here, we extend classical rule ensembles by introducing logical propositions with learnable sparse linear transformations of input variables, i.e., propositions of the form $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$, where $\mathbf{w}$ is a learnable sparse weight vector, enabling decision regions as general polytopes with oblique faces. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression. Experimental results demonstrate that the proposed method efficiently constructs rule ensembles with the same test risk as state-of-the-art methods while significantly reducing model complexity across ten benchmark datasets.
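To make the geometry concrete, here is a minimal sketch of how an additive rule ensemble with oblique propositions $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$ could be evaluated. The `Rule` class, its attributes, and the example weights are illustrative assumptions, not the paper's implementation; note how a classical axis-parallel threshold is just the special case of a one-hot weight vector.

```python
import numpy as np

class Rule:
    """One rule: a conjunction of propositions x^T w >= t and an output weight."""

    def __init__(self, propositions, weight):
        # propositions: list of (w, t) pairs, each w a (sparse) weight vector
        self.propositions = propositions
        self.weight = weight

    def fires(self, X):
        # The rule fires where every half-space condition x^T w >= t holds,
        # i.e. where the point lies in the intersection of oblique half-spaces.
        mask = np.ones(X.shape[0], dtype=bool)
        for w, t in self.propositions:
            mask &= (X @ w >= t)
        return mask

def ensemble_score(rules, X):
    # Additive ensemble: sum the weights of all rules that fire on each point.
    score = np.zeros(X.shape[0])
    for r in rules:
        score += r.weight * r.fires(X)
    return score

# Axis-parallel proposition x_0 >= 1 as a one-hot weight vector,
# versus an oblique proposition mixing two features.
axis_rule = Rule([(np.array([1.0, 0.0]), 1.0)], weight=2.0)
oblique_rule = Rule([(np.array([0.6, -0.8]), 0.0)], weight=-1.0)

X = np.array([[2.0, 0.5], [0.0, 1.0]])
print(ensemble_score([axis_rule, oblique_rule], X))  # → [1. 0.]
```

On the first point both rules fire (2 - 1 = 1); on the second neither does, so the score is 0.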
Problem

Research questions and friction points this paper is trying to address.

Improves interpretability of additive rule ensembles
Enhances accuracy without increasing rule complexity
Introduces learnable sparse linear transformations for better decision regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable sparse linear transformations for rules
Sequential greedy optimization for training
Oblique decision regions enhancing interpretability
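One greedy step of the approach might look roughly like the following sketch: an L1-regularized logistic regression drives most coordinates of $\mathbf{w}$ to zero, yielding a sparse oblique direction for the next rule's proposition. The toy data, the sample-weighting formula, and the hyperparameters are placeholder assumptions for illustration, not the paper's exact objective or algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy binary target

# Current ensemble margin (zero before any rule is added). In a sequential
# greedy scheme, per-sample weights would be re-derived from this margin at
# each iteration (the "iteratively reweighted" part).
margin = np.zeros(len(y))
sample_weight = 1.0 / (1.0 + np.exp((2 * y - 1) * margin))

# L1 penalty zeroes out most coordinates of w -> a sparse oblique direction.
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
clf.fit(X, y, sample_weight=sample_weight)

w = clf.coef_.ravel()
t = -clf.intercept_[0]
print("nonzero coords:", np.nonzero(w)[0], "-> proposition x^T w >=", round(t, 3))
```

The resulting half-space $\mathbf{x}^{\mathrm{T}}\mathbf{w} \geq t$ would then be thresholded into a rule condition and the ensemble updated before the next greedy step.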
Shahrzad Behzadimanesh
Department of Data Science and Artificial Intelligence, Monash University
Pierre Le Bodic
Senior Lecturer in the Faculty of IT, Monash University
Mixed Integer Programming · Complexity · Algorithms and Approximation
Geoffrey I. Webb
Department of Data Science and Artificial Intelligence, Monash University
Mario Boley
University of Haifa, Monash University
Interpretable Machine Learning · Materials Informatics · Branch-and-Bound