Explainable Neural Networks with Guarantees: A Sparse Estimation Approach

📅 2025-01-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Balancing predictive accuracy and interpretability remains a fundamental challenge in high-dimensional complex models. This paper proposes SparXnet, an interpretable neural network that learns sparse, univariate nonlinear transformation functions and combines them linearly for prediction—achieving black-box-level accuracy while enabling direct, feature-level interpretability. Its key contribution is the first unification of sparse feature selection with theoretically guaranteed interpretability: its generalization bound depends linearly on the number of selected features and only logarithmically on the total dimensionality, decoupling sample complexity from model parameter count and architectural depth under mild conditions. Theoretical analysis and empirical evaluation demonstrate that SparXnet accurately identifies salient features on both synthetic and real-world datasets, matches state-of-the-art black-box models in predictive performance, and provides strong generalization guarantees with low sample complexity.

📝 Abstract
Balancing predictive power and interpretability has long been a challenging research area, particularly in powerful yet complex models like neural networks, where nonlinearity obstructs direct interpretation. This paper introduces a novel approach to constructing an explainable neural network that harmonizes predictiveness and explainability. Our model, termed SparXnet, is designed as a linear combination of a sparse set of jointly learned features, each derived from a different trainable function applied to a single 1-dimensional input feature. Leveraging the ability to learn arbitrarily complex relationships, our neural network architecture enables automatic selection of a sparse set of important features, with the final prediction being a linear combination of rescaled versions of these features. We demonstrate the ability to select significant features while maintaining comparable predictive performance and direct interpretability through extensive experiments on synthetic and real-world datasets. We also provide a theoretical analysis of the generalization bounds of our framework, which are favorably linear in the number of selected features and only logarithmic in the number of input features. We further lift any dependence of sample complexity on the number of parameters or the architectural details under very mild conditions. Our research paves the way for further work on sparse and explainable neural networks with guarantees.
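The additive structure described in the abstract — univariate nonlinear transforms of individual input features, combined sparsely and linearly — can be sketched in a small toy example. This is an illustrative NumPy sketch, not the paper's actual implementation: here the per-feature transforms are fixed `tanh(a_j * x + b_j)` functions (SparXnet learns them jointly with the combination), and sparsity is induced by an L1 proximal step (ISTA-style soft-thresholding) on the linear weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression: only the first 2 of 20 features drive the target.
n, d = 500, 20
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * np.tanh(2 * X[:, 1])

# Per-feature univariate transforms f_j(x) = tanh(a_j * x + b_j).
# Fixed here for simplicity; in the paper these are trainable functions.
a = rng.uniform(0.5, 2.0, size=d)
b = 0.3 * rng.normal(size=d)
F = np.tanh(X * a + b)  # (n, d) matrix of transformed features

# Sparse linear combination via proximal gradient descent (ISTA):
# a plain gradient step on the squared loss, followed by
# soft-thresholding, which enforces an L1 penalty on the weights.
w = np.zeros(d)
lam, lr = 0.05, 0.1
for _ in range(300):
    grad = F.T @ (F @ w - y) / n
    w = w - lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

selected = np.flatnonzero(np.abs(w) > 1e-6)
print("selected features:", selected)
print("nonzero weights:", w[selected])
```

Because each transformed feature depends on a single input dimension and the final prediction is a linear combination of the selected transforms, the fitted model can be inspected feature by feature — the interpretability property the paper formalizes with its generalization bound.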
Problem

Research questions and friction points this paper is trying to address.

Complex Models
Neural Networks
Interpretable Predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

SparXnet
Feature Selection
Interpretable Neural Networks