🤖 AI Summary
To address the interpretability gap of neural networks in tabular data modeling, where their black-box nature hinders trust and regulatory compliance, this paper proposes a novel interpretable neural architecture that embeds Sparse Autoencoders (SAEs) into the backbone network. This integration enables a semantically unambiguous decomposition of nonlinear features and their explicit alignment with human-understandable concepts. The resulting model supports end-to-end interpretable prediction, with reasoning paths amenable to manual verification. Experiments demonstrate that the method matches the predictive accuracy of state-of-the-art black-box models (e.g., MLPs, XGBoost) while substantially outperforming conventional interpretable baselines (e.g., decision trees, linear models). The core contribution is the first deep structural coupling of SAEs as intrinsic interpretability modules within neural networks, thereby reconciling high representational capacity with concept-level transparency.
📝 Abstract
In data-driven applications that rely on tabular data and where interpretability is key, machine learning models such as decision trees and linear regression are commonly applied. Although neural networks can provide higher predictive performance, they are often avoided because of their black-box nature. In this work, we present XNNTab, a neural architecture that combines the expressiveness of neural networks with interpretability. XNNTab first learns highly non-linear feature representations, which are decomposed into monosemantic features using a sparse autoencoder (SAE). These features are then assigned human-interpretable concepts, making the overall model prediction intrinsically interpretable. XNNTab outperforms interpretable predictive models and achieves performance comparable to its non-interpretable counterparts.
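The SAE step described in the abstract, decomposing a backbone's dense activations into sparse, potentially monosemantic codes, can be sketched as follows. This is a minimal illustration with NumPy: the dimensions, parameter names, and loss weighting are assumptions for the example, not the paper's actual configuration or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: backbone embedding dimension d, overcomplete dictionary size m > d.
d, m = 16, 64

# SAE parameters, randomly initialized here purely for illustration.
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(m, d))
b_dec = np.zeros(d)

def sae_forward(h):
    """Encode backbone activations h into sparse codes z, then reconstruct them."""
    z = np.maximum(h @ W_enc + b_enc, 0.0)  # ReLU keeps codes non-negative and sparse
    h_hat = z @ W_dec + b_dec               # linear decoder reconstructs the activation
    return z, h_hat

def sae_loss(h, z, h_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 sparsity penalty on the codes."""
    recon = np.mean((h - h_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(z))
    return recon + sparsity

h = rng.normal(size=(8, d))   # a batch of 8 backbone activations
z, h_hat = sae_forward(h)
loss = sae_loss(h, z, h_hat)
```

After training with such a loss, each code dimension in `z` is a candidate monosemantic feature; assigning human-interpretable concepts to those dimensions is a separate labeling step described in the paper.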