Towards Interpretable Deep Neural Networks for Tabular Data

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited interpretability of deep neural networks (DNNs) on tabular data, which hinders their deployment in high-stakes domains such as finance and healthcare, this paper proposes XNNTab, a novel architecture. XNNTab integrates a sparse autoencoder (SAE) into the prediction pipeline to learn a dictionary of monosemantic latent features, to which human-interpretable semantics are assigned by an automated method. Predictions can then be expressed as linear combinations of these semantically meaningful components, making the prediction process transparent. Evaluated on standard tabular benchmarks, XNNTab matches or surpasses the predictive performance of black-box neural models (e.g., TabNet, MLP) and classical methods (e.g., XGBoost) while providing complete, human-understandable explanations. The work is a step toward high-performance, inherently interpretable DNNs for tabular data.

📝 Abstract
Tabular data is the foundation of many applications in fields such as finance and healthcare. Although DNNs tailored for tabular data achieve competitive predictive performance, they are black boxes with little interpretability. We introduce XNNTab, a neural architecture that uses a sparse autoencoder (SAE) to learn a dictionary of monosemantic features within the latent space used for prediction. Using an automated method, we assign human-interpretable semantics to these features. This allows us to represent predictions as linear combinations of semantically meaningful components. Empirical evaluations demonstrate that XNNTab attains performance on par with or exceeding that of state-of-the-art, black-box neural models and classical machine learning approaches while being fully interpretable.
Problem

Research questions and friction points this paper is trying to address.

Interpretable deep neural networks for tabular data
Overcoming black-box nature of DNN predictions
Assigning human-interpretable semantics to learned features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoder learns monosemantic feature dictionary
Automated method assigns human-interpretable semantics to features
Predictions become linear combinations of interpretable components
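The core idea above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: a ReLU-encoded sparse autoencoder maps a DNN latent vector into a higher-dimensional dictionary code, and a linear head makes the prediction decompose exactly into per-feature contributions. All parameter names and dimensions here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed): DNN latent size and SAE dictionary size.
d_latent, d_dict = 8, 32

# Hypothetical "learned" parameters; random here, for illustration only.
W_enc = rng.normal(size=(d_latent, d_dict))
b_enc = rng.normal(size=d_dict)
w_out = rng.normal(size=d_dict)  # linear prediction head over dictionary features

def sae_encode(h):
    # ReLU encoding yields a sparse, non-negative dictionary code z.
    return np.maximum(h @ W_enc + b_enc, 0.0)

def predict(h):
    z = sae_encode(h)
    # Each term z[i] * w_out[i] is the contribution of one (ideally
    # monosemantic) dictionary feature, so the prediction is fully
    # additive over human-annotatable components.
    contributions = z * w_out
    return contributions.sum(), contributions

h = rng.normal(size=d_latent)        # a latent vector from the backbone DNN
score, contribs = predict(h)
# The prediction decomposes exactly into per-feature contributions.
assert np.isclose(score, contribs.sum())
```

In practice the SAE would also have a decoder trained with a reconstruction plus sparsity loss so that each dictionary direction captures a single concept; only the encoding and linear head are shown here because they are what make the explanation additive.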