An Explainable Gaussian Process Auto-encoder for Tabular Data

📅 2025-08-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited interpretability of black-box models for tabular data by proposing a lightweight Gaussian process (GP)-based autoencoder framework for high-quality counterfactual generation. Methodologically, it embeds a GP into the autoencoder’s latent space to jointly model the underlying data manifold and predictive uncertainty; introduces a novel density-aware loss function; and incorporates an automatic regularization strength selection mechanism to ensure counterfactuals reside within the true data distribution while maintaining sufficient diversity. Experiments on multiple large-scale tabular benchmarks demonstrate substantial improvements over state-of-the-art baselines: +12.6% in distributional fidelity, +19.3% in diversity metrics, and ~40% reduction in parameter count—effectively mitigating overfitting. The core contributions lie in (i) GP-driven latent-space modeling that captures both geometry and uncertainty of tabular data manifolds, and (ii) an adaptive regularization strategy balancing faithfulness and diversity of counterfactuals.

Technology Category

Application Category

📝 Abstract
Explainable machine learning has attracted much interest in the community where the stakes are high. Counterfactual explanations methods have become an important tool in explaining a black-box model. The recent advances have leveraged the power of generative models such as an autoencoder. In this paper, we propose a novel method using a Gaussian process to construct the auto-encoder architecture for generating counterfactual samples. The resulting model requires fewer learnable parameters and thus is less prone to overfitting. We also introduce a novel density estimator that allows for searching for in-distribution samples. Furthermore, we introduce an algorithm for selecting the optimal regularization rate on density estimator while searching for counterfactuals. We experiment with our method in several large-scale tabular datasets and compare with other auto-encoder-based methods. The results show that our method is capable of generating diversified and in-distribution counterfactual samples.
Problem

Research questions and friction points this paper is trying to address.

Generating counterfactual explanations for black-box models
Reducing overfitting in auto-encoder architectures
Ensuring counterfactual samples remain in-distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian process auto-encoder architecture
Novel density estimator for in-distribution samples
Optimal regularization rate selection algorithm
🔎 Similar Papers
No similar papers found.