🤖 AI Summary
Evaluating synthetic tabular data quality remains challenging because existing evaluation metrics often conflict and offer no explanation for their scores. To address this, we propose an explainable AI (XAI)-based diagnostic framework: first, a binary classifier is trained to distinguish real from synthetic samples; then, permutation feature importance, partial dependence plots, SHAP values, and counterfactual explanations are systematically integrated to localize the root causes of distributional discrepancies, such as anomalous variable dependencies or missingness pattern biases, revealing structural flaws in the generation process. This is the first work to systematically apply XAI techniques to synthetic data quality assessment. Experiments on two benchmark datasets demonstrate that our framework uncovers critical generative defects, e.g., spurious correlations and biased missingness, that are missed by conventional metrics like Jensen–Shannon divergence and machine learning utility. It delivers actionable, attribution-aware diagnostics, thereby enhancing transparency and accelerating iterative refinement of synthetic data generators.
📝 Abstract
Evaluating synthetic tabular data is challenging because it can differ from the real data in many ways. Numerous metrics of synthetic data quality exist, ranging from statistical distances to predictive performance, and they often yield conflicting results. Moreover, they fail to explain or pinpoint the specific weaknesses in the synthetic data. To address this, we apply explainable AI (XAI) techniques to a binary detection classifier trained to distinguish real from synthetic data. While the classifier identifies distributional differences, XAI concepts such as feature importance and feature effects, analyzed through methods like permutation feature importance, partial dependence plots, Shapley values, and counterfactual explanations, reveal why the synthetic data are distinguishable, highlighting inconsistencies, unrealistic dependencies, or biased missingness patterns. This interpretability increases transparency in synthetic data evaluation and provides deeper insights beyond conventional metrics, helping diagnose and improve synthetic data quality. We apply our approach to two tabular datasets and generative models, showing that it uncovers issues overlooked by standard evaluation techniques.
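The core pipeline described in the abstract, training a detection classifier and then interrogating it with XAI methods, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it uses toy stand-in data (the real data carries a dependency between the first two features that the "synthetic" data lacks, mimicking an unrealistic-dependency flaw), a scikit-learn random forest as the detection classifier, and permutation feature importance as one of the XAI methods named in the text; the SHAP and counterfactual steps are omitted.

```python
# Sketch of the detection-classifier + XAI diagnostic loop (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Stand-in "real" data: feature 1 depends on feature 0.
real = rng.normal(size=(n, 3))
real[:, 1] = 0.8 * real[:, 0] + 0.2 * rng.normal(size=n)

# Stand-in "synthetic" data: same marginals, but the dependency is missing --
# the kind of structural flaw the framework is meant to surface.
synth = rng.normal(size=(n, 3))

X = np.vstack([real, synth])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = real, 1 = synthetic

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# If the classifier beats chance, the synthetic data is distinguishable;
# permutation importance then points at which features are responsible.
print("detection accuracy:", clf.score(X_te, y_te))
pfi = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
print("mean importance per feature:", pfi.importances_mean)
```

In this toy setup, features 0 and 1 should dominate the importance ranking while the independent feature 2 stays near zero, localizing the broken dependency; partial dependence plots or SHAP values could then characterize how the dependency was distorted.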