🤖 AI Summary
To address feature learning bias caused by spurious correlations in neural networks, this paper proposes difFOCI, the first differentiable, parametric conditional dependence measure, extending the nonparametric variable selection method FOCI into an end-to-end trainable objective. Its core innovation lies in constructing a differentiable approximation of FOCI based on rank statistics, enabling joint optimization of feature selection, model fitting, and regularization. Moreover, difFOCI supports fairness-aware classification by enforcing independence between predictions and sensitive attributes. Experiments on synthetic benchmarks, CNN saliency analysis, and fair classification tasks demonstrate that difFOCI significantly outperforms FOCI and other baselines, yielding consistent improvements in feature discriminability, robustness to distribution shifts, and fairness compliance.
📝 Abstract
In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method Feature Ordering by Conditional Independence (FOCI), introduced by Azadkia and Chatterjee (2021). While FOCI is based on a non-parametric coefficient of conditional dependence, we introduce a parametric, differentiable approximation of it. With this approximate coefficient of correlation, we present a new algorithm called difFOCI, which is applicable to a wider range of machine learning problems thanks to its differentiable nature and learnable parameters. We present difFOCI in three contexts: (1) as a variable selection method with baseline comparisons to FOCI, (2) as a trainable model parametrized with a neural network, and (3) as a generic, widely applicable neural network regularizer that improves feature learning through better management of spurious correlations. We evaluate difFOCI on increasingly complex problems, ranging from basic variable selection in toy examples to saliency map comparisons in convolutional networks. We then show how difFOCI can be incorporated in the context of fairness to facilitate classification without relying on sensitive data.
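To give a concrete feel for the core idea (a differentiable, rank-based dependence coefficient), here is a minimal sketch that softens Chatterjee's rank correlation with sigmoid-based soft ranks. The function names and the sigmoid relaxation are illustrative assumptions, not the paper's actual difFOCI construction, which additionally handles conditioning sets and learnable parameters.

```python
import numpy as np

def soft_rank(y, temperature=0.01):
    # Differentiable surrogate for ranks: rank_i ~ sum_j sigmoid((y_i - y_j)/T).
    # Smaller temperature -> closer to hard ranks, but steeper gradients.
    z = np.clip((y[:, None] - y[None, :]) / temperature, -50.0, 50.0)
    return (1.0 / (1.0 + np.exp(-z))).sum(axis=1)

def soft_xi(x, y, temperature=0.01):
    # Soft version of Chatterjee's coefficient: order y-ranks by x, then
    # xi = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1).
    # Note: argsort stays non-differentiable here; this is only a sketch.
    n = len(x)
    r = soft_rank(y, temperature)[np.argsort(x)]
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
print(soft_xi(x, x**2))               # monotone dependence: close to 1
print(soft_xi(x, rng.uniform(size=200)))  # independence: close to 0
```

Because `soft_xi` is (mostly) differentiable in `y`, a penalty of this shape can be minimized by gradient descent, which is what makes a FOCI-style criterion usable as a trainable objective or regularizer.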