🤖 AI Summary
This paper addresses residual nonlinear dependencies and redundancy in machine learning feature representations. To this end, it proposes a differentiable and scalable Adversarial Dependency Minimization (ADM) framework. Unlike conventional linear decorrelation methods, ADM employs a bilevel adversarial architecture comprising an encoder and a lightweight discriminator: the discriminator identifies higher-order statistical dependencies among feature dimensions, and the encoder is trained against it via gradient backpropagation to remove them. The authors provide empirical evidence of the framework's convergence. Empirically, ADM achieves significant improvements over strong baselines on three distinct tasks: extending PCA to nonlinear decorrelation, improving generalization in image classification, and preventing dimensional collapse in self-supervised representation learning. These results demonstrate ADM's effectiveness in modeling complex dependencies and its strong generalization across diverse downstream applications.
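The bilevel game described above can be illustrated with a minimal numpy sketch. Here a linear encoder and a one-dimensional linear least-squares probe stand in for the paper's neural encoder and lightweight discriminator; the alternating gradient updates, the variance penalty that prevents the encoder from "winning" by rescaling, the learning rates, and the choice to probe only dimension 0 are all illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three independent sources; the third observed feature depends
# *nonlinearly* (multiplicatively) on the first two.
n, d = 512, 3
S = rng.normal(size=(n, d))
X = np.column_stack([S[:, 0], S[:, 1], S[:, 0] * S[:, 1] + 0.1 * S[:, 2]])

W = np.eye(d)        # linear encoder stand-in (the paper uses a neural encoder)
v = np.zeros(d - 1)  # "discriminator": linearly predict z_0 from z_1, z_2

lr_disc, lr_enc, lam = 0.1, 0.01, 5.0

def residual(Z, v):
    """Discriminator's prediction error for dimension 0 given the rest."""
    return Z[:, 0] - Z[:, 1:] @ v

for _ in range(200):
    Z = X @ W
    # Inner step: the discriminator descends its MSE, i.e. it *finds*
    # dependencies it can exploit to predict z_0.
    r = residual(Z, v)
    v -= lr_disc * (-2.0 / n) * (Z[:, 1:].T @ r)
    # Outer step: the encoder ascends the discriminator's loss (makes z_0
    # hard to predict), while a variance penalty keeps every dimension near
    # unit scale so the game cannot be won by blowing up or collapsing z_0.
    r = residual(Z, v)
    dL = np.zeros_like(Z)
    dL[:, 0] = 2.0 / n * r
    dL[:, 1:] = -2.0 / n * np.outer(r, v)
    var = Z.var(axis=0)
    dpen = 4.0 / n * (var - 1.0) * (Z - Z.mean(axis=0))
    W += lr_enc * X.T @ (dL - lam * dpen)

Z = X @ W
print(Z.var(axis=0))  # per-dimension variances held near 1 by the penalty
```

In the full method the discriminator is a small network, so it can detect dependencies (like the multiplicative one above) that no linear probe sees, and the nonlinear encoder has the capacity to actually remove them; the alternating minimize/maximize structure is the same.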
📝 Abstract
Many machine learning techniques rely on minimizing the covariance between output feature dimensions to extract minimally redundant representations from data. However, these methods do not eliminate all dependencies, as linearly uncorrelated variables can still exhibit nonlinear relationships. This work provides a differentiable and scalable algorithm for dependence minimization that goes beyond linear pairwise decorrelation. Our method employs an adversarial game in which small networks identify dependencies among feature dimensions, while the encoder exploits this information to remove them. We provide empirical evidence of the algorithm's convergence and demonstrate its utility in three applications: extending PCA to nonlinear decorrelation, improving the generalization of image classification methods, and preventing dimensional collapse in self-supervised representation learning.
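The abstract's central motivation — that zero covariance does not imply independence — is easy to check numerically. A minimal sketch (the quadratic relationship here is just one convenient example of a nonlinear dependency):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2  # y is fully determined by x: maximally dependent

# Pearson correlation is ~0, so covariance-based decorrelation
# would see nothing left to remove between x and y.
r_linear = np.corrcoef(x, y)[0, 1]

# A higher-order probe (correlating a nonlinear transform of x
# with y) exposes the dependency immediately.
r_nonlinear = np.corrcoef(x ** 2, y)[0, 1]

print(r_linear, r_nonlinear)  # near 0 vs. near 1
```

This is precisely the gap the adversarial game targets: the small discriminator networks play the role of the nonlinear probe, searching for transforms under which the supposedly decorrelated dimensions are still predictable from one another.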