AI Summary
Existing methods for identifying influential data points in nonlinear models lack theoretical foundations. Method: This paper introduces the first theoretically grounded importance sampling framework for nonlinear models by generalizing norm- and leverage-score-based importance measures from linear models, achieved through the novel incorporation of the adjoint operator of the nonlinear mapping. Contribution/Results: The framework provides error-controlled approximation guarantees and enables efficient subspace embedding analysis. Extensive experiments across diverse supervised learning tasks demonstrate substantial improvements in sampling efficiency and training acceleration, enhanced model interpretability, and effective outlier detection. Both theoretical analysis and empirical evaluation confirm that the proposed method significantly reduces training overhead for large-scale models, thereby addressing a critical gap in the field of nonlinear importance sampling.
Abstract
While norm-based and leverage-score-based methods have been extensively studied for identifying "important" data points in linear models, analogous tools for nonlinear models remain significantly underdeveloped. By introducing the concept of the adjoint operator of a nonlinear map, we address this gap and generalize norm-based and leverage-score-based importance sampling to nonlinear settings. We demonstrate that sampling based on these generalized notions of norm and leverage scores provides approximation guarantees for the underlying nonlinear mapping, similar to linear subspace embeddings. As direct applications, these nonlinear scores not only reduce the computational complexity of training nonlinear models by enabling efficient sampling over large datasets, but also offer a novel mechanism for model explainability and outlier detection. Our contributions are supported by both theoretical analyses and experimental results across a variety of supervised learning scenarios.
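To ground the linear baseline that the paper generalizes, the following is a minimal sketch of classical leverage-score sampling for least squares: compute row leverage scores from an orthonormal basis of the design matrix, sample rows proportionally, and rescale so the sketched problem is an unbiased surrogate. All function names and parameters here are illustrative assumptions, not the authors' implementation, and this covers only the linear case, not the paper's adjoint-based nonlinear extension.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A: squared row norms of an orthonormal
    basis U of A's column space; they sum to rank(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U**2, axis=1)

def sample_rows(A, b, k, rng):
    """Sample k rows with probability proportional to leverage scores,
    rescaling each sampled row by 1/sqrt(k * p_i) so the subsampled
    least-squares objective is an unbiased estimate of the full one."""
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=k, replace=True, p=p)
    scale = 1.0 / np.sqrt(k * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale

# Usage: solve a sketched least-squares problem on a small subsample.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(1000)
A_s, b_s = sample_rows(A, b, k=200, rng=rng)
x_hat, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
```

In the linear setting this sampling yields a subspace embedding with high probability; the paper's contribution is an analogous guarantee when the linear map `A` is replaced by a nonlinear mapping, using its adjoint operator to define the generalized scores.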