🤖 AI Summary
Accurately identifying and ranking the top-k most important features remains a fundamental challenge in interpretable machine learning; existing methods typically convert importance scores to ranks as a post-processing step rather than optimizing for ranking accuracy directly. This paper introduces RAMPART (Ranked Attributions with MiniPatches And Recursive Trimming), a framework tailored to top-k feature ranking that can wrap any existing feature importance measure. It uses adaptive sequential halving, recursively trimming the candidate set so that computation concentrates on the most promising features. For stability and computational efficiency, importance estimates are ensembled over MiniPatches, which subsample both observations and features. Theoretical analysis shows that RAMPART recovers the correct top-k ranking with high probability under mild conditions. Simulation studies show that RAMPART consistently outperforms popular feature importance methods at top-k ranking, and a high-dimensional genomics case study illustrates its use in practice.
📝 Abstract
Accurate ranking of important features is a fundamental challenge in interpretable machine learning with critical applications in scientific discovery and decision-making. Unlike feature selection and feature importance, the specific problem of ranking important features has received considerably less attention. We introduce RAMPART (Ranked Attributions with MiniPatches And Recursive Trimming), a framework that utilizes any existing feature importance measure in a novel algorithm specifically tailored for ranking the top-$k$ features. Our approach combines an adaptive sequential halving strategy that progressively focuses computational resources on promising features with an efficient ensembling technique using both observation and feature subsampling. Unlike existing methods that convert importance scores to ranks as post-processing, our framework explicitly optimizes for ranking accuracy. We provide theoretical guarantees showing that RAMPART achieves the correct top-$k$ ranking with high probability under mild conditions, and demonstrate through extensive simulation studies that RAMPART consistently outperforms popular feature importance methods, concluding with a high-dimensional genomics case study.
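To make the two ingredients named in the abstract concrete, the following is a minimal sketch of sequential halving driven by a minipatch-style ensemble. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`minipatch_importance`, `halving_top_k`), the default patch sizes, the fixed per-round budget, and the placeholder base importance (absolute least-squares coefficients) are all hypothetical choices made for this example. Any other importance measure could be substituted where the coefficients are computed.

```python
import numpy as np


def minipatch_importance(X, y, active, n_patches, n_obs, n_feat, rng):
    """Average a base importance score over random row/column subsamples.

    Assumption: the base score is the absolute least-squares coefficient,
    used here only as a stand-in for an arbitrary importance measure.
    """
    totals = np.zeros(X.shape[1])
    counts = np.zeros(X.shape[1])
    n_obs = min(n_obs, X.shape[0])
    n_feat = min(n_feat, len(active))
    for _ in range(n_patches):
        # Subsample observations and, among the active features, columns.
        rows = rng.choice(X.shape[0], size=n_obs, replace=False)
        cols = rng.choice(active, size=n_feat, replace=False)
        coef, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
        totals[cols] += np.abs(coef)
        counts[cols] += 1
    # Features never sampled keep a score of zero.
    return totals / np.maximum(counts, 1)


def halving_top_k(X, y, k, n_patches=200, n_obs=50, n_feat=5, seed=0):
    """Return the indices of the estimated top-k features, best first."""
    rng = np.random.default_rng(seed)
    active = np.arange(X.shape[1])
    while True:
        scores = minipatch_importance(
            X, y, active, n_patches, n_obs, n_feat, rng
        )
        # Rank the active features by ensembled score, highest first.
        order = active[np.argsort(-scores[active])]
        if len(active) <= k:
            return order[:k]
        # Trim to the better half, but never below k candidates.
        keep = max(k, len(active) // 2)
        active = np.sort(order[:keep])
```

Because the same number of patches is spent on a shrinking candidate set, each surviving feature is sampled more often in later rounds, which is the sense in which halving focuses computation on promising features. The final ranking comes from a last round run on exactly k candidates.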