🤖 AI Summary
This work proposes a feature selection method for high-dimensional biomedical data that aims to balance scalability, stability, interpretability, and nonlinear modeling capability, goals that are difficult to achieve simultaneously. The approach integrates statistical priors from filter-based methods with a multi-head attention mechanism. The priors initialize and guide the deep network's learning, and multiple attention heads model complex nonlinear relationships among features in parallel. A subsequent reranking module fuses the outputs of the individual heads into a stable, consistent feature ranking. On simulated and real-world datasets, including cancer gene expression and Alzheimer's disease data, the method improves feature coverage and selection stability relative to existing filter-based and deep learning-based methods, combining the interpretability of statistical filters with the expressiveness of deep learning.
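The summary does not say how the module that fuses the heads' outputs resolves disagreements between them. As a hedged illustration only, the sketch below assumes a Borda-style rule: each head ranks the features by its own importance scores, and the consensus ordering sorts features by their mean rank across heads. The function name `fuse_head_rankings` and its tie-breaking rule are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_head_rankings(head_scores: np.ndarray) -> np.ndarray:
    """Hypothetical rank fusion across attention heads (Borda-style mean rank).

    head_scores: shape (n_heads, n_features); larger means more important.
    Returns feature indices ordered from most to least important.
    """
    n_heads, n_features = head_scores.shape
    # Within each head, sort features by descending score.
    order = np.argsort(-head_scores, axis=1, kind="stable")
    # Invert the permutation: ranks[h, f] = position of feature f in head h (0 = best).
    ranks = np.empty_like(order)
    rows = np.arange(n_heads)[:, None]
    ranks[rows, order] = np.arange(n_features)
    # Consensus: average rank across heads (lower is better).
    mean_rank = ranks.mean(axis=0)
    # Break ties by mean raw score (higher first) so the output is deterministic.
    mean_score = head_scores.mean(axis=0)
    # np.lexsort uses the LAST key as the primary sort key.
    return np.lexsort((-mean_score, mean_rank))


# Three heads that disagree on the top feature.
scores = np.array([
    [0.9, 0.1, 0.5, 0.3],
    [0.2, 0.8, 0.6, 0.1],
    [0.7, 0.3, 0.6, 0.2],
])
consensus = fuse_head_rankings(scores)
```

In this toy input, feature 2 is never any head's top choice, yet it ranks second in the consensus because every head places it second. Averaging ranks rather than raw scores also keeps a single head with large-magnitude scores from dominating the fused ranking.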
📝 Abstract
Feature selection is essential for high-dimensional biomedical data: it improves predictive performance, reduces computational cost, and enhances interpretability in precision medicine applications. Existing approaches face notable challenges. Filter methods are highly scalable but cannot capture complex relationships among features or eliminate redundancy. Deep learning-based approaches can model nonlinear patterns but often lack stability, interpretability, and efficiency at scale. Single-head attention improves interpretability but is limited in capturing multi-level dependencies and remains sensitive to initialization, which reduces reproducibility. Few existing methods combine statistical interpretability with the representational power of deep learning, particularly in ultra-high-dimensional settings. Here, we introduce MAFS (Multi-head Attention-based Feature Selection), a hybrid framework that integrates statistical priors with deep learning. MAFS first computes filter-based priors that stabilize initialization and guide learning. It then uses multi-head attention to examine features from multiple perspectives in parallel, capturing complex nonlinear relationships and interactions. Finally, a reordering module consolidates the outputs of the attention heads, resolving conflicts between them and minimizing information loss to generate robust and consistent feature rankings. This design combines statistical guidance with deep modeling capacity, yielding interpretable importance scores while retaining as many informative signals as possible. Across simulated and real-world datasets, including cancer gene expression and Alzheimer's disease data, MAFS consistently achieves higher coverage and stability than existing filter-based and deep learning-based alternatives, offering a scalable, interpretable, and robust solution for feature selection in high-dimensional biomedical data.
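The prior-guided initialization of the attention heads can be sketched as follows. This rests on several assumptions that the abstract does not state. The filter score is taken to be the absolute Pearson correlation between each feature and the label. Attention is simplified to a per-head softmax over features rather than a full query/key/value layer. Each head's logits start at the log of the prior plus small head-specific noise. All function names (`filter_prior`, `init_head_logits`, `head_attention`, `attend`) and the simulated data are hypothetical.

```python
import numpy as np


def filter_prior(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Assumed filter score: |Pearson correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom


def init_head_logits(prior: np.ndarray, n_heads: int,
                     noise: float = 0.05, seed: int = 0) -> np.ndarray:
    """Hypothetical init: every head starts at log(prior) plus small noise."""
    rng = np.random.default_rng(seed)
    base = np.log(prior + 1e-6)  # epsilon avoids log(0) for uninformative features
    return base[None, :] + noise * rng.standard_normal((n_heads, prior.size))


def head_attention(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over features, computed separately per head."""
    z = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def attend(X: np.ndarray, attn: np.ndarray) -> np.ndarray:
    """Reweight inputs per head -> shape (n_heads, n_samples, n_features)."""
    return attn[:, None, :] * X[None, :, :]


# Simulated data: only features 3 and 7 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200)

prior = filter_prior(X, y)
attn = head_attention(init_head_logits(prior, n_heads=4))  # shape (4, 50)
views = attend(X, attn)  # one reweighted view of X per head
top2 = np.argsort(prior)[::-1][:2]  # features with the strongest prior
```

Because softmax(log p) = p / sum(p), each head's initial attention is approximately the normalized filter scores, and the small noise lets the heads diverge once trained. Training is not shown. It would adjust each head's logits on a downstream prediction loss, so that the heads can move away from the purely univariate prior where nonlinear interactions matter.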