Sparse minimum Redundancy Maximum Relevance for feature selection

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of identifying inactive features and controlling the false discovery rate (FDR) in high-dimensional data, this paper proposes a sparse minimum redundancy maximum relevance (mRMR) feature selection method. The method integrates the mRMR framework with rigorous FDR control: it models joint feature–feature and feature–target dependencies via a non-convex regularized formulation built on the Hilbert–Schmidt Independence Criterion (HSIC), and introduces a multi-stage knockoff filtering mechanism that conservatively regulates the FDR, requiring only an FDR threshold rather than a prespecified number of selected features. Experiments on synthetic and real-world datasets show that the method performs comparably to HSIC-LASSO while being more conservative in the number of features it selects, with stable, controllable FDR and theoretical guarantees on the recovery of inactive features. The implementation is publicly available.

📝 Abstract
We propose a feature screening method that integrates both feature-feature and feature-target relationships. Inactive features are identified via a penalized minimum Redundancy Maximum Relevance (mRMR) procedure, a continuous version of the classic mRMR criterion penalized by a non-convex regularizer, in which features whose coefficients are estimated as zero form the set of inactive features. We establish the conditions under which zero coefficients are correctly identified to guarantee accurate recovery of inactive features. We introduce a multi-stage procedure based on the knockoff filter enabling the penalized mRMR to discard inactive features while controlling the false discovery rate (FDR). Our method performs comparably to HSIC-LASSO but is more conservative in the number of selected features. It only requires setting an FDR threshold, rather than specifying the number of features to retain. The effectiveness of the method is illustrated through simulations and real-world datasets. The code to reproduce this work is available on the following GitHub: https://github.com/PeterJackNaylor/SmRMR.
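The continuous mRMR objective described in the abstract can be sketched as follows: HSIC scores measure feature-target relevance and feature-feature redundancy, and a sparsity penalty drives coefficients of inactive features to zero. This is a minimal illustration only; the biased HSIC estimator is standard, but the function names are invented here and an L1 penalty stands in for the paper's non-convex regularizer.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    # Gram matrix of an RBF kernel on a 1-D sample x of length n
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def hsic(K, L):
    # Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2,
    # with H the centering matrix I - 11^T / n
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def smrmr_objective(alpha, K_feats, L_target, lam):
    # Continuous mRMR: maximize relevance minus redundancy over
    # feature weights alpha, with a sparsity penalty (L1 here as a
    # stand-in for the paper's non-convex regularizer).
    relevance = sum(a * hsic(K, L_target) for a, K in zip(alpha, K_feats))
    redundancy = sum(
        alpha[i] * alpha[j] * hsic(K_feats[i], K_feats[j])
        for i in range(len(alpha))
        for j in range(len(alpha))
    )
    return -(relevance - redundancy) + lam * np.sum(np.abs(alpha))
```

Minimizing this objective over non-negative weights yields zero coefficients for features that are irrelevant to the target or redundant with already-weighted features.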
Problem

Research questions and friction points this paper is trying to address.

Proposes feature selection method integrating feature-feature and feature-target relationships
Identifies inactive features through penalized minimum Redundancy Maximum Relevance procedure
Controls false discovery rate while discarding inactive features using knockoff filter
Innovation

Methods, ideas, or system contributions that make the work stand out.

Penalized mRMR with non-convex regularization
Multi-stage knockoff filter for FDR control
Automatic feature selection via zero coefficients
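The knockoff-filter step for FDR control can be illustrated with the standard knockoff+ selection rule: given per-feature statistics W_j (large positive values indicating the real feature beats its knockoff copy), select the smallest threshold whose estimated false discovery proportion falls below the target level q. This is a generic sketch of the knockoff+ rule, not the paper's multi-stage variant.

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    # Knockoff+ threshold: smallest t among |W_j| such that
    # (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold achieves the target FDR: select nothing

def knockoff_select(W, q=0.1):
    # Indices of features whose statistic clears the threshold
    return np.where(W >= knockoff_threshold(W, q))[0]
```

Because only the FDR level q enters the rule, the number of selected features is determined by the data rather than fixed in advance, matching the paper's stated design.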
Peter Naylor
High-Dimensional Statistical Modeling Team, RIKEN AIP, Kyoto, 606-8501, Japan
Benjamin Poignard
High-Dimensional Statistical Modeling Team, RIKEN AIP, Kyoto, 606-8501, Japan; Keio University, Faculty of Science and Technology, Department of Mathematics, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 2238522, Japan
Héctor Climente-González
Novo Nordisk Research Centre Oxford
machine learning · GWAS · epistasis · feature selection · networks
Makoto Yamada
OIST & FlatMinima Inc.
Machine Learning · Optimal Transport · Representation Learning