🤖 AI Summary
This work addresses the conservativeness of existing FDR-controlling methods for variable selection in high-dimensional settings: their realized false discovery proportion (FDP) often falls far below the target false discovery rate (FDR), which sacrifices statistical power. To overcome this limitation, the authors propose a learning-augmented T-Rex Selector framework that introduces, for the first time, a neural-network-based FDP estimator. Trained solely on diverse synthetic data, this estimator replaces the analytical FDP estimator and approximates the FDP more tightly. The method maintains approximate FDR control while increasing variable discovery power. Extensive simulations and a synthetic genome-wide association study (GWAS) show that the proposed approach operates closer to the target FDR and detects more true signals than existing methods.
📝 Abstract
Controlling the false discovery rate (FDR) in high-dimensional variable selection requires balancing rigorous error control with statistical power. Existing methods with provable guarantees are often overly conservative, creating a persistent gap between the realized false discovery proportion (FDP) and the target FDR level. We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows this gap. Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP. This refinement allows the procedure to operate much closer to the desired FDR level, thereby increasing discovery power while maintaining effective approximate control. Through extensive simulations and a challenging synthetic genome-wide association study (GWAS), we demonstrate that our method achieves superior detection of true variables compared to existing approaches.
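To make the gap between realized FDP and target FDR concrete, the following minimal sketch computes the FDP and the true positive proportion (TPP) of a selected variable set against a known true support, as is possible in simulations. It uses the standard definitions FDP = (number of false discoveries) / max(number of selections, 1) and FDR = E[FDP]. The function name `fdp_and_tpp` and the toy index sets are hypothetical and only for illustration; they are not taken from the T-Rex Selector implementation.

```python
def fdp_and_tpp(selected, true_support):
    """Return (FDP, TPP) for a selected index set, given the true support.

    FDP = false discoveries / max(number of selections, 1)
    TPP = true discoveries  / max(size of true support, 1)
    """
    selected = set(selected)
    true_support = set(true_support)
    false_discoveries = len(selected - true_support)
    true_discoveries = len(selected & true_support)
    fdp = false_discoveries / max(len(selected), 1)
    tpp = true_discoveries / max(len(true_support), 1)
    return fdp, tpp


# Toy setting (hypothetical): variables 0..9 are the true signals.
true_support = range(10)

# A conservative selection: no false discoveries, but few true ones.
conservative = [0, 1, 2, 3]

# A selection that uses more of the error budget: 8 true, 2 false discoveries.
less_conservative = [0, 1, 2, 3, 4, 5, 6, 7, 20, 21]

fdp_c, tpp_c = fdp_and_tpp(conservative, true_support)
fdp_l, tpp_l = fdp_and_tpp(less_conservative, true_support)
print(f"conservative:      FDP={fdp_c:.2f}, TPP={tpp_c:.2f}")
print(f"less conservative: FDP={fdp_l:.2f}, TPP={tpp_l:.2f}")
```

With a target FDR of, say, 0.2, the first selection leaves the error budget unused, while the second spends it and recovers twice as many true variables. A tighter FDP estimate is what lets a procedure move from the first regime toward the second.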