Sampling-guided exploration of active feature selection policies

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between predictive performance and feature acquisition cost by proposing a dynamic, sample-level feature selection method. It formulates active feature selection as a Markov decision process with variable state dimensions and employs reinforcement learning to sequentially recommend the most cost-effective next feature. To improve scalability and policy simplicity, the approach integrates a heuristic exploration strategy that focuses on the most promising feature combinations in large-scale data, together with a post-fit regularization mechanism that compresses decision paths. Experiments on four binary classification datasets, the largest with 56 features and 4,500 samples, show that the method achieves higher accuracy than existing approaches while producing more efficient and compact decision policies.

📝 Abstract
Determining the most appropriate features for machine learning predictive models is challenging regarding performance and feature acquisition costs. In particular, global feature choice is limited given that some features will only benefit a subset of instances. In previous work, we proposed a reinforcement learning approach to sequentially recommend which modality to acquire next to reach the best information/cost ratio, based on the instance-specific information already acquired. We formulated the problem as a Markov Decision Process where the state's dimensionality changes during the episode, avoiding data imputation, contrary to existing works. However, this only allowed processing a small number of features, as all possible combinations of features were considered. Here, we address these limitations with two contributions: 1) we expand our framework to larger datasets with a heuristic-based strategy that focuses on the most promising feature combinations, and 2) we introduce a post-fit regularisation strategy that reduces the number of different feature combinations, leading to compact sequences of decisions. We tested our method on four binary classification datasets (one involving high-dimensional variables), the largest of which had 56 features and 4500 samples. We obtained better performance than state-of-the-art methods, both in terms of accuracy and policy complexity.
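The sequential, cost-aware acquisition idea described in the abstract can be sketched as a toy greedy loop: at each step, acquire the affordable feature with the best estimated information/cost ratio, then stop when nothing is worth its price. This is an illustrative simplification, not the authors' reinforcement learning policy; the feature names, costs, and information-gain values below are invented for the example.

```python
# Toy sketch of cost-aware sequential feature acquisition.
# Costs and per-feature "gains" are hypothetical placeholders; a real
# instance-specific policy would estimate value from the features already seen.
FEATURE_COSTS = {"age": 1.0, "blood_test": 5.0, "mri": 20.0}
FEATURE_GAINS = {"age": 0.10, "blood_test": 0.35, "mri": 0.40}

def acquire_features(budget: float) -> list[str]:
    """Greedily acquire features by information/cost ratio within a budget."""
    acquired: list[str] = []
    remaining = budget
    while True:
        # Features not yet acquired that we can still afford.
        candidates = [f for f in FEATURE_COSTS
                      if f not in acquired and FEATURE_COSTS[f] <= remaining]
        if not candidates:
            break
        # Pick the feature with the best estimated information/cost ratio.
        best = max(candidates, key=lambda f: FEATURE_GAINS[f] / FEATURE_COSTS[f])
        acquired.append(best)
        remaining -= FEATURE_COSTS[best]
    return acquired

print(acquire_features(budget=6.0))  # → ['age', 'blood_test']
```

With a budget of 6, the expensive `mri` is never affordable, so the loop stops after two acquisitions. The paper replaces this fixed greedy heuristic with a learned policy whose state grows as features are acquired, avoiding imputation of the unobserved ones.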
Problem

Research questions and friction points this paper is trying to address.

active feature selection
feature acquisition cost
instance-specific feature selection
sequential decision making
high-dimensional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

active feature selection
reinforcement learning
Markov Decision Process
heuristic-based strategy
post-fit regularization
Gabriel Bernardino
Department of Engineering, Universitat Pompeu Fabra, Barcelona, Spain

Anders Jonsson
Department of Engineering, Universitat Pompeu Fabra, Barcelona, Spain

Patrick Clarysse
Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, Inserm, CREATIS UMR 5220, U1294, F-69621, Lyon, France

Nicolas Duchateau
Associate Professor / CREATIS lab - Université Lyon 1, France
Medical image analysis · Computational anatomy · Cardiac imaging