🤖 AI Summary
This work addresses three challenges in interactive, model-free selection: maintaining false discovery rate (FDR) control, reusing data without invalidating inference, and supporting human-in-the-loop analysis. To this end, it proposes Adaptive Conformal Selection (ACS), a framework that generalizes conformal selection (Jin and Candès, 2023b) to interactive settings. ACS integrates information masking and dynamic update mechanisms: by controlling the information available for decision making, it maintains rigorous FDR control during adaptive exploratory analysis while permitting partial data reuse to increase selection power. The framework lets the analyst make decisions on the fly and incorporate new information or preferences as they arise, and it yields concrete algorithms for model update and selection, diversified selection, and incorporating newly available labeled data. Numerical simulations and real-data applications in large language model (LLM) deployment and drug discovery demonstrate the effectiveness of ACS while FDR control is maintained throughout the interaction.
📝 Abstract
This paper presents adaptive conformal selection (ACS), an interactive framework for model-free selection with guaranteed error control. Building on conformal selection (Jin and Candès, 2023b), ACS generalizes the approach to support human-in-the-loop adaptive data analysis. Under the ACS framework, we can partially reuse the data to boost the selection power, make decisions on the fly while exploring the data, and incorporate new information or preferences as they arise. The key to ACS is a carefully designed principle that controls the information available for decision making, allowing the data analyst to explore the data adaptively while maintaining rigorous control of the false discovery rate (FDR). Based on the ACS framework, we provide concrete selection algorithms for various goals, including model update/selection, diversified selection, and incorporating newly available labeled data. The effectiveness of ACS is demonstrated through extensive numerical simulations and real-data applications in large language model (LLM) deployment and drug discovery.
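For orientation, the non-interactive baseline that ACS builds on can be sketched as conformal p-values followed by the Benjamini-Hochberg (BH) procedure. The sketch below is a simplified illustration of that baseline, not the ACS algorithm itself. The residual score `y - mu(x)`, the threshold `c`, the toy data, and all function names are assumptions made for this example and do not come from the paper.

```python
from typing import List, Sequence


def conformal_p_values(calib_scores: Sequence[float],
                       test_scores: Sequence[float]) -> List[float]:
    """Conformal p-value for each test point: the rank of its score among the
    calibration scores, with the usual +1 finite-sample correction."""
    n = len(calib_scores)
    return [(1 + sum(1 for v in calib_scores if v < t)) / (n + 1)
            for t in test_scores]


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[int]:
    """Return the indices selected by the BH procedure at nominal level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k_star = 0
    for rank, j in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under the BH line.
        if p_values[j] <= alpha * rank / m:
            k_star = rank
    return sorted(order[:k_star])


# Toy data (hypothetical): we want test points whose unknown label exceeds c.
c = 1.0
calib_mu = [1.0] * 19                               # model predictions
calib_y = [1.0 + (i - 9) / 10 for i in range(19)]   # observed labels

# Assumed monotone score V(x, y) = y - mu(x), evaluated at the true label
# for calibration points and at the threshold c for test points.
calib_scores = [y - mu for y, mu in zip(calib_y, calib_mu)]
test_mu = [3.0, 2.5, 0.15, 0.95]
test_scores = [c - mu for mu in test_mu]

p_values = conformal_p_values(calib_scores, test_scores)
selected = benjamini_hochberg(p_values, alpha=0.2)
print(selected)  # [0, 1]
```

Only the two test points with large predictions receive small p-values and are selected. In this baseline the calibration data are used once and the score is fixed in advance; the abstract's point is that ACS relaxes exactly these restrictions while keeping FDR control.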