Conformal prediction under feedback covariate shift for biomolecular design

📅 2022-02-08

🏛️ Proceedings of the National Academy of Sciences of the United States of America

📈 Citations: 52

✨ Influential: 4

🤖 AI Summary

This work addresses the challenge of quantifying prediction uncertainty in generative biomolecular design, where feedback covariate shift undermines conventional uncertainty estimation. We propose the first conformal prediction framework tailored to closed-loop design paradigms. Departing from standard i.i.d. assumptions, our method imposes no structural constraints on either the design algorithm or the regression model, delivering finite-sample statistically valid confidence sets for arbitrary black-box design pipelines. Key innovations include quantile-regression-driven adaptive conformal prediction, explicit modeling of feedback-induced distributional shift, and robust error calibration. Evaluated on protein and small-molecule design tasks, our approach achieves ≥94.8% empirical coverage at the 95% nominal confidence level—substantially outperforming standard conformal methods (which drop to as low as 72%)—while maintaining high predictive accuracy.

📝 Abstract

Significance

An increasingly high-impact application of machine learning in scientific discovery is its use in the design of novel objects with desired properties, such as the design of proteins, small molecules, and materials. Although a variety of algorithms have been developed for this purpose, it remains unclear when practitioners can trust the predictions made by learned models for designed objects, since design algorithms induce a distinctive shift between the training and test data distributions. We propose a method that provides confidence sets for designed objects, which we show satisfy finite-sample guarantees of statistical validity, for any design algorithm involving any learned regression model. Our work enables more trustworthy use of machine learning for design.
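
To make the idea of a confidence set under a train/test shift concrete, here is a minimal Python sketch of weighted split conformal prediction for ordinary covariate shift. It is an illustration under stated assumptions, not the paper's procedure: it assumes the likelihood ratio between test and training inputs is known and does not depend on the training data, whereas feedback covariate shift has the test distribution depend on the training data. The names `weighted_conformal_quantile`, `weighted_split_conformal_interval`, `predict`, and `weight_fn` are hypothetical.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Return the (1 - alpha) quantile of the weighted calibration scores.

    The test point contributes a point mass at +infinity, so the result is
    +infinity when the calibration scores alone cannot reach level 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + test_weight
    order = np.argsort(scores)
    # Cumulative normalized weight of the sorted calibration scores.
    cum = np.cumsum(weights[order]) / total
    # First sorted score whose cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    if idx >= len(scores):
        return np.inf
    return scores[order][idx]


def weighted_split_conformal_interval(predict, weight_fn, X_cal, y_cal, x_test, alpha=0.1):
    """Interval for one test input from held-out calibration data.

    predict:   hypothetical fitted regression model, maps (n, d) -> (n,)
    weight_fn: assumed-known likelihood ratio test/train, maps (n, d) -> (n,)
    """
    # Absolute residuals serve as nonconformity scores.
    scores = np.abs(y_cal - predict(X_cal))
    x_row = x_test[None, :]
    q = weighted_conformal_quantile(scores, weight_fn(X_cal), weight_fn(x_row)[0], alpha)
    center = predict(x_row)[0]
    return center - q, center + q
```

With all weights equal this reduces to standard split conformal prediction. Weights that grow with the shift toward the test distribution widen the interval and can make it infinite, which is how the construction signals that the calibration data say little about the test input.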

Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in protein fitness predictions

Address the distribution shift that arises when the test (designed) inputs depend on the training data

Construct confidence sets for model predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal prediction for biomolecular design uncertainty

Confidence sets with finite-sample guarantees

Handles feedback-induced covariate shift

🔎 Similar Papers

FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation