🤖 AI Summary
This work addresses the challenge of quantifying predictive uncertainty in generative biomolecular design. Because designed inputs are selected using a model fit to the training data, the test distribution depends on the training data. This feedback covariate shift undermines conventional uncertainty estimates, which assume i.i.d. (exchangeable) data. We propose the first conformal prediction framework tailored to closed-loop design paradigms. The method imposes no structural constraints on either the design algorithm or the regression model, and it delivers confidence sets with finite-sample statistical validity for arbitrary black-box design pipelines. Key innovations include quantile-regression-driven adaptive conformal prediction, explicit modeling of feedback-induced distributional shift, and robust error calibration. Evaluated on protein and small-molecule design tasks, our approach achieves at least 94.8% empirical coverage at the 95% nominal coverage level, whereas the coverage of standard conformal methods drops to as low as 72%, while maintaining high predictive accuracy.
📝 Abstract
Significance: An increasingly high-impact application of machine learning in scientific discovery is the design of novel objects with desired properties, such as proteins, small molecules, and materials. Although a variety of algorithms have been developed for this purpose, it remains unclear when practitioners can trust the predictions made by learned models for designed objects, since design algorithms induce a distinctive shift between the training and test data distributions. We propose a method that provides confidence sets for designed objects, which we show satisfy finite-sample guarantees of statistical validity, for any design algorithm involving any learned regression model. Our work enables more trustworthy use of machine learning for design.
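
For readers unfamiliar with what a finite-sample coverage guarantee means, the following is a minimal sketch of standard split conformal prediction, the exchangeable-data baseline that this work generalizes. It is not the method proposed in the paper: it assumes training, calibration, and test points are drawn i.i.d. from the same distribution, which is exactly the assumption that design-induced shift violates. The synthetic data, the cubic polynomial regressor, and the helper name `split_conformal_quantile` are illustrative assumptions.

```python
import numpy as np


def split_conformal_quantile(cal_residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.

    Using this value as the interval half-width gives marginal coverage of at
    least 1 - alpha when calibration and test points are exchangeable.
    """
    n = len(cal_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return np.inf
    return np.sort(cal_residuals)[k - 1]


rng = np.random.default_rng(0)


def sample(n):
    # Hypothetical data-generating process, used only for illustration.
    x = rng.uniform(-2.0, 2.0, size=n)
    y = np.sin(x) + 0.3 * rng.normal(size=n)
    return x, y


# Fit any regression model on the training split (here, a cubic polynomial).
x_train, y_train = sample(500)
coef = np.polyfit(x_train, y_train, deg=3)


def predict(x):
    return np.polyval(coef, x)


# Calibrate on held-out data using absolute residuals as conformity scores.
x_cal, y_cal = sample(1000)
q = split_conformal_quantile(np.abs(y_cal - predict(x_cal)), alpha=0.05)

# The confidence set for a new x is the interval [predict(x) - q, predict(x) + q].
x_test, y_test = sample(5000)
coverage = np.mean(np.abs(y_test - predict(x_test)) <= q)
print(f"half-width q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

When the test inputs are instead chosen by a design algorithm that uses the fitted model, the calibration residuals are no longer exchangeable with the test residual, and the empirical coverage of this baseline can fall below the nominal level. That failure mode is what the proposed method is designed to correct.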