ConfHit: Conformal Generative Design with Oracle Free Guarantees

📅 2026-03-07
🤖 AI Summary
This work addresses the challenge of providing statistical validity guarantees in scientific generative tasks such as drug discovery under three practical constraints: no experimental oracle, limited evaluation budgets, and distribution shift. The authors propose ConfHit, a framework that provides distribution-free statistical validity under these conditions. ConfHit leverages weighted exchangeability and conformal p-values built from multi-sample density-ratio weighting, integrated within a nested hypothesis testing procedure to certify and refine generated candidate sets. Empirical evaluations across molecular generation tasks show that ConfHit consistently attains nominal coverage across confidence levels while producing compact certified sets.

📝 Abstract
The success of deep generative models in scientific discovery requires not only the ability to generate novel candidates but also reliable guarantees that these candidates indeed satisfy desired properties. Recent conformal-prediction methods offer a path to such guarantees, but their application to generative modeling in drug discovery is limited by budget constraints, lack of oracle access, and distribution shift. To this end, we introduce ConfHit, a distribution-free framework that provides validity guarantees under these conditions. ConfHit formalizes two central questions: (i) Certification: whether a generated batch can be guaranteed to contain at least one hit with a user-specified confidence level, and (ii) Design: whether the generation can be refined to a compact set without weakening this guarantee. ConfHit leverages weighted exchangeability between historical and generated samples to eliminate the need for an experimental oracle, constructs a multi-sample, density-ratio-weighted conformal p-value to quantify statistical confidence in hits, and proposes a nested testing procedure to certify and refine candidate sets of multiple generated samples while maintaining statistical guarantees. Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.
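
To make the idea of a density-ratio-weighted conformal p-value more concrete, here is a minimal sketch of the generic single-test-point version from the covariate-shift conformal literature. It is not ConfHit's multi-sample construction, whose exact form the abstract does not give. The function name, the argument names, and the convention that a larger score means "more extreme" are all assumptions made for illustration; the weights stand in for estimated density ratios between the generated and historical distributions.

```python
def weighted_conformal_pvalue(cal_scores, cal_weights, test_score, test_weight):
    """Generic weighted conformal p-value (illustrative, hypothetical helper).

    cal_scores  : nonconformity scores of historical (calibration) samples
    cal_weights : density-ratio weights for those samples (assumed given)
    test_score  : nonconformity score of one generated sample
    test_weight : density-ratio weight of that generated sample
    Assumption: larger scores are more extreme.
    """
    if len(cal_scores) != len(cal_weights):
        raise ValueError("cal_scores and cal_weights must have the same length")
    # Weighted mass of calibration scores at least as extreme as the test score.
    tail_mass = sum(w for s, w in zip(cal_scores, cal_weights) if s >= test_score)
    # The test point contributes its own weight to numerator and denominator.
    total_mass = sum(cal_weights) + test_weight
    return (tail_mass + test_weight) / total_mass


# With unit weights this reduces to the usual conformal p-value.
p_unweighted = weighted_conformal_pvalue(
    [0.1, 0.4, 0.7, 0.9], [1.0, 1.0, 1.0, 1.0], 0.8, 1.0
)
# Non-uniform weights shift mass toward calibration points that resemble
# the generated distribution.
p_weighted = weighted_conformal_pvalue(
    [0.1, 0.4, 0.7, 0.9], [2.0, 1.0, 1.0, 0.5], 0.8, 0.5
)
print(p_unweighted)  # 0.4
print(p_weighted)    # 0.2
```

In a certification setting, a small p-value would be evidence against the null hypothesis being tested; how ConfHit defines that null for a batch and combines evidence across multiple generated samples is specified in the paper, not in this sketch.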

Problem

Research questions and friction points this paper is trying to address.

conformal prediction
generative modeling
drug discovery
validity guarantees
distribution shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

conformal prediction
generative design
oracle-free
distribution-free
statistical guarantee