🤖 AI Summary
This work addresses the problem of generating rare, functionally targeted discrete combinatorial designs—such as functional protein or nucleic acid sequences—under expensive black-box evaluations (e.g., wet-lab experiments or high-fidelity simulations) that must be issued in batches. The authors propose Variational Search Distributions (VSD), a framework that formalizes the requirements and desiderata of this "active generation" task. VSD constructs a conditional generative model via variational inference, enabling end-to-end gradient-based optimization jointly with scalable surrogate predictors. Theoretically, the authors derive asymptotic convergence rates for learning the true conditional distribution of designs under certain configurations of the method. Empirically, VSD outperforms strong baselines across multiple real-world protein and DNA/RNA engineering tasks, efficiently and robustly discovering rare, high-performing sequence variants.
📝 Abstract
We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box function (e.g., an experiment or simulation) in a batch-sequential manner. We call this task active generation; we formalize its requirements and desiderata, and formulate a solution via variational inference. VSD uses off-the-shelf gradient-based optimization routines, can learn powerful generative models for desirable designs, and can take advantage of scalable predictive models. We derive asymptotic convergence rates for learning the true conditional generative distribution of designs under certain configurations of our method. After illustrating the generative model on images, we empirically demonstrate that VSD can outperform existing baseline methods on a set of real sequence-design problems spanning various protein and DNA/RNA engineering tasks.
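The core mechanism the abstract describes—fitting a variational distribution q(x) to approximate the rare conditional p(x | y=1) ∝ p(y=1 | x) p(x) with gradient-based optimization—can be sketched on a toy problem. The sketch below is illustrative only, not the paper's actual models or benchmarks: the motif-matching "black box," the independent-categorical variational family, and all hyperparameter values are assumptions made for the example. It maximizes the ELBO, E_q[log p(y=1|x) + log p(x) − log q(x)], with a score-function (REINFORCE) gradient estimator, one common way to handle discrete designs.

```python
import numpy as np

rng = np.random.default_rng(0)
L, V = 8, 4                          # sequence length, vocabulary size
target = rng.integers(0, V, size=L)  # hidden motif defining the toy black box

def log_p_y1(x):
    """Toy black-box log-likelihood of a design being 'functional':
    rewards positions that match the hidden motif."""
    return 2.0 * (x == target).sum() - 2.0 * L

def softmax(logits):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

phi = np.zeros((L, V))               # variational family: independent categorical per position
prior_logp = -L * np.log(V)          # uniform prior over all V**L sequences
lr = 0.1

for step in range(300):
    p = softmax(phi)
    # draw a batch of candidate designs from q_phi
    xs = np.stack([rng.choice(V, size=64, p=p[i]) for i in range(L)], axis=1)
    logq = np.log(p[np.arange(L), xs]).sum(axis=1)   # log q(x) per sample
    # ELBO integrand: log p(y=1|x) + log p(x) - log q(x), mean-centered as a baseline
    f = np.array([log_p_y1(x) for x in xs]) + prior_logp - logq
    f -= f.mean()
    # score-function (REINFORCE) gradient of the ELBO w.r.t. the logits
    grad = np.zeros_like(phi)
    for x, w in zip(xs, f):
        grad += w * (np.eye(V)[x] - p)
    phi += lr * grad / len(xs)

probs = softmax(phi)
best = probs.argmax(axis=1)          # mode of the learned design distribution
```

In the full method the exact black box is replaced by a learned predictive model and queries are proposed in batches, but the variational objective and gradient estimator take the same shape as above.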