Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses out-of-distribution extrapolation in offline black-box optimization, where novel designs must be found using only a static dataset. The authors propose SPADE, a framework that recasts forward surrogate modeling as conditional generative modeling, using a diffusion model to estimate the likelihood $p(y|x)$. SPADE adds two components: a calibrated diffusion estimation module that enforces global consistency in statistical moments and pairwise rankings, and a support-proximity regularizer based on k-nearest-neighbor density estimation that keeps candidate designs close to the data manifold. The authors prove that this regularization is first-order equivalent to maximizing a Bayesian posterior with a valid design prior, so the objective balances global statistical consistency with local geometric structure. In this way SPADE aims to avoid both the ill-posed score-to-design mapping of inverse methods and the limited distributional expressivity of conventional forward methods. Empirical evaluations on Design-Bench tasks and an LLM data mixture optimization benchmark show that SPADE achieves state-of-the-art performance.
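The link between a kNN density term and a Bayesian posterior can be sketched with textbook identities. This is an illustration under stated assumptions, not the paper's derivation: the target score $y^\star$, the dataset size $N$, the design dimension $d$, the unit-ball volume $V_d$, and the k-th nearest-neighbor distance $r_k(x)$ are notation introduced here.

```latex
\begin{aligned}
x^\star &= \arg\max_x \, \log p(x \mid y^\star)
        = \arg\max_x \, \big[ \log p(y^\star \mid x) + \log p(x) \big], \\
\hat{p}_k(x) &= \frac{k}{N \, V_d \, r_k(x)^d}
\quad\Longrightarrow\quad
\log \hat{p}_k(x) = -d \log r_k(x) + \text{const}.
\end{aligned}
```

Under this standard kNN density estimate, penalizing the log-distance to the k-th nearest training design plays the role of the log-prior term $\log p(x)$. The exact regularizer used in SPADE may differ.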
📝 Abstract
Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose SPADE (Support-Proximity Augmented Diffusion Estimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood p(y|x) using a diffusion model, but with two critical enhancements to tailor it for optimization: (1) a Calibrated Diffusion Estimation module that enforces global consistency in statistical moments and pairwise rankings, and (2) a Support-Proximity Regularization mechanism that implicitly internalizes the data manifold constraint p(x) via kNN-based density estimation. Theoretically, we prove that our regularization is first-order equivalent to maximizing a Bayesian posterior with a valid design prior. Empirically, SPADE achieves state-of-the-art performance across Design-Bench tasks and an LLM data mixture optimization benchmark.
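The support-proximity idea described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names `knn_log_density` and `regularized_score`, the weight `lam`, and the callable `surrogate_log_lik` (a stand-in for whatever log-likelihood a trained diffusion surrogate would provide) do not come from the paper.

```python
import numpy as np

def knn_log_density(x, data, k=5):
    """Log of a kNN density estimate at x, up to an additive constant.

    Hypothetical helper: uses log p_hat(x) = -d * log r_k(x) + const,
    where r_k(x) is the distance from x to its k-th nearest training design.
    """
    dists = np.linalg.norm(data - x, axis=1)
    r_k = np.partition(dists, k - 1)[k - 1]  # k-th smallest distance
    dim = data.shape[1]
    return -dim * np.log(r_k + 1e-12)  # small epsilon avoids log(0)

def regularized_score(x, data, surrogate_log_lik, lam=1.0, k=5):
    """Surrogate log-likelihood plus a weighted support-proximity term.

    surrogate_log_lik is an assumed callable standing in for log p(y*|x).
    """
    return surrogate_log_lik(x) + lam * knn_log_density(x, data, k)

# Toy usage: a design near the data scores higher than one far from it
# when the surrogate itself is indifferent between the two.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
flat_surrogate = lambda x: 0.0
near, far = np.zeros(2), np.full(2, 10.0)
print(regularized_score(near, data, flat_surrogate) >
      regularized_score(far, data, flat_surrogate))
```

In this sketch the penalty only discourages candidates that drift away from the training designs. It does not reproduce the calibration of moments and pairwise rankings that the abstract attributes to the Calibrated Diffusion Estimation module.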
Problem

Research questions and friction points this paper is trying to address.

offline black-box optimization
out-of-distribution extrapolation
static dataset
design discovery
property scores
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
offline black-box optimization
support-proximity regularization
conditional generative modeling
Bayesian posterior