🤖 AI Summary
This work addresses the problem of reliably generating out-of-distribution data that satisfies new specifications, under structural assumptions about the underlying data-generating process. The authors propose a structure-guided extrapolated generation framework and give conditions under which such data can be generated reliably, together with approximate identifiability of its distribution under certain "conservative" assumptions. On the algorithmic side, two practical methods are developed: one based on a structure-informed optimization strategy and one based on diffusion posterior sampling. Experiments on synthetic data, and on extrapolated image generation as a real-world scenario, illustrate the validity of the framework.
📝 Abstract
This paper proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions about the underlying data-generating process. We provide conditions under which data satisfying new specifications can be generated reliably, and show that the distribution of such data is approximately identifiable under certain "conservative" assumptions. On the algorithmic side, we develop two practical methods for extrapolated data generation, based on a structure-informed optimization strategy and on diffusion posterior sampling, respectively. We verify the extrapolation performance on synthetic data and also consider extrapolated image generation as a real-world scenario to illustrate the validity of the proposed framework.