🤖 AI Summary
This work addresses generative modeling in the data-free setting: given only an energy function, it enables training of arbitrary continuous-time Markov processes (including diffusion, flow, and jump processes) and unifies generation of continuous, discrete, and mixed-modality data. Methodologically, it extends the generator-matching framework to the energy-based, data-free setting, estimating the training loss with self-normalized importance sampling together with a bootstrapping trick that reduces the variance of the importance weights. The target distribution is defined implicitly by the energy function, so neither observed data nor an explicit normalization constant is required. Experiments demonstrate effectiveness on discrete tasks of up to 100 dimensions and hybrid-modal tasks of up to 20 dimensions. To the authors' knowledge, this is the first general-purpose, data-free, energy-based generative framework supporting multimodal data generation.
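Concretely, since the target density is known only up to normalization, $p(x) \propto e^{-E(x)}$, expectations under $p$ can be estimated with the standard self-normalized importance sampling identity (a sketch in our own notation; the paper's exact estimator may differ):

$$
\mathbb{E}_{p}[f(x)] \;\approx\; \sum_{i=1}^{N} \bar{w}_i\, f(x_i),
\qquad
\bar{w}_i \;=\; \frac{e^{-E(x_i)}/q(x_i)}{\sum_{j=1}^{N} e^{-E(x_j)}/q(x_j)},
\qquad x_i \sim q .
$$

The normalization constant of $p$ cancels in the weight ratio, which is what makes the estimator usable in the data-free setting.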
📝 Abstract
We propose energy-based generator matching (EGM), a modality-agnostic approach to training generative models from energy functions in the absence of data. Extending the recently proposed generator matching, EGM enables training of arbitrary continuous-time Markov processes, e.g., diffusion, flow, and jump processes, and can generate data from continuous, discrete, and mixed modalities. To this end, we propose estimating the generator-matching loss with self-normalized importance sampling, using an additional bootstrapping trick to reduce the variance of the importance weights. We validate EGM on discrete and multimodal tasks of up to 100 and 20 dimensions, respectively.
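To illustrate the estimation strategy named in the abstract, below is a minimal sketch of self-normalized importance sampling for an energy-defined target. The Gaussian proposal, function names, and toy energy are our assumptions for the sake of a runnable example; this is not EGM's implementation, which additionally applies a bootstrapping trick to the weights.

```python
# Minimal sketch of self-normalized importance sampling (SNIS) for a target
# p(x) ∝ exp(-E(x)) given only the energy E. Proposal, names, and the toy
# energy below are illustrative assumptions, not the paper's implementation.
import numpy as np

def snis_expectation(energy, f, n_samples=10_000, dim=2, seed=None):
    """Estimate E_p[f(x)] with p(x) ∝ exp(-energy(x)), drawing samples from
    a standard-normal proposal q = N(0, I). energy maps (n, dim) -> (n,);
    f maps (n, dim) -> (n, k). The normalizer of p cancels in the weights."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, dim))   # x_i ~ q
    log_q = -0.5 * (x ** 2).sum(axis=1)         # log q(x_i) up to a constant
    log_w = -energy(x) - log_q                  # log unnormalized weights
    log_w -= log_w.max()                        # stabilize the exponentiation
    w = np.exp(log_w)
    w /= w.sum()                                # self-normalize: sum_i w_i = 1
    return (w[:, None] * f(x)).sum(axis=0)

# Toy check: mean of a Gaussian target with energy E(x) = ||x - mu||^2 / 2,
# whose true mean is mu.
mu = np.array([1.0, -0.5])
est = snis_expectation(lambda x: 0.5 * ((x - mu) ** 2).sum(axis=1),
                       lambda x: x, seed=0)
print(est)  # ≈ [1.0, -0.5]
```

The log-domain arithmetic and max-subtraction are standard practice for importance weights; the paper's bootstrapping trick targets the same variance problem at the level of the generator-matching loss.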