Learning Energy-Based Models by Self-normalising the Likelihood

📅 2025-03-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Maximum-likelihood training of energy-based models (EBMs) is hindered by the intractable partition function, which necessitates costly MCMC sampling. To address this, we propose the self-normalised likelihood (SNL) objective: a differentiable lower bound on the log-likelihood that replaces MCMC with a single learnable scalar parameter. Crucially, the global optimum of SNL simultaneously yields both the maximum-likelihood model parameters and the exact normalisation constant. For exponential-family EBMs, the SNL objective is concave in the model parameters. The method enables end-to-end training via stochastic gradient optimisation combined with sampling from a crude proposal distribution. Experiments demonstrate that SNL outperforms existing approaches on density estimation and EBM-based regression tasks, while being simpler to implement, more robust to hyperparameters, and computationally efficient.

📝 Abstract
Training an energy-based model (EBM) with maximum likelihood is challenging due to the intractable normalisation constant. Traditional methods rely on expensive Markov chain Monte Carlo (MCMC) sampling to estimate the gradient of the logarithm of the normalisation constant. We propose a novel objective called the self-normalised log-likelihood (SNL) that, compared to the regular log-likelihood, introduces a single additional learnable parameter representing the normalisation constant. SNL is a lower bound of the log-likelihood, and its optimum corresponds to both the maximum likelihood estimate of the model parameters and the normalisation constant. We show that the SNL objective is concave in the model parameters for exponential family distributions. Unlike the regular log-likelihood, the SNL can be directly optimised using stochastic gradient techniques by sampling from a crude proposal distribution. We validate the effectiveness of our proposed method on various density estimation tasks as well as EBMs for regression. Our results show that the proposed method, while simpler to implement and tune, outperforms existing techniques.
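A minimal sketch of how such a self-normalised lower bound can be formed, assuming the standard inequality log Z ≤ e⁻ᵇ Z − 1 + b (tight exactly at b = log Z); the quadratic energy, standard-normal proposal, and function names below are illustrative choices, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x, theta):
    # Quadratic energy E_theta(x) = 0.5 * theta * x^2, so f_theta(x) = exp(-E)
    # is an unnormalised Gaussian with known Z_theta = sqrt(2*pi/theta).
    return 0.5 * theta * x**2

def snl(theta, b, data, y):
    # y: samples from a crude proposal q = N(0, 1).
    log_q = -0.5 * y**2 - 0.5 * np.log(2 * np.pi)
    # Importance-sampling estimate of Z_theta = E_q[f_theta(y) / q(y)].
    z_hat = np.mean(np.exp(-energy(y, theta) - log_q))
    # Lower bound: mean log f - (e^{-b} * Z - 1 + b) <= mean log f - log Z,
    # with equality when b = log Z.
    return np.mean(-energy(data, theta)) - (np.exp(-b) * z_hat - 1.0 + b)

data = rng.normal(size=2000)              # observed data
y = rng.normal(size=2000)                 # proposal samples
theta = 1.0
log_z = 0.5 * np.log(2 * np.pi / theta)   # exact log Z in the Gaussian case

# Exact log-likelihood for comparison; SNL matches it at b = log Z
# and falls below it for any other b.
exact_ll = np.mean(-energy(data, theta)) - log_z
```

Note that the only extra quantity, b, enters as an ordinary scalar parameter, which is what lets the bound be optimised jointly with theta by plain stochastic gradients.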
Problem

Research questions and friction points this paper is trying to address.

Training energy-based models by maximum likelihood is difficult.
The intractable normalisation constant makes traditional training methods rely on expensive MCMC sampling.
The paper proposes a self-normalised log-likelihood to simplify and improve training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-normalised log-likelihood (SNL) objective introduced
SNL optimised using stochastic gradient techniques
Single learnable parameter for normalisation constant
🔎 Similar Papers
No similar papers found.