🤖 AI Summary
This work investigates how variable encoding, Ising ({−1, +1}) versus QUBO ({0, 1}), affects the learning performance, information-geometric structure, and finite-time dynamics of Boltzmann machines. Through theoretical analysis and controlled experiments, we show that QUBO encoding induces larger cross terms between first- and second-order statistics, which makes the Fisher information matrix (FIM) ill-conditioned and slows convergence under stochastic gradient descent (SGD); in contrast, Ising encoding yields more isotropic curvature and faster SGD convergence. We further show that natural gradient descent (NGD) removes this encoding dependence through its reparameterization invariance, giving similar convergence across encodings. Using the identity that the FIM equals the covariance of sufficient statistics, we derive practical criteria for encoding selection and preprocessing: under SGD, QUBO-encoded models need centering/scaling or NGD-style preconditioning to mitigate their curvature pathologies. The study clarifies how discrete variable representations shape information geometry and optimization trajectories, providing both theoretical grounding and practical guidelines for encoding design in probabilistic models.
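The invariance claim can be sketched with standard exponential-family algebra. The notation here is an assumption for illustration and is not taken from the paper: φ collects the first- and second-order statistics, A is the log-partition function, L is the training loss, and the subscripts I and Q mark the Ising and QUBO encodings of the same distribution.

```latex
\begin{align}
  p_\theta(v) &= \exp\!\big(\theta^\top \phi(v) - A(\theta)\big),
  & F(\theta) &= \nabla^2 A(\theta) = \operatorname{Cov}_{p_\theta}\!\big[\phi(v)\big], \\
  x_i &= \tfrac{1}{2}(1 + s_i),
  & x_i x_j &= \tfrac{1}{4}\big(1 + s_i + s_j + s_i s_j\big), \\
  \phi_Q &= M \phi_I + c,
  & \theta_I &= M^\top \theta_Q, \qquad F_Q = M F_I M^\top, \\
  F_Q^{-1} \nabla_{\theta_Q} L &= F_Q^{-1} M \,\nabla_{\theta_I} L
  & &= M^{-\top} F_I^{-1} \nabla_{\theta_I} L .
\end{align}
```

Mapping the QUBO natural-gradient step back through θ_I = Mᵀθ_Q gives Mᵀ · M⁻ᵀ F_I⁻¹ ∇L = F_I⁻¹ ∇L, which is the Ising natural-gradient step. A plain SGD step does not transform this way, since ∇_{θ_Q} L = M ∇_{θ_I} L maps back to M ᵀM ∇_{θ_I} L rather than ∇_{θ_I} L, so its trajectory depends on the encoding.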
📝 Abstract
We compare Ising ({−1, +1}) and QUBO ({0, 1}) encodings for Boltzmann machine learning under a controlled protocol that fixes the model, sampler, and step size. Exploiting the identity that the Fisher information matrix (FIM) equals the covariance of sufficient statistics, we visualize empirical moments from model samples and reveal systematic, representation-dependent differences. QUBO induces larger cross terms between first- and second-order statistics, creating more small-eigenvalue directions in the FIM and lowering spectral entropy. This ill-conditioning explains slower convergence under stochastic gradient descent (SGD). In contrast, natural gradient descent (NGD), which rescales updates using the FIM as a metric, achieves similar convergence across encodings due to reparameterization invariance. Practically, for SGD-based training, the Ising encoding provides more isotropic curvature and faster convergence; for QUBO, centering/scaling or NGD-style preconditioning mitigates curvature pathologies. These results clarify how representation shapes information geometry and finite-time learning dynamics in Boltzmann machines and yield actionable guidelines for variable encoding and preprocessing.
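The FIM-as-covariance identity can be checked on a toy model. The sketch below is not the paper's protocol: it assumes a fully visible Boltzmann machine with n = 4 units, replaces sampling with exact enumeration of all 16 states, and evaluates both encodings at θ = 0, where each describes the same uniform distribution. The function names `sufficient_stats` and `fisher_information` are hypothetical.

```python
import itertools

import numpy as np


def sufficient_stats(states):
    """Stack first-order terms v_i and pairwise products v_i * v_j (i < j)."""
    n = states.shape[1]
    pairs = [states[:, i] * states[:, j]
             for i, j in itertools.combinations(range(n), 2)]
    return np.column_stack([states] + pairs)


def fisher_information(states, theta):
    """Exact FIM: covariance of sufficient statistics under p(v) ∝ exp(theta·phi(v))."""
    phi = sufficient_stats(states)
    logits = phi @ theta
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    p /= p.sum()
    centered = phi - p @ phi           # subtract the model mean of each statistic
    return (centered * p[:, None]).T @ centered


n = 4                                   # toy size (assumption), small enough to enumerate
ising = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
qubo = (ising + 1.0) / 2.0              # x = (1 + s) / 2 maps {-1, +1} to {0, 1}
dim = n + n * (n - 1) // 2
theta0 = np.zeros(dim)                  # theta = 0: uniform distribution in both encodings

fim_ising = fisher_information(ising, theta0)
fim_qubo = fisher_information(qubo, theta0)

cond_ising = np.linalg.cond(fim_ising)
cond_qubo = np.linalg.cond(fim_qubo)
# Largest covariance between a first-order and a second-order statistic.
max_cross_ising = np.abs(fim_ising[:n, n:]).max()
max_cross_qubo = np.abs(fim_qubo[:n, n:]).max()

print("condition number (Ising):", cond_ising)
print("condition number (QUBO): ", cond_qubo)
print("max cross term (Ising):", max_cross_ising)
print("max cross term (QUBO): ", max_cross_qubo)
```

At θ = 0 the Ising statistics are mutually uncorrelated with unit variance, so the Ising FIM is the identity. In the QUBO encoding, Cov(x_i, x_i x_j) = 1/4 − 1/8 = 1/8, so the first- and second-order blocks are coupled and the condition number exceeds 1. This illustrates the mechanism only at the uniform point; it says nothing about the magnitude of the effect in the experiments reported above.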