Nonlocal Monte Carlo via Reinforcement Learning

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Hard combinatorial optimization benchmarks such as random 4-SAT exhibit an overlap-gap (solution-clustering) structure near the computational phase transition, which traps conventional MCMC methods (e.g., simulated annealing, parallel tempering) in suboptimal basins of attraction and prevents them from unfreezing rigid variables or sampling high-quality, diverse solutions. Method: We propose a deep reinforcement learning (DRL) framework for learning *nonlocal* Monte Carlo transition policies: it uses the local energy-landscape geometry as the RL state and the observed energy change as the RL reward, so that inhomogeneous, nonequilibrium temperature profiles are discovered automatically rather than designed phenomenologically. Contribution/Results: By relaxing both the locality constraints and the thermodynamic-equilibrium assumptions inherent in standard MCMC, the trained policies achieve lower residual energy, shorter time-to-solution, and markedly improved solution diversity on uniform random and scale-free random 4-SAT benchmarks, demonstrating the viability of DRL-driven nonequilibrium sampling for hard constraint-satisfaction problems.
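To make the cost function concrete, here is a minimal Python sketch (not the paper's code) of a uniform random 4-SAT instance and its energy, counted as the number of unsatisfied clauses; this is the quantity whose change serves as the RL reward in the summary above. The function names `random_4sat` and `energy` are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's implementation):
# uniform random 4-SAT and its energy = number of unsatisfied clauses.
import random

random.seed(0)

def random_4sat(n_vars, n_clauses):
    """Each clause picks 4 distinct variables, each negated with
    probability 1/2. Literals are +/-(index + 1)."""
    clauses = []
    for _ in range(n_clauses):
        picked = random.sample(range(n_vars), 4)
        clauses.append([(v + 1) * random.choice([-1, 1]) for v in picked])
    return clauses

def energy(assignment, clauses):
    """Count unsatisfied clauses; 0 means a satisfying assignment."""
    unsat = 0
    for clause in clauses:
        if not any((lit > 0) == assignment[abs(lit) - 1] for lit in clause):
            unsat += 1
    return unsat

clauses = random_4sat(n_vars=20, n_clauses=50)
assignment = [random.choice([False, True]) for _ in range(20)]
e = energy(assignment, clauses)  # some value in [0, 50]
```

A random assignment typically leaves a fraction of clauses unsatisfied; the solvers discussed here differ in how they move through assignments to drive this count toward zero.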

📝 Abstract
Optimizing or sampling complex cost functions of combinatorial optimization problems is a longstanding challenge across disciplines and applications. When employing the family of conventional algorithms based on Markov chain Monte Carlo (MCMC), such as simulated annealing or parallel tempering, one assumes homogeneous (equilibrium) temperature profiles across inputs. This instance-independent approach was shown to be ineffective for the hardest benchmarks near a computational phase transition, where the so-called overlap-gap property holds. In these regimes, conventional MCMC struggles to unfreeze rigid variables, escape suboptimal basins of attraction, and sample high-quality, diverse solutions. To mitigate these challenges, Nonequilibrium Nonlocal Monte Carlo (NMC) algorithms were proposed that leverage inhomogeneous temperature profiles, thereby accelerating exploration of the configuration space without compromising its exploitation. Here, we employ deep reinforcement learning (RL) to train the nonlocal transition policies of NMC, which were previously designed phenomenologically. We demonstrate that the resulting solver can be trained solely by observing the energy changes of configuration-space exploration as RL rewards and the local-minimum energy-landscape geometry as RL states. We further show that the trained policies improve upon standard MCMC-based and nonlocal simulated annealing on hard uniform random and scale-free random 4-SAT benchmarks in terms of residual energy, time-to-solution, and diversity-of-solutions metrics.
Problem

Research questions and friction points this paper is trying to address.

Optimizing complex cost functions in combinatorial problems
Overcoming inefficiencies of conventional MCMC in hard benchmarks
Training nonlocal Monte Carlo policies via reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning optimizes nonlocal Monte Carlo
Inhomogeneous temperature profiles enhance exploration
Deep RL trains policies via energy changes
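The points above can be sketched as a toy training loop: a two-armed bandit stands in for the paper's deep RL policy, choosing between a local single-flip move and a nonlocal move that resamples all variables of an unsatisfied clause (a crude stand-in for a locally hot, inhomogeneous temperature region), with the observed energy change as the reward. Everything here, including the bandit learner, greedy acceptance, and the move definitions, is an illustrative assumption, not the paper's NMC architecture.

```python
# Hedged toy sketch: bandit "policy" over local vs. nonlocal moves,
# rewarded by energy change on a random 4-SAT instance.
import random

random.seed(1)

def energy(assignment, clauses):
    """Number of unsatisfied clauses."""
    return sum(1 for c in clauses
               if not any((lit > 0) == assignment[abs(lit) - 1] for lit in c))

def local_move(x):
    """Standard MCMC-style move: flip a single variable."""
    i = random.randrange(len(x))
    y = x[:]
    y[i] = not y[i]
    return y

def nonlocal_move(x, clauses):
    """Toy nonlocal move: pick an unsatisfied clause and resample all
    of its variables, mimicking a locally reheated region."""
    unsat = [c for c in clauses
             if not any((lit > 0) == x[abs(lit) - 1] for lit in c)]
    if not unsat:
        return x[:]
    y = x[:]
    for lit in random.choice(unsat):
        y[abs(lit) - 1] = random.choice([False, True])
    return y

def train(clauses, n_vars, steps=3000, eps=0.1):
    q, counts = [0.0, 0.0], [0, 0]       # estimated reward per move type
    x = [random.choice([False, True]) for _ in range(n_vars)]
    e = energy(x, clauses)
    for _ in range(steps):
        a = random.randrange(2) if random.random() < eps else q.index(max(q))
        y = local_move(x) if a == 0 else nonlocal_move(x, clauses)
        e_new = energy(y, clauses)
        r = e - e_new                    # reward: observed energy change
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]   # incremental mean update
        if e_new <= e:                   # greedy acceptance (sketch only)
            x, e = y, e_new
    return x, e, q

# Tiny uniform random 4-SAT instance for demonstration.
clauses = [[(v + 1) * random.choice([-1, 1])
            for v in random.sample(range(20), 4)] for _ in range(50)]
x, e, q = train(clauses, n_vars=20)
```

The paper instead trains a deep policy whose state encodes local energy-landscape geometry; the bandit here only illustrates the reward signal, learning which move class lowers the energy more on average.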