Quantum-Inspired Episode Selection for Monte Carlo Reinforcement Learning via QUBO Optimization

📅 2026-01-24
🤖 AI Summary
This work addresses the low sample efficiency of Monte Carlo reinforcement learning under sparse rewards, large state spaces, and strongly correlated trajectories. It formulates trajectory selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solves it with quantum-inspired samplers, specifically Simulated Quantum Annealing and Simulated Bifurcation, to jointly favor high cumulative return and diverse state coverage. In experiments on finite-horizon GridWorld tasks, the proposed MC+QUBO method converges faster than vanilla Monte Carlo and reaches a better final policy, which supports using QUBO-based selection with quantum-inspired solvers to improve the efficiency of Monte Carlo evaluation.

📝 Abstract
Monte Carlo (MC) reinforcement learning suffers from high sample complexity, especially in environments with sparse rewards, large state spaces, and correlated trajectories. We address these limitations by reformulating episode selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solving it with quantum-inspired samplers. Our method, MC+QUBO, integrates a combinatorial filtering step into standard MC policy evaluation: from each batch of trajectories, we select a subset that maximizes cumulative reward while promoting state-space coverage. This selection is encoded as a QUBO, where linear terms favor high-reward episodes and quadratic terms penalize redundancy. We explore both Simulated Quantum Annealing (SQA) and Simulated Bifurcation (SB) as black-box solvers within this framework. Experiments in a finite-horizon GridWorld demonstrate that MC+QUBO outperforms vanilla MC in convergence speed and final policy quality, highlighting the potential of quantum-inspired optimization as a decision-making subroutine in reinforcement learning.
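The selection step described in the abstract can be illustrated with a minimal sketch. The abstract only states that linear terms favor high-reward episodes and quadratic terms penalize redundancy, so everything else here is an assumption made for illustration: the Jaccard overlap of visited states as the redundancy measure, the `redundancy_weight` parameter, and all function names. An exhaustive search stands in for the SQA and SB samplers, which is only feasible for very small batches.

```python
from itertools import product

import numpy as np


def build_qubo(returns, visited_states, redundancy_weight=1.0):
    """Build an upper-triangular Q; the energy x^T Q x is to be minimized."""
    n = len(returns)
    Q = np.zeros((n, n))
    for i in range(n):
        # Linear term (on the diagonal): a higher return lowers the energy.
        Q[i, i] = -returns[i]
        for j in range(i + 1, n):
            a, b = set(visited_states[i]), set(visited_states[j])
            union = a | b
            # Assumed redundancy measure: Jaccard similarity of visited states.
            overlap = len(a & b) / len(union) if union else 0.0
            # Quadratic term: selecting two overlapping episodes costs energy.
            Q[i, j] = redundancy_weight * overlap
    return Q


def solve_qubo_brute_force(Q):
    """Exhaustive stand-in for a QUBO sampler (not SQA or SB)."""
    n = Q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in product((0, 1), repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


def select_episodes(returns, visited_states, redundancy_weight=1.0):
    """Return the indices of the episodes chosen by the QUBO."""
    Q = build_qubo(returns, visited_states, redundancy_weight)
    x, _ = solve_qubo_brute_force(Q)
    return [i for i, bit in enumerate(x) if bit == 1]


# Toy batch: episodes 0 and 1 visit identical states, episode 2 earns nothing,
# and episode 3 covers a different region of the grid.
returns = [1.0, 0.9, 0.0, 0.8]
visited_states = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (1, 0)],
    [(2, 2), (2, 3)],
]
selected = select_episodes(returns, visited_states, redundancy_weight=2.0)
print(selected)  # [0, 3]
```

In this toy batch the redundant episode 1 is dropped despite its high return, because selecting it together with episode 0 incurs the full overlap penalty. Only the selected episodes would then feed the Monte Carlo return averages; in practice the brute-force call would be replaced by a sampler that scales to realistic batch sizes.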
Problem

Research questions and friction points this paper is trying to address.

Monte Carlo reinforcement learning
sample complexity
sparse rewards
large state spaces
correlated trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

QUBO
quantum-inspired optimization
Monte Carlo reinforcement learning
episode selection
state-space coverage
Hadi Salloum
Phystech School of Applied Mathematics and Computer Science, MIPT, Russia; Research Center for Artificial Intelligence, Innopolis University, Russia; Q Deep, Innopolis, Russia
Ali Jnadi
Phystech School of Applied Mathematics and Computer Science, MIPT, Russia; Research Center for Artificial Intelligence, Innopolis University, Russia; Q Deep, Innopolis, Russia
Yaroslav Kholodov
Full professor at Innopolis University
Data analysis, Intelligent transportation systems, Numerical methods, Applied mathematics
Alexander Gasnikov
Innopolis University
convex optimization, AI