Quantum-Inspired Reinforcement Learning in the Presence of Epistemic Ambivalence

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses epistemic ambivalence (EA), a challenge distinct from classical epistemic uncertainty, characterized by the persistent coexistence of hesitation and confidence induced by conflicting evidence. It proposes EA-MDP, the first formal, optimizable framework for this setting, which integrates quantum state representation into Markov decision processes and proves the existence of optimal policies and value functions. It further designs EA-ε-greedy Q-learning, an algorithm that uses quantum measurement theory and probability-amplitude modeling to capture belief dynamics under EA. Experiments on two-state and lattice environments show that the algorithm converges stably to optimal policies under EA, outperforming classical reinforcement learning baselines in both convergence stability and asymptotic performance.
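The summary mentions probability-amplitude modeling and quantum measurement for reward computation. A minimal, purely illustrative sketch of that idea (the function and example values below are assumptions, not the paper's actual construction): a belief over conflicting outcomes is stored as complex amplitudes, and the Born rule (probability = |amplitude|²) converts them into outcome probabilities from which an expected reward follows.

```python
import math

def expected_reward(amplitudes, rewards):
    """Expected reward of a quantum-style belief state.

    amplitudes: complex amplitudes over basis outcomes, assumed
                normalized so that sum(|a|^2) == 1 (Born rule).
    rewards:    scalar reward attached to each basis outcome.
    """
    probs = [abs(a) ** 2 for a in amplitudes]  # Born-rule probabilities
    total = sum(probs)
    probs = [p / total for p in probs]         # guard against rounding drift
    return sum(p * r for p, r in zip(probs, rewards))

# Two conflicting pieces of evidence held in equal superposition:
# the agent is simultaneously "confident" in both a +1 and a -1 outcome.
amps = [1 / math.sqrt(2), (1 / math.sqrt(2)) * 1j]
print(expected_reward(amps, [1.0, -1.0]))  # ≈ 0.0, the two beliefs cancel
```

Note how a relative phase (the `1j`) leaves the Born-rule probabilities unchanged here; only the squared magnitudes enter the expected reward.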

📝 Abstract
The complexity of online decision-making under uncertainty stems from the need to balance exploiting known strategies against exploring new possibilities. Naturally, the type of uncertainty plays a crucial role in developing decision-making strategies that manage this complexity effectively. In this paper, we focus on a specific form of uncertainty known as epistemic ambivalence (EA), which emerges from conflicting pieces of evidence or contradictory experiences. EA creates a delicate interplay between uncertainty and confidence, distinguishing it from epistemic uncertainty, which typically diminishes with new information; indeed, ambivalence can persist even after additional knowledge is acquired. To address this phenomenon, we propose a novel framework, the epistemically ambivalent Markov decision process (EA-MDP), which aims to understand and control EA in decision-making processes. The framework incorporates the concept of a quantum state from the quantum-mechanics formalism, and its core is to assess the probability and reward of every possible outcome. We compute the reward function using quantum measurement techniques and prove the existence of an optimal policy and an optimal value function in the EA-MDP framework. We also propose the EA-epsilon-greedy Q-learning algorithm. To evaluate the impact of EA on decision-making and the effectiveness of our framework, we study two experimental setups: the two-state problem and the lattice problem. Our results show that, using our methods, the agent converges to the optimal policy in the presence of EA.
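The abstract pairs an epsilon-greedy Q-learning loop with a two-state testbed. The sketch below is a hedged stand-in, not the authors' EA-epsilon-greedy algorithm: the EA-specific, measurement-derived reward is abstracted into a `reward` function that returns opposing values for the same state-action pair, mimicking conflicting evidence, while the surrounding loop is standard tabular Q-learning. All names and dynamics are illustrative assumptions.

```python
import random

def reward(state, action, rng):
    # Conflicting evidence: the same (state, action) pair can yield
    # opposing rewards. The "matching" action is better only on average.
    if action == state:
        return 1.0 if rng.random() < 0.8 else -1.0
    return -1.0 if rng.random() < 0.8 else 1.0

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    state = 0
    for _ in range(episodes):
        if rng.random() < eps:                        # explore
            action = rng.choice((0, 1))
        else:                                         # exploit
            action = max((0, 1), key=lambda a: q[(state, a)])
        r = reward(state, action, rng)
        nxt = action  # toy dynamics: the action chooses the next state
        best_next = max(q[(nxt, a)] for a in (0, 1))
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = nxt
    return q

q = q_learning()
# Despite the ambivalent rewards, the greedy policy should settle on
# the action that matches the state in this toy setup.
```

Even with rewards that flip sign run to run, the averaging in the Q-update lets the agent separate the two actions, which is the behavior the paper's convergence results formalize for the EA setting.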
Problem

Research questions and friction points this paper is trying to address.

Addresses decision-making under epistemic ambivalence (EA)
Proposes EA-MDP framework using quantum mechanics concepts
Develops EA-epsilon-greedy Q-learning for optimal policy convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-inspired reinforcement learning for epistemic ambivalence
EA-MDP framework with quantum state concepts
EA-epsilon-greedy Q-learning algorithm for optimal policies