Problem
Research questions and friction points this paper is trying to address.
Convergence of Q-learning that uses an agent state in POMDPs
Analyzing regularized Q-learning without relying on belief states
Establishing convergence to the fixed point of a regularized MDP
Innovation
Methods, ideas, or system contributions that make the work stand out.
Agent-state-based Q-learning with regularization
Convergence to the fixed point of a regularized MDP
Periodic-policy variant covered by the same analysis
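As a rough illustration of the first item, the sketch below runs tabular Q-learning on an agent state instead of a belief state. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy POMDP is random, the agent state is simply the latest observation, the regularizer is entropy (so the bootstrap value is a log-sum-exp), the behavior policy is uniform, and the names `soft_value` and `agent_state_q_learning` are hypothetical.

```python
import numpy as np


def soft_value(q_row, tau):
    """Entropy-regularized value tau * log(sum(exp(q / tau))), computed stably."""
    m = q_row.max()
    return m + tau * np.log(np.exp((q_row - m) / tau).sum())


def agent_state_q_learning(steps=20000, tau=0.5, gamma=0.9, seed=0):
    """Regularized Q-learning indexed by agent state (assumed: last observation)."""
    rng = np.random.default_rng(seed)
    n_states, n_obs, n_actions = 4, 2, 2

    # Hypothetical toy POMDP: random dynamics and rewards in [0, 1].
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    # Two hidden states share each observation, so the agent state is not Markov.
    obs_of_state = np.array([0, 0, 1, 1])

    Q = np.zeros((n_obs, n_actions))       # table over agent states, not beliefs
    visits = np.zeros((n_obs, n_actions))

    s = 0
    z = obs_of_state[s]
    for _ in range(steps):
        a = int(rng.integers(n_actions))   # assumed fixed uniform behavior policy
        r = R[s, a]
        s_next = int(rng.choice(n_states, p=P[s, a]))
        z_next = obs_of_state[s_next]

        visits[z, a] += 1
        alpha = 1.0 / visits[z, a] ** 0.6  # decaying step size (assumed schedule)
        target = r + gamma * soft_value(Q[z_next], tau)
        Q[z, a] += alpha * (target - Q[z, a])

        s, z = s_next, z_next
    return Q


if __name__ == "__main__":
    print(agent_state_q_learning())
```

The learner never sees the hidden state `s`; it is used only to simulate the environment. Swapping in a different regularizer would change only `soft_value`.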