Weber-Fechner Law in Temporal Difference learning derived from Control as Inference

📅 2024-12-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional linear temporal-difference (TD) updates fail to capture the biologically observed nonlinear perceptual biases in reward processing, particularly the empirically grounded Weber-Fechner law. Method: Starting from the "control as inference" framework, the paper derives the Weber-Fechner law and establishes a physiologically plausible logarithmic relationship between TD error and update magnitude. Based on this, it proposes a biologically inspired nonlinear TD update in which the update magnitude decays as the value magnitude grows, combined with a reward-punishment framework that shapes rewards and suppresses penalties. Contribution/Results: Experiments show that the approach accelerates early high-reward acquisition, enabling faster task startup and improved robustness to penalties in both simulated and real-robot navigation tasks.
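
The exact update rule is not given in this summary; below is a minimal sketch of what such a Weber-Fechner-attenuated TD(0) update could look like in a tabular setting. The function name, the attenuation form v0/(v0 + |V|), and all constants are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def wfl_td_update(V, s, r, s_next, alpha=0.1, gamma=0.99, v0=1.0):
    """Hypothetical Weber-Fechner-style TD(0) update on a tabular value table V.

    The effective learning rate shrinks as |V(s)| grows, mirroring the
    Weber-Fechner attenuation of perception (the update) by stimulus
    intensity (the value). Names and constants are illustrative only.
    """
    td_error = r + gamma * V[s_next] - V[s]   # stimulus change: the standard TD error
    attenuation = v0 / (v0 + abs(V[s]))       # sensitivity ~ d/dV log(v0 + |V|), decays with |V|
    V[s] += alpha * attenuation * td_error    # update magnitude decays as value magnitude grows
    return td_error

# Usage: early on V is near zero, so updates are nearly standard TD;
# as V[s] grows, the same TD error produces a smaller update.
V = np.zeros(5)
wfl_td_update(V, s=0, r=1.0, s_next=1)
```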

📝 Abstract
This paper investigates a novel nonlinear update rule based on temporal difference (TD) errors in reinforcement learning (RL). The standard RL update rule states that the degree of update is linearly proportional to the TD error, treating all rewards equally and without bias. Recent biological studies, however, have revealed nonlinearities between the TD error and the degree of update, which bias policies toward optimism or pessimism. Such nonlinearity-induced learning biases are expected to be useful features deliberately retained in biological learning. This research therefore explores a theoretical framework that can leverage the nonlinearity between the degree of update and the TD error. To this end, we focus on the control-as-inference framework, since it is known as a generalized formulation encompassing various RL and optimal control methods. In particular, we investigate the uncomputable nonlinear term that must be approximated away when deriving standard RL from control as inference. Analyzing this term reveals the Weber-Fechner law (WFL): perception (i.e., the degree of update) in response to a stimulus change (i.e., the TD error) is attenuated as the stimulus intensity (i.e., the value function) increases. To numerically examine the utility of WFL for RL, we then propose a practical implementation using a reward-punishment framework and a modified definition of optimality. Analysis of this implementation suggests two expected utilities: i) rewards are raised to a certain level early, and ii) punishments are sufficiently suppressed. We finally investigate and discuss these expected utilities through simulations and robot experiments. As a result, the proposed RL algorithm with WFL shows the expected utilities, accelerating the startup of reward maximization and continuing to suppress punishments during learning.
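
The abstract's reward-punishment implementation is not spelled out here. As a rough illustration only, the sketch below splits the scalar reward into separate reward and punishment value streams and applies the same Weber-Fechner attenuation to each TD update, so that rewards rise quickly while punishments stay suppressed. The class name, the splitting rule, and the constant v0 are assumptions, not the paper's definitions.

```python
import numpy as np

class RewardPunishmentWFL:
    """Illustrative reward-punishment decomposition with WFL-attenuated
    TD updates; names, constants, and the splitting rule are assumptions,
    not the paper's exact implementation."""

    def __init__(self, n_states, alpha=0.1, gamma=0.99, v0=1.0):
        self.Vr = np.zeros(n_states)  # value of rewards (raised early)
        self.Vp = np.zeros(n_states)  # value of punishments (suppressed)
        self.alpha, self.gamma, self.v0 = alpha, gamma, v0

    def update(self, s, reward, s_next):
        # Split the scalar reward into non-negative reward/punishment parts.
        parts = (max(reward, 0.0), max(-reward, 0.0))
        for V, r in zip((self.Vr, self.Vp), parts):
            delta = r + self.gamma * V[s_next] - V[s]  # per-stream TD error
            k = self.v0 / (self.v0 + abs(V[s]))        # WFL attenuation by value magnitude
            V[s] += self.alpha * k * delta

    def value(self, s):
        # Combined estimate: accumulated rewards minus accumulated punishments.
        return self.Vr[s] - self.Vp[s]
```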
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Nonlinear Update Rules
Biological Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonlinear Update Rule
Weber-Fechner Law
Reward-Punishment Adjustment
Keiichiro Takahashi
Division of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, 630-0192, Japan.
Taisuke Kobayashi
National Institute of Informatics
Robotics, Machine learning, Autonomous systems
Tomoya Yamanokuchi
Division of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, 630-0192, Japan.
Takamitsu Matsubara
Nara Institute of Science and Technology
Robot Learning, Machine Learning, Reinforcement Learning, Robotics