Fitting Reinforcement Learning Model to Behavioral Data under Bandits

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of fitting reinforcement learning (RL) models to behavioral data collected in multi-armed bandit tasks, with the aim of characterizing sequential decision making in humans and animals. To overcome two bottlenecks of conventional fitting methods, namely high computational cost and unreliable convergence in non-convex optimization, we propose a general optimization framework based on convex relaxation, supported by a theoretical analysis of the convexity properties of the fitting problem. The proposed method substantially reduces computation time while maintaining fitting accuracy comparable to state-of-the-art approaches. We evaluate the framework in several simulated bandit environments against benchmark methods from the literature. Furthermore, we release an open-source Python package so that researchers can apply the method to their own datasets without prior knowledge of convex optimization.

📝 Abstract
We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem formulation for the fitting problem of a wide range of RL models that appear frequently in scientific research applications, followed by a detailed theoretical analysis of its convexity properties. Based on the theoretical results, we introduce a novel solution method for the fitting problem of RL models based on convex relaxation and optimization. Our method is then evaluated in several simulated bandit environments to compare with some benchmark methods that appear in the literature. Numerical results indicate that our method achieves comparable performance to the state-of-the-art, while significantly reducing computation time. We also provide an open-source Python package for our proposed method to empower researchers to apply it in the analysis of their datasets directly, without prior knowledge of convex optimization.
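
As a rough illustration of what "fitting an RL model to behavioral data" means in a bandit task, the sketch below simulates a Q-learning agent with softmax choice and recovers its parameters by minimizing the negative log-likelihood over a grid. This is a generic baseline of the kind commonly used in this literature, not the convex-relaxation method of the paper. The model (a learning rate `alpha` and an inverse temperature `beta`), the two-armed Bernoulli bandit, and all function names are assumptions made for this example.

```python
import math
import random


def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(alpha, beta, reward_probs, n_trials, seed=0):
    """Simulate a Q-learning agent on a Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)
    choices, rewards = [], []
    for _ in range(n_trials):
        probs = softmax(q, beta)
        a = rng.choices(range(len(q)), weights=probs)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])  # delta-rule value update
        choices.append(a)
        rewards.append(r)
    return choices, rewards


def neg_log_likelihood(alpha, beta, choices, rewards, n_arms):
    """Negative log-likelihood of the observed choices under (alpha, beta)."""
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(choices, rewards):
        nll -= math.log(softmax(q, beta)[a])
        q[a] += alpha * (r - q[a])
    return nll


def fit_grid(choices, rewards, n_arms, alphas, betas):
    """Exhaustive grid search; returns (best_nll, alpha_hat, beta_hat)."""
    best = None
    for alpha in alphas:
        for beta in betas:
            nll = neg_log_likelihood(alpha, beta, choices, rewards, n_arms)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best


# Example: generate data with known parameters, then fit.
choices, rewards = simulate(alpha=0.3, beta=3.0, reward_probs=[0.2, 0.8], n_trials=500)
alphas = [i / 20 for i in range(1, 20)]   # 0.05 ... 0.95
betas = [0.5 * i for i in range(1, 21)]   # 0.5 ... 10.0
best_nll, alpha_hat, beta_hat = fit_grid(choices, rewards, 2, alphas, betas)
```

Grid search is chosen here only because it needs no external solver; its cost grows quickly with the number of parameters, which is one reason faster fitting methods are of interest.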
Problem

Research questions and friction points this paper is trying to address.

Fitting reinforcement learning models to behavioral data in multi-armed bandit environments
Developing convex optimization methods for efficient RL model parameter estimation
Providing a computationally efficient solution for behavioral data analysis that requires no convex optimization expertise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Convex relaxation for RL model fitting
Optimization method reducing computation time
Open-source Python package for researchers
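
This page does not describe the paper's relaxation itself, so the following sketch only illustrates a standard fact that helps explain why convex formulations are plausible for this problem. For a softmax choice rule, the per-trial negative log-likelihood is a log-sum-exp term minus a linear term, which is convex in the action values. In general, convexity is lost once those values are written as functions of the learning parameters through the value-update recursion. The function names and the random test setup are assumptions made for this example.

```python
import math
import random


def log_sum_exp(z):
    """Numerically stable log(sum(exp(z_i)))."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))


def choice_nll(z, chosen):
    """Negative log-probability of `chosen` under softmax(z)."""
    return log_sum_exp(z) - z[chosen]


def midpoint_gap(z1, z2, chosen):
    """Average of the endpoint losses minus the loss at the midpoint.

    For a convex function this quantity is non-negative.
    """
    mid = [(a + b) / 2 for a, b in zip(z1, z2)]
    return 0.5 * choice_nll(z1, chosen) + 0.5 * choice_nll(z2, chosen) - choice_nll(mid, chosen)


# Check midpoint convexity on random pairs of value vectors (3 arms).
rng = random.Random(0)
gaps = []
for _ in range(1000):
    z1 = [rng.uniform(-5, 5) for _ in range(3)]
    z2 = [rng.uniform(-5, 5) for _ in range(3)]
    gaps.append(midpoint_gap(z1, z2, rng.randrange(3)))
min_gap = min(gaps)  # non-negative up to floating-point rounding
```

A random check like this is not a proof, but it matches the known convexity of log-sum-exp and makes the structure of the objective concrete.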
Authors

Hao Zhu
IMBIT//BrainLinks-BrainTools, Department of Computer Science, University of Freiburg
Jasper Hoffmann
IMBIT//BrainLinks-BrainTools, Department of Computer Science, University of Freiburg
Baohe Zhang
PhD Student at University of Freiburg
Reinforcement Learning, AutoML
Joschka Boedecker
Professor of Computer Science, University of Freiburg, Germany
Artificial Intelligence, Machine Learning, Reinforcement Learning, Robotics