🤖 AI Summary
This work addresses the lack of systematic guidance in designing reward functions for reinforcement learning (RL) in biomechanical human simulation. Methodologically, we develop a biomechanical simulation framework based on the Proximal Policy Optimization (PPO) algorithm, integrating a choice-reaction task paradigm and trajectory behavioral analysis to quantitatively disentangle the functional roles of three reward components: effort minimization, task completion, and target proximity. Our key contribution is the first identification of their interaction principles, leading to a novel “completion + proximity” dual-core reward paradigm: synergistic integration of these two components is essential for task success, whereas effort regularization is optional but helps suppress aberrant motions. Experiments demonstrate that this paradigm significantly improves trajectory fidelity and stability while lowering the design barrier for non-RL experts—advancing biomechanical simulation toward high-fidelity, deployable solutions.
📝 Abstract
Biomechanical models allow for diverse simulations of user movements in interaction. Their performance depends critically on the careful design of reward functions, yet the interplay between reward components and emergent behaviours remains poorly understood. We investigate what makes a model"breathe"by systematically analysing the impact of rewarding effort minimisation, task completion, and target proximity on movement trajectories. Using a choice reaction task as a test-bed, we find that a combination of completion bonus and proximity incentives is essential for task success. Effort terms are optional, but can help avoid irregularities if scaled appropriately. Our work offers practical insights for HCI designers to create realistic simulations without needing deep reinforcement learning expertise, advancing the use of simulations as a powerful tool for interaction design and evaluation in HCI.