Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning

📅 2025-10-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of high-fidelity modeling and optimization of multi-agent animal behavior under unknown real-world biomechanical dynamics. We propose a deep reinforcement learning framework integrating data-driven simulation with counterfactual reasoning. Methodologically, we explicitly encode motion variables from incomplete dynamical models as RL action space components and introduce a trajectory-distance-based pseudo-reward mechanism, enabling stable training without prior dynamical knowledge. Further, we unify offline/online RL, imitation learning, state alignment, and counterfactual inference to support cross-species behavioral replication (fruit fly, salamander, silkworm moth) and counterfactual trajectory generation. Experiments demonstrate significant improvements: 32.7% reduction in trajectory reconstruction error and 2.1× acceleration in reward convergence. The framework establishes a novel, interpretable, and intervention-capable simulation paradigm for computational neuroethology and embodied AI.

Technology Category

Application Category

📝 Abstract
Simulators of animal movements play a valuable role in studying behavior. Advances in imitation learning for robotics have expanded possibilities for reproducing human and animal movements. A key challenge for realistic multi-animal simulation in biology is bridging the gap between unknown real-world transition models and their simulated counterparts. Because locomotion dynamics are seldom known, relying solely on mathematical models is insufficient; constructing a simulator that both reproduces real trajectories and supports reward-driven optimization remains an open problem. We introduce a data-driven simulator for multi-animal behavior based on deep reinforcement learning and counterfactual simulation. We address the ill-posed nature of the problem caused by high degrees of freedom in locomotion by estimating movement variables of an incomplete transition model as actions within an RL framework. We also employ a distance-based pseudo-reward to align and compare states between cyber and physical spaces. Validated on artificial agents, flies, newts, and silkmoth, our approach achieves higher reproducibility of species-specific behaviors and improved reward acquisition compared with standard imitation and RL methods. Moreover, it enables counterfactual behavior prediction in novel experimental settings and supports multi-individual modeling for flexible what-if trajectory generation, suggesting its potential to simulate and elucidate complex multi-animal behaviors.
Problem

Research questions and friction points this paper is trying to address.

Simulating multi-animal behavior with unknown dynamics
Bridging real-world transitions and simulated models
Reproducing trajectories while supporting reward optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses offline and online reinforcement learning
Estimates movement variables as RL actions
Employs distance-based pseudo-reward for alignment
🔎 Similar Papers
No similar papers found.
K
Keisuke Fujii
Graduate School of Informatics, Nagoya University, Japan
Kazushi Tsutsui
Kazushi Tsutsui
The University of Tokyo
Y
Yu Teshima
Project team for SIP , Japan Agency for Marine-Earth Science and Technology, Japan
Makoto Itoh
Makoto Itoh
University of Tsukuba
cognitive systems engineeringADAS
Naoya Takeishi
Naoya Takeishi
UTokyo
machine learningdynamical systems
N
Nozomi Nishiumi
Graduate School of Science and Technology, Niigata University, Japan
R
Ryoya Tanaka
Graduate School of Science, Nagoya University, Japan
Shunsuke Shigaki
Shunsuke Shigaki
National Institute of Informatics
Bio-inspired robotics
Yoshinobu Kawahara
Yoshinobu Kawahara
The University of Osaka & RIKEN Center for Advanced Intelligence Project
Machine LearningDynamical SystemsNonlinear Dynamics