🤖 AI Summary
This study addresses auditory-driven robot head-orientation control in realistic reverberant environments. We propose an end-to-end reinforcement learning method that couples binaural acoustic signals with a Deep Q-Network (DQN). Using a simulated acoustic environment spanning anechoic to strongly reverberant conditions, we systematically investigate the generalization behavior of audio-driven deep reinforcement learning (DRL) policies. Our key finding, novel in the literature, is that policies trained under medium-to-high reverberation generalize to low-reverberation scenarios, but not vice versa, revealing a directional constraint that reverberation strength imposes on policy transfer. Experiments show near-perfect orientation accuracy (≈100%) in anechoic conditions and significant gains over random baselines under medium and high reverberation. These results empirically support two core conclusions: (1) inherent robustness to reverberation and (2) asymmetric generalization. The work establishes an interpretable acoustic-adaptation paradigm for auditory-motor coupling in embodied intelligence.
📝 Abstract
Although deep reinforcement learning (DRL) approaches in audio signal processing have seen substantial progress in recent years, audio-driven DRL for tasks such as navigation, gaze control and head-orientation control in the context of human-robot interaction has received little attention. Here, we propose an audio-driven DRL framework in which we utilise deep Q-learning to develop an autonomous agent that orients towards a talker in the acoustic environment based on stereo speech recordings. Our results show that the agent learned to perform the task at a near-perfect level when trained on speech segments in anechoic environments (that is, without reverberation). The presence of reverberation in naturalistic acoustic environments affected the agent's performance, although the agent still substantially outperformed a baseline, randomly acting agent. Finally, we quantified the degree of generalization of the proposed DRL approach across naturalistic acoustic environments. Our experiments revealed that policies learned by agents trained in medium- or high-reverb environments generalized to low-reverb environments, but policies learned by agents trained in anechoic or low-reverb environments did not generalize to medium- or high-reverb environments. Taken together, this study demonstrates the potential of audio-driven DRL for tasks such as head-orientation control and highlights the need for training strategies that enable robust generalization across environments for real-world audio-driven DRL applications.