AI Summary
This work addresses the challenge of enabling long-term deployed autonomous socially assistive robots to effectively perceive and respond to diverse activities of daily living (ADLs), particularly when encountering unknown or atypically executed activities. To this end, the authors propose POVNet+, a novel multimodal deep learning architecture that introduces, for the first time, a dual embedding space mechanism spanning both ADLs and their constituent motions. This framework jointly models known, unknown, and atypically performed ADLs while integrating user state estimation to proactively trigger context-appropriate assistive behaviors. Experimental results demonstrate that the proposed method outperforms existing approaches in classification accuracy and successfully identifies a wide range of ADL types in real-world, cluttered environments, thereby enabling robust and proactive human-robot interaction.
Abstract
A significant barrier to the long-term deployment of autonomous socially assistive robots is their inability to both perceive and assist with multiple activities of daily living (ADLs). In this paper, we present POVNet+, the first multimodal deep learning architecture for multi-activity recognition that enables socially assistive robots to proactively initiate assistive behaviors. Our novel architecture introduces the use of both ADL and motion embedding spaces to uniquely distinguish whether a known ADL is being performed, a new unseen ADL is being encountered, or a known ADL is being performed atypically, in order to assist people in real-world scenarios. Furthermore, we apply a novel user state estimation method to the motion embedding space to recognize new ADLs while monitoring user performance. This ADL perception information is used to proactively initiate robot assistive interactions. Comparison experiments with state-of-the-art human activity recognition methods show that our POVNet+ method achieves higher ADL classification accuracy. Human-robot interaction experiments in a cluttered living environment with multiple users and the socially assistive robot Leia using POVNet+ demonstrate the ability of our multimodal ADL architecture to successfully identify seen ADLs, unseen ADLs, and ADLs being performed atypically, while initiating appropriate assistive human-robot interactions.
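The abstract describes the dual embedding spaces only at a high level, and the paper's actual network and decision rules are not reproduced here. As a purely illustrative sketch, one way to realize the known/unseen/atypical distinction is distance-based open-set classification over two embedding spaces: compare a sample's ADL embedding against per-class prototypes to decide known versus unseen, then check its motion embedding against the matched class's typical-motion prototype to flag atypical execution. All names, prototypes, and thresholds below (classify_adl, tau_adl, tau_motion) are hypothetical assumptions, not the authors' implementation.

```python
import numpy as np

def classify_adl(adl_emb, motion_emb, adl_protos, motion_protos,
                 tau_adl=1.0, tau_motion=1.0):
    """Hypothetical open-set decision over two embedding spaces.

    adl_emb, motion_emb : 1-D feature vectors for the observed activity.
    adl_protos, motion_protos : dicts mapping each known ADL label to its
        prototype (e.g., mean embedding) in the corresponding space.
    tau_adl, tau_motion : distance thresholds that would be tuned on data.
    """
    # Find the nearest known ADL prototype in the ADL embedding space.
    adl_dists = {c: np.linalg.norm(adl_emb - p) for c, p in adl_protos.items()}
    best = min(adl_dists, key=adl_dists.get)

    if adl_dists[best] > tau_adl:
        # Far from every known ADL prototype: treat as a new, unseen ADL.
        return "unseen_adl", None

    # Within a known ADL: check the motion space to see whether the
    # constituent motions match how that ADL is typically performed.
    motion_dist = np.linalg.norm(motion_emb - motion_protos[best])
    if motion_dist > tau_motion:
        return "atypical", best  # known ADL, non-canonical execution
    return "known", best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos_a = {"eating": rng.normal(size=8), "dressing": rng.normal(size=8)}
    protos_m = {"eating": rng.normal(size=8), "dressing": rng.normal(size=8)}
    # A sample close to "eating" in the ADL space but far from its typical
    # motion prototype should be flagged as atypical execution.
    sample_adl = protos_a["eating"] + 0.1 * rng.normal(size=8)
    sample_motion = protos_m["eating"] + 2.0 * rng.normal(size=8)
    print(classify_adl(sample_adl, sample_motion, protos_a, protos_m))
```

In such a scheme, the unseen-ADL outcome could seed a new prototype over time, while the atypical outcome could feed the user state estimation that triggers assistive behavior; whether POVNet+ uses prototypes, thresholds, or a learned decision head is not stated in the abstract.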