🤖 AI Summary
This work addresses the challenge of modeling real-time motion generation, intent recognition, and confidence estimation in human–robot collaboration within a single model. We propose CERNet, an architecture that integrates a class embedding with a hierarchical Predictive Coding Recurrent Neural Network (PC-RNN). Its key contributions are threefold: (1) it is the first to unify generative and discriminative inference end-to-end by jointly leveraging class embeddings and predictive coding; (2) it quantifies its own uncertainty through internal prediction errors, eliminating the need for auxiliary confidence modules; and (3) it supports online recognition and motion reproduction that remains robust under external perturbations. Evaluated on a humanoid robot with 26 kinesthetically taught letters, CERNet reduces trajectory reproduction error by 76% compared to a parameter-matched single-layer baseline and achieves online behavior recognition accuracies of 68% (Top-1) and 81% (Top-2).
📝 Abstract
Robots interacting with humans must not only generate learned movements in real time, but also infer the intent behind observed behaviors and estimate the confidence of their own inferences. This paper proposes CERNet, a unified model that achieves all three capabilities within a single hierarchical predictive-coding recurrent neural network (PC-RNN). CERNet uses a dynamically updated class embedding vector to unify motor generation and recognition. The model operates in two modes: generation and inference. In the generation mode, the class embedding constrains the hidden state dynamics to a class-specific subspace; in the inference mode, it is optimized online to minimize prediction error, enabling real-time recognition. Validated on a humanoid robot across 26 kinesthetically taught letters of the alphabet, our hierarchical model achieves 76% lower trajectory reproduction error than a parameter-matched single-layer baseline, maintains motion fidelity under external perturbations, and infers the demonstrated trajectory class online with 68% Top-1 and 81% Top-2 accuracy. Furthermore, internal prediction errors naturally reflect the model's confidence in its recognition. This integration of robust generation, real-time recognition, and intrinsic uncertainty estimation within a single PC-RNN framework offers a compact and extensible approach to motor memory in physical robots, with potential applications in intent-sensitive human-robot collaboration.
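The two modes described above can be illustrated with a toy sketch. This is not the authors' CERNet: it is a single-layer tanh RNN with random, untrained weights, and every name (`generate`, `infer_embedding`, `class_embeddings`), the one-hot embeddings, and all dimensions are assumptions made for illustration. For brevity, the inference step optimizes the embedding over a whole observed trajectory with finite-difference gradients, rather than online at each time step as the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
H, C, D, T = 8, 3, 2, 30                  # hidden, embedding, output dims; trajectory length
W = rng.normal(scale=0.3, size=(H, H))    # recurrent weights (random, untrained)
U = rng.normal(scale=1.0, size=(H, C))    # weights feeding the class embedding into the state
V = rng.normal(scale=1.0, size=(D, H))    # readout weights
class_embeddings = np.eye(C)              # hypothetical: one fixed embedding per class


def generate(c, steps=T):
    """Generation mode: roll the RNN forward with the embedding c held fixed."""
    h = np.zeros(H)
    out = []
    for _ in range(steps):
        h = np.tanh(W @ h + U @ c)
        out.append(V @ h)
    return np.array(out)


def prediction_error(observed, c):
    """Mean squared error between an observation and the prediction under c."""
    return float(np.mean((generate(c, len(observed)) - observed) ** 2))


def infer_embedding(observed, iters=50, eps=1e-5):
    """Inference mode: adjust c by gradient descent on the prediction error."""
    c = class_embeddings.mean(axis=0)     # neutral starting point
    errors = [prediction_error(observed, c)]
    for _ in range(iters):
        # Central finite differences keep the sketch free of autodiff dependencies.
        grad = np.array([
            (prediction_error(observed, c + eps * e)
             - prediction_error(observed, c - eps * e)) / (2 * eps)
            for e in np.eye(C)
        ])
        step = 1.0
        while step > 1e-6:                # backtracking: accept only improving steps
            candidate = c - step * grad
            err = prediction_error(observed, candidate)
            if err < errors[-1]:
                c = candidate
                errors.append(err)
                break
            step *= 0.5
        else:
            break                         # no improving step found; stop optimizing
    return c, errors


def classify(c):
    """Report the class whose embedding lies closest to the inferred one."""
    return int(np.argmin(np.linalg.norm(class_embeddings - c, axis=1)))


true_class = 1
observed = generate(class_embeddings[true_class])
c_hat, errors = infer_embedding(observed)
print("inferred class:", classify(c_hat), "residual prediction error:", errors[-1])
```

In this sketch the residual prediction error after optimization plays the role of a confidence signal: a small residual means some embedding explains the observation well, while a large one suggests the observed motion matches no learned class.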