🤖 AI Summary
Humanoid robots face significant challenges in table tennis because rapid visual perception, proactive whole-body motion, and agile footwork must be achieved jointly. To address this, the paper proposes an end-to-end reinforcement learning framework that maps ball-position observations directly to 23-degree-of-freedom whole-body joint actions. The method integrates a lightweight learned trajectory predictor with a physics-informed dense reward mechanism, tightly coupling perception, decision-making, and control. This design substantially improves policy training efficiency and generalization, and enables zero-shot transfer to the physical Booster T1 humanoid robot. In simulation, the policy achieves a hit rate of at least 96% and a success rate of at least 92% across varied serve ranges. On hardware, it executes coordinated lateral and forward-backward footwork with fast, accurate returns, demonstrating robust real-time performance in physical deployment.
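The summary's pipeline of feeding recent ball positions through a learned predictor and augmenting the policy's observation can be sketched as below. This is a minimal illustration, not the paper's actual architecture: the network size, the five-step history, the 23-joint proprioceptive layout, and the random placeholder weights are all assumptions for demonstration.

```python
import numpy as np

class BallPredictor:
    """Tiny MLP sketch: maps the last K observed ball positions to an
    estimated future ball state (position + velocity). Weights are random
    placeholders here; in the described framework they would be learned."""

    def __init__(self, k=5, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        in_dim = 3 * k                       # K stacked 3-D ball positions
        out_dim = 6                          # future position (3) + velocity (3)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def predict(self, recent_positions):
        x = np.asarray(recent_positions).reshape(-1)   # (K, 3) -> (3K,)
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

def build_observation(joint_state, recent_ball_positions, predictor):
    """Augment proprioception with the predictor's future-ball estimate,
    giving the policy the predictive signal for proactive decisions."""
    pred = predictor.predict(recent_ball_positions)
    ball_hist = np.asarray(recent_ball_positions).reshape(-1)
    return np.concatenate([joint_state, ball_hist, pred])

predictor = BallPredictor(k=5)
obs = build_observation(np.zeros(2 * 23),      # 23 joint positions + velocities
                        np.zeros((5, 3)),      # last 5 ball positions
                        predictor)
print(obs.shape)                               # (46 + 15 + 6,) = (67,)
```

The policy network (not shown) would then map this augmented observation to the 23 joint commands for both arms and legs.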
📝 Abstract
Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing -- capabilities that remain difficult for unified controllers. We propose a reinforcement learning framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulation. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT.
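The physics-based predictor used for reward shaping during training could look like the sketch below: a ballistic roll-out of the ball state supplies a predicted intercept point, and a dense reward decays with the racket's distance to it. The drag-free dynamics, the 0.3 s horizon, and the exponential shaping are illustrative assumptions; the paper's actual physics model and reward terms may differ.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2

def physics_predict(p0, v0, t):
    """Ballistic roll-out of the ball position t seconds ahead.
    Air drag and spin are ignored here for brevity (an assumption)."""
    return p0 + v0 * t + 0.5 * G * t ** 2

def dense_reward(racket_pos, ball_pos, ball_vel, horizon=0.3, scale=5.0):
    """Dense, physics-guided shaping reward: decays exponentially with the
    distance between the racket and the predicted ball position at a short
    horizon, rewarding the policy for pre-positioning the racket."""
    target = physics_predict(ball_pos, ball_vel, horizon)
    return float(np.exp(-scale * np.linalg.norm(racket_pos - target)))

ball_p = np.array([2.0, 0.0, 1.0])    # current ball position (m)
ball_v = np.array([-4.0, 0.5, 1.0])   # current ball velocity (m/s)
target = physics_predict(ball_p, ball_v, 0.3)
print(np.round(target, 3))                    # predicted intercept point
print(dense_reward(target, ball_p, ball_v))   # 1.0: racket exactly at target
```

Because the reward is nonzero everywhere rather than only on contact, it gives the policy a gradient toward the ball at every timestep, which is what makes exploration effective in this setting.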