Towards Versatile Humanoid Table Tennis: Unified Reinforcement Learning with Prediction Augmentation

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Humanoid robots face significant challenges in table tennis because the task demands jointly achieving rapid visual perception, proactive whole-body motion, and agile footwork coordination. To address this, the paper proposes an end-to-end reinforcement learning framework that maps ball-position observations directly to 23-degree-of-freedom whole-body joint actions. The method pairs a lightweight learned trajectory predictor with a physics-informed dense reward mechanism, tightly coupling perception, decision-making, and control. This design substantially improves policy training efficiency and generalization, achieving, for the first time, zero-shot transfer to the physical Booster T1 humanoid robot. In simulation, the policy attains a hit rate of at least 96% across diverse serve types and a success rate of at least 92%. On hardware, it executes coordinated locomotion with fast, accurate racket responses, demonstrating robust real-time performance in physical deployment.

📝 Abstract
Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing -- capabilities that remain difficult for unified controllers. We propose a reinforcement learning framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulations. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT.
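The observation-augmentation idea in the abstract can be sketched concretely: a predictor consumes recent ball positions, estimates a future ball state, and that estimate is concatenated onto the policy's proprioceptive observation. The sketch below uses constant-velocity extrapolation as a stand-in for the paper's learned network; the class name, prediction horizon, and timestep are illustrative assumptions, not details from the paper.

```python
import numpy as np

class BallPredictor:
    """Toy stand-in for the paper's lightweight learned predictor.

    A real implementation would be a small trained network; here a
    finite-difference, constant-velocity extrapolation illustrates the
    interface (hypothetical names and horizon, not the paper's)."""

    def __init__(self, horizon_s=0.2, dt=0.02):
        self.horizon_s = horizon_s  # how far ahead to predict (s)
        self.dt = dt                # spacing of the position history (s)

    def predict(self, recent_positions):
        # recent_positions: (k, 3) array of the last k ball positions
        p = np.asarray(recent_positions, dtype=float)
        vel = (p[-1] - p[-2]) / self.dt       # finite-difference velocity
        return p[-1] + vel * self.horizon_s   # extrapolated future position

def augment_observation(joint_obs, recent_ball_positions, predictor):
    """Concatenate proprioception, the current ball position, and the
    predicted future ball position into one policy observation vector."""
    future = predictor.predict(recent_ball_positions)
    current = np.asarray(recent_ball_positions[-1], dtype=float)
    return np.concatenate([np.asarray(joint_obs, float), current, future])
```

For a 23-joint robot, the augmented observation here would be 23 + 3 + 3 = 29 dimensions; the paper's actual observation layout is not specified in this summary.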
Problem

Research questions and friction points this paper is trying to address.

Develop unified reinforcement learning for humanoid table tennis control
Enable proactive decision-making through predictive ball state augmentation
Achieve coordinated whole-body motion and agile footwork in real robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning maps ball observations to joint commands
Learned predictor estimates future ball states for proactive decisions
Physics-based predictor enables dense rewards for effective exploration
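The last point above, using a physics-based predictor to build dense rewards, can be sketched as follows: propagate the ball ballistically to an intended hit time, then reward the policy for moving the racket toward that predicted interception point. Drag and spin are ignored, and the function names and reward scale are illustrative assumptions rather than the paper's actual reward terms.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravitational acceleration (m/s^2)

def ballistic_state(pos, vel, t):
    """Physics-based prediction: ball position and velocity after time t
    under gravity alone (drag and spin omitted in this sketch)."""
    pos = np.asarray(pos, dtype=float)
    vel = np.asarray(vel, dtype=float)
    return pos + vel * t + 0.5 * G * t**2, vel + G * t

def predictive_reward(racket_pos, ball_pos, ball_vel, t_hit, scale=2.0):
    """Dense reward in (0, 1]: exponential of the negative distance between
    the racket and the predicted ball position at the intended hit time.
    This shaping gives a useful gradient long before contact occurs."""
    future_pos, _ = ballistic_state(ball_pos, ball_vel, t_hit)
    dist = np.linalg.norm(np.asarray(racket_pos, dtype=float) - future_pos)
    return float(np.exp(-scale * dist))
```

Because the reward decays smoothly with distance to the predicted interception point, it stays informative at every timestep, which is the property the summary credits for effective exploration compared to a sparse hit/miss signal.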