Learning to Ball: Composing Policies for Long-Horizon Basketball Moves

📅 2025-09-26
🤖 AI Summary
Multi-stage, long-horizon basketball motion control faces three challenges: ambiguous subtask boundaries, sub-policies whose explored state spaces barely overlap, and ill-defined objectives for transitional skills. To address these, the paper proposes a policy integration framework built around a high-level soft router. The learnable router unifies heterogeneous motor skills by dynamically coordinating sub-policies under end-to-end reinforcement learning, without relying on ball trajectory references. The paper further enhances the skill-chaining architecture and mixture-of-experts mechanism to enable seamless policy switching and cooperative execution. Experiments demonstrate robust performance on fundamental skills and on complex transitions (e.g., a dribble direction change followed by a jump shot). The simulated agent autonomously controls the ball and reliably executes extended, multi-step sequences. The approach is reported to outperform conventional hard-switching skill chains and standard mixture-of-experts baselines in both stability and task completion.

📝 Abstract
Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches because it requires seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks whose goals are unclear but which are critical to the success of the entire task. Existing methods such as mixture of experts and skill chaining struggle when the individual policies share few commonly explored states or when the phases lack well-defined initial and terminal states. In this paper, we introduce a novel policy integration framework that enables the composition of drastically different motor skills in multi-phase, long-horizon tasks with ill-defined intermediate states. Building on this framework, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
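
The difference between hard switching and soft routing mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: the function names, the two-dimensional actions, and the choice to blend sub-policy actions with softmax weights are all assumptions made for illustration, and in the actual method the router is learned rather than given fixed logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_route(router_logits, expert_actions):
    """Blend sub-policy actions with softmax weights (hypothetical router).

    router_logits: one score per sub-policy (assumed to come from a learned
                   high-level router; here they are supplied by hand).
    expert_actions: one action vector per sub-policy, all of equal length.
    Returns the routing weights and the weighted-average action.
    """
    weights = softmax(router_logits)
    dim = len(expert_actions[0])
    blended = [
        sum(w * action[i] for w, action in zip(weights, expert_actions))
        for i in range(dim)
    ]
    return weights, blended


def hard_switch(router_logits, expert_actions):
    """Baseline for comparison: execute only the highest-scoring sub-policy."""
    best = max(range(len(router_logits)), key=lambda k: router_logits[k])
    return expert_actions[best]


if __name__ == "__main__":
    # Toy actions for two hypothetical skills, "dribble" and "shoot".
    actions = [[1.0, 0.0], [0.0, 1.0]]
    # Sweep the router from favoring "dribble" to favoring "shoot".
    for shoot_logit in (-2.0, -1.0, 0.0, 1.0, 2.0):
        logits = [0.0, shoot_logit]
        _, blended = soft_route(logits, actions)
        print(shoot_logit, blended, hard_switch(logits, actions))
```

In the sweep, the blended action moves gradually from the first skill's output toward the second, while the hard-switched action jumps from one to the other once the second logit exceeds the first. That gradual handover is the intuition behind using a soft router for transitional phases whose start and end states are not well defined.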

Problem

Research questions and friction points this paper is trying to address.

Composing policies for multi-phase basketball maneuvers
Enabling seamless transitions between distinct motor skills
Addressing challenges in long-horizon reinforcement learning tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel policy integration framework for multi-phase tasks
High-level soft router enabling seamless transitions
Composing drastically different motor skills without references