CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning in hybrid discrete-continuous action spaces remains constrained by limited policy expressiveness and poor scalability to high dimensions. This work addresses the challenge by formulating it as a fully cooperative game and introducing a collaborative diffusion policy framework. The approach employs two agents—one utilizing a discrete diffusion policy and the other a continuous diffusion policy—whose actions are coordinated through conditional dependency modeling and a sequential update mechanism to prevent policy conflicts. To enhance scalability, a Q-function-guided low-dimensional discrete action codebook is designed. Evaluated across multiple benchmark tasks with hybrid action spaces, the proposed method significantly outperforms existing state-of-the-art approaches, achieving up to a 19.3% improvement in success rate.

📝 Abstract
Hybrid action spaces, which combine discrete choices and continuous parameters, are prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing hybrid discrete-continuous action spaces remains a fundamental challenge, mainly due to limited policy expressiveness and poor scalability in high-dimensional settings. To address this challenge, we view the hybrid action space problem as a fully cooperative game and propose a Cooperative Hybrid Diffusion Policies (CHDP) framework to solve it. CHDP employs two cooperative agents that leverage a discrete and a continuous diffusion policy, respectively. The continuous policy is conditioned on the discrete action's representation, explicitly modeling the dependency between them. This cooperative design allows the diffusion policies to leverage their expressiveness to capture complex distributions in their respective action spaces. To mitigate the update conflicts arising from simultaneous policy updates in this cooperative setting, we employ a sequential update scheme that fosters co-adaptation. Moreover, to improve scalability when learning in high-dimensional discrete action spaces, we construct a codebook that embeds the action space into a low-dimensional latent space. This mapping enables the discrete policy to learn in a compact, structured space. Finally, we design a Q-function-based guidance mechanism to align the codebook's embeddings with the discrete policy's representation during training. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.
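The abstract describes a three-step action pipeline: the discrete policy selects a choice, a learned codebook maps that choice to a low-dimensional embedding, and the continuous policy produces parameters conditioned on that embedding. A minimal sketch of that data flow is below. All names (`CODEBOOK`, `discrete_policy`, `continuous_policy`, `select_hybrid_action`) are illustrative assumptions, and the policies are trivial stand-ins; the paper's actual policies are diffusion models trained with sequential updates and Q-guided codebook alignment, none of which is reproduced here.

```python
# Hypothetical sketch of CHDP's hybrid action selection. The codebook maps
# each discrete action id to a compact latent embedding, and the continuous
# policy is conditioned on that embedding -- modeling the discrete-to-continuous
# dependency the abstract describes.

# Codebook: discrete action id -> low-dimensional embedding (values invented).
CODEBOOK = {
    0: (0.1, -0.3),   # e.g. "move"
    1: (0.7, 0.2),    # e.g. "grasp"
    2: (-0.5, 0.9),   # e.g. "release"
}

def discrete_policy(state):
    """Stand-in for the discrete diffusion policy: returns an action id.

    A real diffusion policy would iteratively denoise toward an action
    distribution; here we simply pick the id whose codebook embedding
    best aligns with the state."""
    return max(CODEBOOK, key=lambda k: sum(s * e for s, e in zip(state, CODEBOOK[k])))

def continuous_policy(state, embedding):
    """Stand-in for the continuous diffusion policy, explicitly conditioned
    on the discrete action's codebook embedding."""
    return [s + e for s, e in zip(state, embedding)]

def select_hybrid_action(state):
    k = discrete_policy(state)            # 1) discrete choice
    z = CODEBOOK[k]                       # 2) codebook embedding of the choice
    params = continuous_policy(state, z)  # 3) continuous parameters given z
    return k, params

k, params = select_hybrid_action([0.5, 0.5])
```

The point of the sketch is the conditioning order: the continuous parameters are generated only after (and as a function of) the discrete choice's embedding, rather than the two halves of the action being sampled independently.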
Problem

Research questions and friction points this paper is trying to address.

hybrid action space
reinforcement learning
parameterized action space
policy expressiveness
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Action Space
Diffusion Policy
Cooperative Reinforcement Learning
Codebook Embedding
Sequential Policy Update
Bingyi Liu
Professor, Department of CS and AI, Wuhan University of Technology
Internet of Vehicles, Edge Computing, Autonomous Vehicles, Intelligent Transportation Systems
Jinbo He
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Haiyong Shi
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Enshu Wang
School of Cyber Science and Engineering, Wuhan University
Weizhen Han
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Jingxiang Hao
School of Cyber Science and Engineering, Wuhan University
Peixi Wang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Zhuangzhuang Zhang
Department of Computer Science, City University of Hong Kong