Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning methods decouple perception and action, neglecting their causal, reciprocal relationship and thereby limiting policy adaptability to dynamic environments. To address this, we propose Action-Guided Diffusion Policy (DP-AG), the first framework to integrate diffusion models into joint perception-action modeling. DP-AG formalizes bidirectional dynamic interaction between perceptual representations and action execution via probabilistic latent dynamics; it employs vector-Jacobian products to inject stochastic forces guiding latent state evolution and introduces a cycle-consistency contrastive loss for bidirectional optimization. The method unifies variational inference, stochastic differential equation (SDE) modeling, and gradient-aware noise-prediction network architecture to realize an end-to-end perception–action closed loop. Evaluated on simulation benchmarks and real-world UR5 robotic manipulation tasks, DP-AG significantly outperforms state-of-the-art methods, demonstrating superior effectiveness and generalization in adaptive operational control.

📝 Abstract
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.
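The paper does not spell out its update equations here, but the core idea in the abstract (the VJP of a noise predictor acting as a structured stochastic force in an Euler–Maruyama step of an action-guided SDE) can be sketched in miniature. This is an illustrative toy, not the authors' implementation: the linear noise predictor, dimensions, and step function (`noise_pred`, `vjp`, `latent_step`) are all hypothetical stand-ins, and in a real system autodiff would supply the VJP for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
D_Z, D_A = 4, 2  # hypothetical latent and action dimensions

# Toy linear "noise predictor" eps(z) = W @ z. Its Jacobian w.r.t. z is W,
# so the vector-Jacobian product v^T J reduces to v @ W.
W = rng.normal(size=(D_A, D_Z)) * 0.1

def noise_pred(z):
    # Maps a latent state to a predicted action-space noise vector.
    return W @ z

def vjp(z, v):
    # Analytic VJP for the linear predictor; autodiff would provide
    # this for an actual noise-prediction network.
    return v @ W  # shape (D_Z,)

def latent_step(z, v, dt=0.05, sigma=0.01, rng=rng):
    # One Euler-Maruyama step of an action-guided SDE: the VJP of the
    # noise prediction acts as a structured drift on the latent state,
    # plus isotropic diffusion noise.
    drift = vjp(z, v)
    noise = sigma * np.sqrt(dt) * rng.normal(size=z.shape)
    return z + dt * drift + noise

z = rng.normal(size=D_Z)
v = noise_pred(z)            # co-vector: the predictor's own output
z_next = latent_step(z, v)
print(z_next.shape)
```

The key point the sketch isolates is that the action side (the noise prediction) feeds back into the perception side (the latent update) through the VJP, rather than perception and action being computed independently.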
Problem

Research questions and friction points this paper is trying to address.

Modeling perception-action reciprocity for adaptive policies
Unifying representation learning with probabilistic latent dynamics
Enhancing bidirectional learning through cycle-consistent contrastive loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-guided diffusion policy models perception-action interplay
Cycle-consistent contrastive loss enables bidirectional learning
Variational inference with action-guided SDE drives latent updates
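The cycle-consistent contrastive loss itself is not given on this page; a common way to realize such an objective is a symmetric InfoNCE loss over matched (latent update, action refinement) pairs, averaged over both pairing directions to reflect the cycle. The sketch below makes that assumption explicit; `cycle_contrastive_loss` and its inputs are hypothetical, and in practice projection heads would map latent updates and action refinements into the shared embedding space assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def cycle_contrastive_loss(z_updates, a_refines, tau=0.1):
    """Symmetric InfoNCE over matched (latent update, action refinement)
    pairs; row i of each array is assumed to be the positive pair."""
    z = z_updates / np.linalg.norm(z_updates, axis=1, keepdims=True)
    a = a_refines / np.linalg.norm(a_refines, axis=1, keepdims=True)
    logits = (z @ a.T) / tau  # (B, B) cosine similarities / temperature

    def nce(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average both directions (latent -> action and action -> latent)
    # so neither side dominates the gradient flow.
    return 0.5 * (nce(logits) + nce(logits.T))

B, D = 8, 4  # hypothetical batch size and shared embedding dimension
z_up = rng.normal(size=(B, D))   # stand-in latent updates
a_rf = rng.normal(size=(B, D))   # stand-in action refinements
loss = cycle_contrastive_loss(z_up, a_rf)
print(float(loss) > 0.0)
```

Minimizing this pulls each latent update toward its own action refinement and pushes it away from the others in the batch, which is one plausible mechanism for the "mutually consistent transitions" the abstract describes.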
Jing Wang
University of Alberta
Weiting Peng
Huazhong University of Science and Technology
Jing Tang
Huazhong University of Science and Technology
Zeyu Gong
Huazhong University of Science and Technology
Xihua Wang
Renmin University of China
Bo Tao
Huazhong University of Science and Technology
Li Cheng
University of Alberta