ADM-DP: Adaptive Dynamic Modality Diffusion Policy through Vision-Tactile-Graph Fusion for Multi-Agent Manipulation

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of coordination control, grasp stability, and collision avoidance in multi-agent robotic systems operating in shared environments. The authors propose an adaptive dynamic modality fusion strategy with four components: an enhanced visual encoder that uses FiLM modulation to fuse RGB and point-cloud data; tactile-guided grasping based on force-sensing resistor (FSR) feedback; graph-structured modeling that constructs a graph neural network over the agents' tool-center-point (TCP) positions; and an adaptive modality attention mechanism for task-context-aware multimodal coordination. Decoupled training is further introduced to improve scalability. Evaluated across seven multi-agent tasks, the method outperforms state-of-the-art approaches by 12–25%, with particularly large gains in scenarios with strong multimodal dependencies, validating its robustness and effectiveness.
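The FiLM fusion step mentioned above can be illustrated with a minimal sketch: a conditioning embedding predicts a per-channel scale and shift that modulate a target feature map. The fusion direction (RGB embedding conditioning the point-cloud features) and all function and parameter names here are assumptions, since the listing gives no implementation details.

```python
import numpy as np

def film_fuse(pc_feat, rgb_emb, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM-style fusion sketch: the RGB embedding predicts a per-channel
    scale (gamma) and shift (beta) applied to the point-cloud features.

    pc_feat : (n_points, d) point-cloud features to be modulated
    rgb_emb : (k,) RGB image embedding used as the conditioning signal
    """
    gamma = rgb_emb @ W_gamma + b_gamma  # (d,) per-channel scale
    beta = rgb_emb @ W_beta + b_beta     # (d,) per-channel shift
    return gamma * pc_feat + beta        # broadcasts over the point dimension

# Toy usage with random weights (shapes only; not trained parameters).
rng = np.random.default_rng(0)
n, d, k = 16, 8, 4
fused = film_fuse(rng.normal(size=(n, d)), rng.normal(size=k),
                  rng.normal(size=(k, d)), np.zeros(d),
                  rng.normal(size=(k, d)), np.zeros(d))
print(fused.shape)  # (16, 8)
```

In the real encoder both branches would be learned networks; the sketch only shows the modulation arithmetic.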

📝 Abstract
Multi-agent robotic manipulation remains challenging due to the combined demands of coordination, grasp stability, and collision avoidance in shared workspaces. To address these challenges, we propose the Adaptive Dynamic Modality Diffusion Policy (ADM-DP), a framework that integrates vision, tactile, and graph-based (multi-agent pose) modalities for coordinated control. ADM-DP introduces four key innovations. First, an enhanced visual encoder merges RGB and point-cloud features via Feature-wise Linear Modulation (FiLM) to enrich perception. Second, a tactile-guided grasping strategy uses Force-Sensitive Resistor (FSR) feedback to detect insufficient contact and trigger corrective grasp refinement, improving grasp stability. Third, a graph-based collision encoder leverages shared tool center point (TCP) positions of multiple agents as structured kinematic context to maintain spatial awareness and reduce inter-agent interference. Fourth, an Adaptive Modality Attention Mechanism (AMAM) dynamically re-weights modalities according to task context, enabling flexible fusion. For scalability and modularity, a decoupled training paradigm is employed in which agents learn independent policies while sharing spatial information. This maintains low interdependence between agents while retaining collective awareness. Across seven multi-agent tasks, ADM-DP achieves 12–25% performance gains over state-of-the-art baselines. Ablation studies show the greatest improvements in tasks requiring multiple sensory modalities, validating our adaptive fusion strategy and demonstrating its robustness for diverse manipulation scenarios.
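The adaptive modality attention idea from the abstract can be sketched as follows: a task-context vector scores each modality embedding, a softmax turns the scores into weights, and the fused representation is the weighted sum. All names, shapes, and the linear scoring function are assumptions; the abstract does not specify the AMAM architecture.

```python
import numpy as np

def adaptive_modality_fusion(vision, tactile, graph, context, W_score):
    """AMAM-style sketch: re-weight modality embeddings by task context.

    vision, tactile, graph : (d,) per-modality embeddings
    context : (c,) task-context vector
    W_score : (c, 3) maps context to one raw score per modality
    """
    feats = np.stack([vision, tactile, graph])       # (3, d)
    scores = context @ W_score                       # (3,) raw modality scores
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities
    return weights @ feats, weights                  # fused (d,), weights (3,)

# Toy usage with random embeddings (shapes only; not trained parameters).
rng = np.random.default_rng(1)
fused, w = adaptive_modality_fusion(rng.normal(size=6), rng.normal(size=6),
                                    rng.normal(size=6), rng.normal(size=4),
                                    rng.normal(size=(4, 3)))
print(fused.shape, round(float(w.sum()), 6))  # (6,) 1.0
```

A context that emphasizes, say, the tactile score drives its weight toward 1, which matches the paper's claim that fusion adapts to task demands (e.g. contact-rich grasping versus free-space coordination).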
Problem

Research questions and friction points this paper is trying to address.

multi-agent manipulation
grasp stability
collision avoidance
coordination
shared workspace
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Modality Fusion
Vision-Tactile-Graph Integration
Multi-Agent Manipulation
Dynamic Attention Mechanism
Decoupled Policy Learning
Enyi Wang
Department of Bioengineering, Imperial-X Initiative, Imperial College London, London, United Kingdom
Wen Fan
University of California, Berkeley
Dandan Zhang
Imperial College London
Robotics - AI