🤖 AI Summary
Addressing key challenges in whole-body mobile manipulation—including complex perception modeling, discontinuous motion generation, and poor cross-environment generalization—this paper introduces DSPv2, a novel dense policy architecture. DSPv2 pioneers the intra-policy deep fusion of 3D spatial features with multi-view 2D semantic features, incorporating a 3D–2D feature alignment encoding mechanism and a multimodal perception fusion module to enable end-to-end whole-body coordination. Compared to state-of-the-art imitation learning approaches, DSPv2 achieves a +21.3% improvement in task success rate and significantly enhances cross-scenario generalization. It demonstrates superior robustness and practicality across diverse real-world environments and complex manipulation tasks. This work establishes a new paradigm for embodied agents to perform high-precision, adaptive whole-body manipulation in open, unstructured settings.
📝 Abstract
Learning whole-body mobile manipulation via imitation is essential for generalizing robotic skills to diverse environments and complex tasks. However, this goal is hindered by significant challenges, particularly in effectively processing complex observations, achieving robust generalization, and generating coherent actions. To address these issues, we propose DSPv2, a novel policy architecture. DSPv2 introduces an effective encoding scheme that aligns 3D spatial features with multi-view 2D semantic features. This fusion enables the policy to achieve broad generalization while retaining the fine-grained perception necessary for precise control. Furthermore, we extend the Dense Policy paradigm to the whole-body mobile manipulation domain, demonstrating its effectiveness in generating coherent and precise actions for a whole-body robotic platform. Extensive experiments show that our method significantly outperforms existing approaches in both task performance and generalization ability. Project page is available at: https://selen-suyue.github.io/DSPv2Net/.
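The paper does not release the alignment details here, but the core idea of pairing 3D spatial features with multi-view 2D semantic features can be sketched as follows. This is a minimal, hypothetical illustration (not DSPv2's actual implementation): each 3D point is projected into every camera view with a pinhole intrinsic matrix, the 2D feature map is sampled at the projected pixel, the per-view samples are averaged, and the result is concatenated with the point's 3D feature. The function names, the nearest-neighbor sampling, and the mean pooling across views are all assumptions made for clarity.

```python
import numpy as np

def project_points(points, K):
    # Pinhole projection: map Nx3 camera-frame points to pixel coordinates.
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def align_3d_2d(points, feat3d, feat2d_views, Ks):
    """Hypothetical 3D-2D alignment sketch: for each 3D point, gather a
    2D semantic feature from every view at its projected pixel (nearest
    neighbor), average across views, and concatenate with the point's
    3D spatial feature. Not the paper's actual mechanism."""
    gathered = []
    for feat2d, K in zip(feat2d_views, Ks):
        H, W, _ = feat2d.shape
        uv = project_points(points, K)
        # Round to the nearest pixel and clamp to the feature-map bounds.
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
        gathered.append(feat2d[v, u])          # (N, C) per view
    sem = np.mean(gathered, axis=0)            # pool 2D semantics across views
    return np.concatenate([feat3d, sem], axis=1)  # (N, C3 + C2)
```

In practice a learned policy would replace the nearest-neighbor lookup with differentiable (e.g. bilinear) sampling and fuse the two streams with a learned module rather than plain concatenation; the sketch only shows where the geometric correspondence between 3D points and 2D feature maps comes from.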