AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of unifying planning for both prehensile (P, grasping) and non-prehensile (NP, non-grasping) manipulation in complex, dynamic environments. We propose AdaptPNP, a novel framework that integrates vision-language models (VLMs) with object-centric intermediate representations and digital twin technology to achieve high-precision object pose estimation and real-time scene modeling. Leveraging VLM-driven semantic-geometric joint reasoning, AdaptPNP autonomously determines P/NP action types, selects appropriate primitive skills, and generates online-replannable, multi-step hybrid policies. A control module maps high-level plans to low-level execution commands and enables closed-loop adaptation via real-time sensory feedback. Extensive experiments in simulation and on physical robotic platforms demonstrate that AdaptPNP significantly improves task success rates and cross-environment generalization. The framework establishes a scalable, end-to-end paradigm for general-purpose dexterous manipulation.

📝 Abstract
Non-prehensile (NP) manipulation, in which robots alter object states without forming stable grasps (for example, pushing, poking, or sliding), significantly broadens robotic manipulation capabilities when grasping is infeasible or insufficient. However, enabling a unified framework that generalizes across different tasks, objects, and environments while seamlessly integrating non-prehensile and prehensile (P) actions remains challenging: robots must determine when to invoke NP skills, select the appropriate primitive for each context, and compose P and NP strategies into robust, multi-step plans. We introduce AdaptPNP, a vision-language model (VLM)-empowered task and motion planning framework that systematically selects and combines P and NP skills to accomplish diverse manipulation objectives. Our approach leverages a VLM to interpret visual scene observations and textual task descriptions, generating a high-level plan skeleton that prescribes the sequence and coordination of P and NP actions. A digital-twin based object-centric intermediate layer predicts desired object poses, enabling proactive mental rehearsal of manipulation sequences. Finally, a control module synthesizes low-level robot commands, with continuous execution feedback enabling online task plan refinement and adaptive replanning through the VLM. We evaluate AdaptPNP across representative P&NP hybrid manipulation tasks in both simulation and real-world environments. These results underscore the potential of hybrid P&NP manipulation as a crucial step toward general-purpose, human-level robotic manipulation capabilities. Project Website: https://sites.google.com/view/adaptpnp/home
Problem

Research questions and friction points this paper is trying to address.

Integrating prehensile and non-prehensile manipulation skills in robotics
Creating a unified framework that generalizes across tasks, objects, and environments
Systematically selecting and combining grasping and non-grasping actions
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM interprets scenes and tasks for planning
Digital twin predicts poses for mental rehearsal
Feedback enables adaptive replanning during execution
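The innovations above describe a perceive-plan-execute-replan loop. A minimal sketch of that loop is shown below; all class, function, and primitive names here are illustrative assumptions for exposition, not the authors' actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "P" (prehensile) or "NP" (non-prehensile)
    primitive: str  # e.g. "grasp", "push", "poke", "slide"
    target: str     # object identifier

def vlm_plan(scene: str, task: str) -> list[Action]:
    """Stand-in for the VLM planner: returns a plan skeleton of P/NP actions.

    A real system would query a vision-language model with the scene
    observation and task description; this stub returns a fixed plan.
    """
    return [
        Action("NP", "push", "mug"),  # non-prehensile action to free the object
        Action("P", "grasp", "mug"),  # then a prehensile grasp
    ]

def execute(action: Action) -> bool:
    """Stand-in for the control module; returns success/failure feedback."""
    return True

def run_task(scene: str, task: str, max_replans: int = 3) -> bool:
    """Plan, execute, and replan on failure using execution feedback."""
    plan = vlm_plan(scene, task)
    for _ in range(max_replans):
        if all(execute(a) for a in plan):
            return True
        # On failure, execution feedback would be fed back to the VLM
        # to refine the task plan before the next attempt.
        plan = vlm_plan(scene, task)
    return False
```

The key design point the paper emphasizes is that the plan skeleton interleaves P and NP primitives and is refined online, rather than being fixed at planning time.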
👥 Authors
Jinxuan Zhu, National University of Singapore
Chenrui Tie, National University of Singapore (robotics, reinforcement learning)
Xinyi Cao, East China Normal University
Yuran Wang, Peking University
Jingxiang Guo, National University of Singapore (manipulation)
Zixuan Chen, National University of Singapore
Haonan Chen, National University of Singapore
Junting Chen, Assistant Professor, School of Science and Engineering, Chinese University of Hong Kong, Shenzhen (signal processing, optimization, statistical learning, wireless communications, localization)
Yangyu Xiao, RoboScience
Ruihai Wu, Peking University (computer vision, robotics)
Lin Shao, National University of Singapore