Generative Artificial Intelligence in Robotic Manipulation: A Survey

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robot manipulation faces three fundamental bottlenecks: data scarcity, difficulty in long-horizon task planning, and weak multimodal reasoning. To address these, this survey organizes generative AI for robotic manipulation into a three-tier taxonomy: (1) a foundation layer for synthetic data and reward generation; (2) an intermediate layer for language, code, visual, and state generation; and (3) a policy layer for grasp and trajectory generation. It systematically compares GANs, VAEs, diffusion models, probabilistic flow models, and autoregressive models across manipulation tasks, delineating where each is best suited. Drawing on over 100 state-of-the-art works, it analyzes current limits in data augmentation, cross-modal instruction grounding, and embodied policy learning. The authors also release AwesomeGAIManipulation, an open-source repository of curated papers, open-source data, and projects, to accelerate community progress in generative AI–driven robotics.

📝 Abstract
This survey provides a comprehensive review of recent advances in generative learning models for robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks: insufficient data and inefficient data acquisition, long-horizon and complex task planning, and limited multimodal reasoning for robust policy learning across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, and highlights their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for more efficient data utilization, better handling of long-horizon tasks, and improved generalization across diverse robotic scenarios. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/GAI4Manipulation/AwesomeGAIManipulation

Problem

Research questions and friction points this paper is trying to address.

Addresses insufficient data and inefficient data acquisition in robotic manipulation.
Tackles long-horizon and complex task planning challenges in robotics.
Examines multimodal reasoning for robust policy learning across diverse environments.

Innovation

Methods, ideas, or system contributions that make the work stand out.

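Reviews GANs, VAEs, diffusion models, probabilistic flow models, and autoregressive models for robotic manipulation.
Organizes applications into foundation (data, reward), intermediate (language, code, visual, state), and policy (grasp, trajectory) generation.
Addresses data scarcity, task complexity, and generalization.
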
Authors

Kun Zhang
LimX Dynamics, Shenzhen, China; The Hong Kong University of Science and Technology, Hong Kong, China

Peng Yun
Ph.D. in CSE, HKUST
3D Perception, Incremental Learning, Bayesian Neural Networks, Cloud Robotics

Jun Cen
The Hong Kong University of Science and Technology, Hong Kong, China

Junhao Cai
Shanghai AI Lab, HKUST
Robotics, Computer Vision

Didi Zhu
Imperial College London
Multi-Modal LLMs, Out-of-Distribution Generalization

Hangjie Yuan
Alibaba DAMO | ZJU | MMLab@NTU
Generative Models, Multimodal Models, Foundation Models, Video Understanding

Chao Zhao
The Hong Kong University of Science and Technology, Hong Kong, China

Tao Feng
Department of Computer Science and Technology, Tsinghua University, Beijing, China

Michael Yu Wang
Chair Professor & Dean, Great Bay University, China
Robotics, Topology Optimization, Additive Manufacturing

Qifeng Chen
HKUST
Computational Photography, Image Synthesis, Generative AI, Autonomous Driving, Embodied AI

Jia Pan
LimX Dynamics, Shenzhen, China; Department of Computer Science, University of Hong Kong, Hong Kong, China

Bo Yang
vLAR Group, The Hong Kong Polytechnic University, Hong Kong SAR, China

Hua Chen
LimX Dynamics, Shenzhen, China; ZJU-UIUC Institute, Zhejiang University, Haining, Zhejiang, 314400, China