Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D activity reasoning and planning methods suffer from two key limitations: (1) poor understanding of users' implicit intentions, and (2) neglect of coherent, multi-step path planning. This paper introduces a novel implicit-instruction-driven paradigm for 3D activity reasoning that jointly models intention understanding, step decomposition, and cross-step path planning, enabling joint semantic and geometric decision-making. The contributions are threefold: (1) ReasonPlan3D, the first benchmark supporting multi-step tasks, inter-step path planning, and fine-grained object segmentation; (2) a progressive plan generation framework with dynamic scene graph updating; (3) an integration of multimodal learning, 3D segmentation, spatial relation reasoning, and sequential planning. Evaluated on ReasonPlan3D, the method significantly improves implicit intention recognition accuracy and path plausibility, and, crucially, achieves for the first time end-to-end joint generation of task steps and navigation trajectories.

📝 Abstract
3D activity reasoning and planning has attracted increasing attention in human-robot interaction and embodied AI thanks to recent advances in multimodal learning. However, most existing works share two constraints: 1) heavy reliance on explicit instructions, with little reasoning about implicit user intention; 2) neglect of inter-step route planning for robot movement. To bridge these gaps, we propose 3D activity reasoning and planning, a novel 3D task that infers the intended activities from implicit instructions and decomposes them into steps with inter-step routes, guided by fine-grained 3D object shapes and locations from scene segmentation. We tackle the new 3D task from two perspectives. First, we construct ReasonPlan3D, a large-scale benchmark that covers diverse 3D scenes with rich implicit instructions and detailed annotations for multi-step task planning, inter-step route planning, and fine-grained segmentation. Second, we design a novel framework that introduces progressive plan generation with contextual consistency across multiple steps, as well as a scene graph that is updated dynamically to capture critical objects and their spatial relations. Extensive experiments demonstrate the effectiveness of our benchmark and framework in reasoning activities from implicit human instructions, producing accurate stepwise task plans, and seamlessly integrating route planning for multi-step moves. The dataset and code will be released.
Problem

Research questions and friction points this paper is trying to address.

Reasoning implicit human intentions in 3D activity planning
Integrating inter-step route planning for robot moves
Developing a benchmark and framework for 3D task reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReasonPlan3D benchmark for 3D activity reasoning
Progressive plan generation with contextual consistency
Dynamic scene graph for object and spatial relations
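The dynamically updated scene graph above can be pictured as a small data structure whose nodes are detected objects and whose edges store pairwise spatial relations, refreshed whenever an object appears or moves. The sketch below is purely illustrative; the class and method names (`SceneGraph`, `update_object`, `_relate`) are assumptions, not the paper's actual API, and the coarse left/right relation stands in for the fine-grained spatial reasoning described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical sketch: objects as nodes, spatial relations as edges."""
    objects: dict = field(default_factory=dict)    # name -> (x, y, z) centroid
    relations: dict = field(default_factory=dict)  # (a, b) -> relation label

    def update_object(self, name, centroid):
        """Insert or move an object, then refresh its outgoing relations."""
        self.objects[name] = centroid
        for other, pos in self.objects.items():
            if other != name:
                self.relations[(name, other)] = self._relate(centroid, pos)

    @staticmethod
    def _relate(a, b):
        # Coarse relation along the x-axis only, as a placeholder
        # for richer geometric predicates (on, near, inside, ...).
        return "left_of" if a[0] < b[0] else "right_of"


graph = SceneGraph()
graph.update_object("mug", (0.2, 1.0, 0.8))
graph.update_object("sink", (1.5, 1.0, 0.9))
print(graph.relations[("sink", "mug")])  # right_of
```

In the paper's setting, each planning step would presumably trigger such an update so that route planning for the next step reasons over current, rather than stale, object positions.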
Xueying Jiang
Nanyang Technological University
Computer Vision
Wenhao Li
College of Computing and Data Science, Nanyang Technological University, Singapore
Xiaoqin Zhang
College of Computer Science and Technology, Zhejiang University of Technology, China
Ling Shao
UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China
Shijian Lu
College of Computing and Data Science, Nanyang Technological University, Singapore
Image and video analytics · computer vision · machine learning