Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) can generate semantic commonsense knowledge, but that knowledge is difficult to ground physically and apply to the robotic manipulation of generalized articulated objects (e.g., doors, drawers, cabinets). To address this, the paper introduces “analytic concepts”: a computable, simulation-compatible, and physically interpretable intermediate representation that bridges semantic knowledge and the physical world by jointly modeling object structure, function, and dynamics. The method integrates LLM-based commonsense reasoning, symbolic mathematical modeling, physics-based simulation, and closed-loop control policy learning. In both simulation and real-world experiments, it significantly improves task success rate, cross-object generalization, pose estimation accuracy, and behavioral interpretability. The core contribution is presented as the first principled framework for executable commonsense reasoning that aligns semantics with physics, establishing a new paradigm for general-purpose, human-like robotic manipulation.
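To make the idea of a “computable, simulation-compatible” representation concrete, here is a minimal sketch of what an analytic concept for a hinged part such as a door might look like. It assumes the concept is parameterized by a pivot point, a rotation axis, and a handle position, and uses Rodrigues' rotation formula to compute where the handle ends up for a given opening angle. The class name `RevoluteConcept`, its fields, and the helper functions are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class RevoluteConcept:
    """Hypothetical analytic concept for a hinged part such as a door.

    An illustrative assumption, not the paper's actual representation.
    """
    pivot: Vec3   # any point on the hinge axis
    axis: Vec3    # hinge direction (normalized in __post_init__)
    handle: Vec3  # handle position when the part is closed (theta = 0)

    def __post_init__(self) -> None:
        norm = math.sqrt(_dot(self.axis, self.axis))
        if norm == 0.0:
            raise ValueError("axis must be non-zero")
        self.axis = _scale(self.axis, 1.0 / norm)

    def handle_position(self, theta: float) -> Vec3:
        """Rotate the handle by theta radians about the hinge axis."""
        v = _sub(self.handle, self.pivot)
        k = self.axis
        c, s = math.cos(theta), math.sin(theta)
        # Rodrigues: v_rot = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
        v_rot = _add(_add(_scale(v, c), _scale(_cross(k, v), s)),
                     _scale(k, _dot(k, v) * (1.0 - c)))
        return _add(self.pivot, v_rot)

    def trajectory(self, theta_max: float, steps: int) -> List[Vec3]:
        """Evenly spaced handle positions from closed (0) to theta_max."""
        if steps < 2:
            raise ValueError("steps must be at least 2")
        return [self.handle_position(theta_max * i / (steps - 1))
                for i in range(steps)]
```

For example, `RevoluteConcept((0, 0, 0), (0, 0, 1), (0.8, 0, 1.0)).handle_position(math.pi / 2)` places the handle at approximately (0, 0.8, 1.0): a quarter turn about the vertical hinge axis. Because the concept is plain mathematics, the same object can be evaluated directly, sampled into a trajectory, or checked against a simulator.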

📝 Abstract
We humans rely on a wide range of commonsense knowledge to interact with a vast number and variety of objects in the physical world. Likewise, such commonsense knowledge is also crucial for robots to successfully develop generalized object manipulation skills. While recent advancements in Large Language Models (LLMs) have showcased their impressive capabilities in acquiring commonsense knowledge and conducting commonsense reasoning, effectively grounding the semantic-level knowledge produced by LLMs in the physical world, so that it can thoroughly guide robots in generalized articulated object manipulation, remains a challenge that has not been sufficiently addressed. To this end, we introduce analytic concepts, procedurally defined upon mathematical symbolism that can be directly computed and simulated by machines. By leveraging analytic concepts as a bridge between the semantic-level knowledge inferred by LLMs and the physical world where real robots operate, we are able to infer knowledge of object structure and functionality with physics-informed representations, and then use this physically grounded knowledge to instruct robot control policies for generalized, interpretable, and accurate articulated object manipulation. Extensive experiments in both simulation and real-world environments demonstrate the superiority of our approach.
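The abstract describes using physically grounded knowledge to instruct robot control policies. As a hedged illustration of how a concept could constrain a feedback loop, the sketch below models a drawer as a hypothetical `PrismaticConcept` (a handle, an opening direction, and a joint limit) and drives it with a simple proportional controller that observes the joint state and commands the next pose along the concept's straight-line path. Both `PrismaticConcept` and `pull_closed_loop` are assumptions made for illustration; a plain P-controller stands in for the learned control policies the paper describes.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PrismaticConcept:
    """Hypothetical analytic concept for a sliding part such as a drawer."""
    handle: Vec3       # handle position when fully closed (q = 0)
    direction: Vec3    # opening direction (normalized in __post_init__)
    max_travel: float  # joint limit, in metres

    def __post_init__(self) -> None:
        norm = math.sqrt(sum(d * d for d in self.direction))
        if norm == 0.0:
            raise ValueError("direction must be non-zero")
        self.direction = tuple(d / norm for d in self.direction)

    def handle_position(self, q: float) -> Vec3:
        """Handle position at opening distance q, clamped to the joint limits."""
        q = min(max(q, 0.0), self.max_travel)
        return tuple(h + q * d for h, d in zip(self.handle, self.direction))

    def joint_state(self, ee: Vec3) -> float:
        """Estimate q by projecting the handle displacement onto the axis."""
        return sum((e - h) * d for e, h, d in zip(ee, self.handle, self.direction))


def pull_closed_loop(concept: PrismaticConcept, target_q: float,
                     gain: float = 0.5, tol: float = 1e-4,
                     max_steps: int = 100) -> Tuple[float, int]:
    """Toy proportional feedback loop; returns (final q, steps used)."""
    ee = concept.handle  # gripper starts on the closed drawer's handle
    for step in range(max_steps):
        q = concept.joint_state(ee)  # feedback: observe the current joint state
        error = target_q - q
        if abs(error) < tol:
            return q, step
        # Command the next handle pose, constrained to the concept's straight path.
        ee = concept.handle_position(q + gain * error)
    return concept.joint_state(ee), max_steps
```

With `gain=0.5` the remaining error halves on each iteration, so pulling a drawer 0.3 m converges within a dozen or so steps. A target beyond `max_travel` never converges: the commanded pose is clamped at the joint limit and the loop stops after `max_steps`.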
Problem

Research questions and friction points this paper is trying to address.

Grounding LLM-derived commonsense knowledge for robot manipulation
Bridging semantic knowledge with physics-informed object representations
Enabling generalized interpretable control for articulated object manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging analytic concepts for physical grounding
Bridging LLM knowledge with robot control
Physics-informed representations for object manipulation
Jianhua Sun
Shanghai Jiao Tong University
Computer Vision, Robot Learning
Jiude Wei
Shanghai Jiao Tong University
Yuxuan Li
Shanghai Jiao Tong University
Cewu Lu
Shanghai Jiao Tong University