Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of multi-task robotic manipulation in partially observable environments, where perception is hindered by ambiguity, occlusions, distractors, and dynamic scene layouts. To tackle these issues, the authors propose KG-M3PO, a framework that jointly optimizes perception, knowledge, and policy modules through end-to-end reinforcement learning. KG-M3PO embeds structured world knowledge into the policy network by constructing an open-vocabulary 3D scene graph online, enabling semantically guided and robust decision-making. The approach integrates visual, proprioceptive, linguistic, and dynamic graph information, employing lightweight graph queries to produce compact semantic states and leveraging graph neural networks with dynamic relational mechanisms to enhance representation capacity. Experiments demonstrate that KG-M3PO significantly improves task success rates, sample efficiency, and generalization to novel objects and unseen scenes in complex manipulation tasks.

📝 Abstract
This paper introduces Knowledge Graph based Massively Multi-task Model-based Policy Optimization (KG-M3PO), a framework for multi-task robotic manipulation in partially observable settings that unifies Perception, Knowledge, and Policy. The method augments egocentric vision with an online 3D scene graph that grounds open-vocabulary detections into a metric, relational representation. A dynamic-relation mechanism updates spatial, containment, and affordance edges at every step, and a graph neural encoder is trained end-to-end through the RL objective so that relational features are shaped directly by control performance. Multiple observation modalities (visual, proprioceptive, linguistic, and graph-based) are encoded into a shared latent space, upon which the RL agent operates to drive the control loop. The policy conditions on lightweight graph queries alongside visual and proprioceptive inputs, yielding a compact, semantically informed state for decision making. Experiments on a suite of manipulation tasks with occlusions, distractors, and layout shifts demonstrate consistent gains over strong baselines: the knowledge-conditioned agent achieves higher success rates, improved sample efficiency, and stronger generalization to novel objects and unseen scene configurations. These results support the premise that structured, continuously maintained world knowledge is a powerful inductive bias for scalable, generalizable manipulation: when the knowledge module participates in the RL computation graph, relational representations align with control, enabling robust long-horizon behavior under partial observability.
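To make the abstract's pipeline concrete, here is a minimal, self-contained sketch (not the paper's code) of an online 3D scene graph with a dynamic-relation update and a lightweight graph query that yields a compact semantic state. All names (`SceneGraph`, `add_detection`, `update_relations`, `query`) and the simple "near" relation are hypothetical stand-ins for the richer spatial, containment, and affordance edges described above.

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str        # open-vocabulary detection label
    position: tuple   # metric 3D position (x, y, z)

class SceneGraph:
    """Toy online scene graph: nodes from detections, edges recomputed per step."""

    def __init__(self):
        self.nodes = {}     # node_id -> Node
        self.edges = set()  # (src_id, relation, dst_id)

    def add_detection(self, node_id, label, position):
        # Ground an open-vocabulary detection into the metric graph.
        self.nodes[node_id] = Node(label, position)

    def update_relations(self, near_thresh=0.3):
        """Dynamic-relation mechanism: recompute spatial edges every step."""
        self.edges = set()
        ids = list(self.nodes)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                pa, pb = self.nodes[a].position, self.nodes[b].position
                dist = sum((x - y) ** 2 for x, y in zip(pa, pb)) ** 0.5
                if dist < near_thresh:
                    self.edges.add((a, "near", b))

    def query(self, label):
        """Lightweight graph query: compact semantic state for one label."""
        hits = [i for i, n in self.nodes.items() if n.label == label]
        related = [e for e in self.edges if e[0] in hits or e[2] in hits]
        return {"matches": hits, "relations": related}

g = SceneGraph()
g.add_detection("obj1", "mug", (0.10, 0.00, 0.05))
g.add_detection("obj2", "table", (0.15, 0.05, 0.00))
g.add_detection("obj3", "lamp", (2.00, 2.00, 1.00))
g.update_relations()
state = g.query("mug")
print(state["matches"])  # prints ['obj1']
```

In the paper's framing, a state like `state` above would be encoded alongside visual and proprioceptive features into the shared latent space; here it simply illustrates how a per-step relational update keeps the query result consistent with the current layout.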
Problem

Research questions and friction points this paper is trying to address.

partially observable environments
multi-task robotic manipulation
occlusions
generalization
scene understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge graph
multi-task reinforcement learning
3D scene graph
dynamic relational reasoning
partially observable manipulation
Aditya Narendra
MIRAI, Moscow, Russia and MBZUAI, Abu Dhabi, UAE
Mukhammadrizo Maribjonov
MIRAI, Moscow, Russia and Innopolis University, Russia
Dmitry Makarov
Special Astrophysical Observatory of the Russian Academy of Sciences
Dmitry Yudin
MIRAI and AXXX, Moscow, Russia
Aleksandr Panov
MIRAI and AXXX, Moscow, Russia