CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty of applying large language models (LLMs) to contact-rich robotic manipulation: LLMs lack explicit physical grounding and cannot perform adaptive control on their own. The authors propose CoRAL, a modular framework in which the LLM acts as a cost-function designer rather than a direct controller, synthesizing context-aware objectives for a sampling-based motion planner (MPPI). A vision-language model (VLM) supplies priors on physical parameters such as mass and friction, which online system identification refines in real time, while the LLM revises the cost-function structure based on interaction feedback. A retrieval-based memory lets the system reuse successful strategies on recurring tasks. Evaluated in simulation and on real hardware, CoRAL outperforms VLA and foundation-model-based planner baselines, raising success rates by over 50% on average in unseen contact-rich scenarios and handling sim-to-real gaps through its adaptive physical understanding.
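To make the "LLM as cost designer" idea concrete, the following is a minimal sketch of one MPPI update driven by a weighted cost. It is not the authors' implementation: the 1-D unit-mass dynamics, the cost terms `distance` and `effort`, the function names, and all numeric defaults are assumptions chosen for illustration. The `weights` dictionary stands in for whatever objective an LLM might synthesize.

```python
import math
import random


def rollout_cost(x0, v0, controls, weights, target, dt=0.05):
    """Simulate a 1-D unit-mass point (assumed toy dynamics) and sum the cost."""
    x, v, cost = x0, v0, 0.0
    for u in controls:
        v += u * dt  # acceleration equals force for unit mass
        x += v * dt
        cost += weights["distance"] * (x - target) ** 2
        cost += weights["effort"] * u ** 2
    return cost


def mppi_step(x0, v0, nominal, weights, target, rng,
              samples=64, sigma=1.0, temperature=1.0):
    """One MPPI update: perturb the nominal plan, score rollouts, reweight."""
    noises = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(samples)]
    costs = [
        rollout_cost(x0, v0, [u + e for u, e in zip(nominal, eps)],
                     weights, target)
        for eps in noises
    ]
    best = min(costs)  # subtract the minimum for numerical stability
    w = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(w)
    # New plan: nominal plus the cost-weighted average of the sampled noise.
    return [
        u + sum(wk * eps[t] for wk, eps in zip(w, noises)) / total
        for t, u in enumerate(nominal)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical weights; in CoRAL's setting an LLM would propose the cost.
    weights = {"distance": 1.0, "effort": 0.01}
    plan = [0.0] * 20
    for _ in range(20):
        plan = mppi_step(0.0, 0.0, plan, weights, 1.0, rng)
    print(rollout_cost(0.0, 0.0, plan, weights, 1.0))
```

In this sketch, changing the task only requires changing `weights` or the terms inside `rollout_cost`; the sampling and reweighting loop stays the same, which is the separation between reasoning and control that the summary describes.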
📝 Abstract
While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
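The abstract does not say which estimator performs the online system identification, so the sketch below uses recursive least squares (RLS) purely as an illustrative stand-in. It assumes a sliding object pushed with force F that obeys F = m·a + μ·m·g, treats the VLM's mass and friction guesses as the initial estimate, and refines them from (acceleration, force) measurements. All names, the friction model, and the numeric values are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rls_update(theta, P, phi, y):
    """One recursive-least-squares step for a two-parameter linear model."""
    P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
             P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = 1.0 + phi[0] * P_phi[0] + phi[1] * P_phi[1]
    gain = [P_phi[0] / denom, P_phi[1] / denom]
    error = y - (theta[0] * phi[0] + theta[1] * phi[1])
    theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
    phi_P = [phi[0] * P[0][0] + phi[1] * P[1][0],
             phi[0] * P[0][1] + phi[1] * P[1][1]]
    P = [[P[i][j] - gain[i] * phi_P[j] for j in range(2)] for i in range(2)]
    return theta, P


def identify(prior_mass, prior_mu, measurements, prior_variance=100.0):
    """Refine (mass, friction) priors from (acceleration, force) pairs.

    Assumed model: F = m * a + mu * m * g, which is linear in
    theta = [m, mu * m * g] with regressor phi = [a, 1].
    """
    theta = [prior_mass, prior_mu * prior_mass * G]
    P = [[prior_variance, 0.0], [0.0, prior_variance]]
    for accel, force in measurements:
        theta, P = rls_update(theta, P, [accel, 1.0], force)
    mass = theta[0]
    mu = theta[1] / (mass * G)
    return mass, mu


if __name__ == "__main__":
    true_mass, true_mu = 2.0, 0.3
    data = [(0.1 * k, true_mass * 0.1 * k + true_mu * true_mass * G)
            for k in range(50)]
    # Hypothetical VLM priors: mass 1.0 kg, friction coefficient 0.5.
    print(identify(1.0, 0.5, data))
```

A large `prior_variance` expresses low confidence in the visual prior, so the measurements quickly dominate; a smaller value would keep the estimate closer to the VLM's guess.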
Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation
Large Language Models
adaptive control
physical grounding
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based control
contact-rich manipulation
neuro-symbolic adaptation
online system identification
modular hierarchical architecture