🤖 AI Summary
Existing approaches to generating social behaviors for robots rely heavily on predefined actions or human feedback, limiting their flexibility and autonomy. This work proposes CRISP, a novel framework that introduces a vision-language model (VLM) as an introspective “social critic,” enabling robots to autonomously evaluate and iteratively refine their behaviors. Requiring only the robot’s MJCF structural file—and no task-specific APIs or human intervention—CRISP generates natural, contextually appropriate motion sequences across diverse robotic platforms. By integrating context-driven planning, joint-motion visualization, and reward-based iterative search, CRISP significantly outperforms existing methods in user studies involving five robot types and twenty real-world scenarios, achieving higher scores in both behavioral preference and contextual appropriateness.
📝 Abstract
Conventional robot social behavior generation has been limited in flexibility and autonomy, relying on predefined motions or human feedback. This study proposes CRISP (Critique-and-Replan for Interactive Social Presence), an autonomous framework in which a robot critiques and replans its own actions by leveraging a Vision-Language Model (VLM) as a "human-like social critic." CRISP integrates (1) extraction of movable joints and constraints by analyzing the robot's description file (e.g., MJCF), (2) generation of step-by-step behavior plans based on situational context, (3) generation of low-level joint control code by referencing visual information (joint range-of-motion visualizations), (4) VLM-based evaluation of social appropriateness and naturalness, including pinpointing erroneous steps, and (5) iterative refinement of behaviors through reward-based search. Because this approach is not tied to a specific robot API, it can generate subtly different, human-like motions on various platforms using only the robot's structure file. In a user study involving five different robot types and 20 scenarios, including mobile manipulators and humanoids, the proposed method achieved significantly higher preference and situational-appropriateness ratings than previous methods. This research presents a general framework that minimizes human intervention while expanding robots' autonomous interaction capabilities and cross-platform applicability.
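The five steps above can be sketched as a single critique-and-replan loop. The toy below is a minimal illustration based only on the abstract: every function name, the regex-based MJCF scan, and the scalar critic are assumptions for demonstration, not the authors' implementation (which uses a VLM as the critic rather than a hand-coded score).

```python
# Toy sketch of CRISP's critique-and-replan loop (steps 1-5 from the
# abstract). All names and the scalar critic are illustrative, not the
# authors' actual API.
import random
import re

def extract_joints(mjcf_text):
    # Step (1): extract movable joints from an MJCF-style description.
    # Real MJCF is XML with joint types and ranges; this toy only scans
    # for joint names.
    return re.findall(r'<joint name="([^"]+)"', mjcf_text)

def plan_behavior(joints):
    # Steps (2)+(3): a step-by-step plan as (joint, target-angle) pairs,
    # initialized at a neutral pose.
    return [(j, 0.0) for j in joints]

def critic_score(plan):
    # Step (4): stand-in for the VLM social critic. Here the "socially
    # appropriate" pose is assumed to be 0.5 rad on every joint.
    return -sum(abs(angle - 0.5) for _, angle in plan)

def replan(plan, rng):
    # Step (5): perturb one step of the plan, mimicking reward-based
    # search over candidate behaviors.
    i = rng.randrange(len(plan))
    joint, angle = plan[i]
    revised = list(plan)
    revised[i] = (joint, angle + rng.uniform(-0.2, 0.2))
    return revised

def crisp(mjcf_text, iterations=50, seed=0):
    rng = random.Random(seed)
    plan = plan_behavior(extract_joints(mjcf_text))
    score = critic_score(plan)
    for _ in range(iterations):
        candidate = replan(plan, rng)
        candidate_score = critic_score(candidate)
        if candidate_score > score:  # keep the critic-preferred behavior
            plan, score = candidate, candidate_score
    return plan, score
```

Running `crisp()` on a small MJCF snippet with two joints iteratively nudges the plan toward the critic's preferred pose; in the real framework the reward comes from the VLM's judgment of rendered joint-motion visualizations rather than a fixed target angle.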
Detailed result videos and supplementary information regarding this work are available at: https://limjiyu99.github.io/inner-critic/