🤖 AI Summary
Service robots operating in human environments must simultaneously adhere to social norms and accommodate individual user preferences, yet existing approaches struggle to integrate commonsense reasoning with natural-language explanations tailored to users. This paper proposes a novel bidirectional generative network that jointly models robotic action planning and human-interpretable explanation generation. Specifically, it leverages large language models (LLMs) to extract commonsense knowledge and enhance the social plausibility of predicted actions; concurrently, it conditionally generates natural-language explanations aligned with user preferences for given actions, thereby achieving human-robot explanation alignment. The method integrates LLM-based commonsense extraction, conditional generation, bidirectional inference, and explanation consistency constraints. Experiments demonstrate that our approach improves social action plausibility by 23.6% over baselines while generating natural-language explanations with significantly higher consistency and interpretability.
📝 Abstract
When operating in human environments, robots need to handle complex tasks while both adhering to social norms and accommodating individual preferences. For instance, based on common sense knowledge, a household robot can predict that it should avoid vacuuming during a social gathering, but it may still be uncertain whether it should vacuum before or after having guests. In such cases, integrating common-sense knowledge with human preferences, often conveyed through human explanations, is fundamental yet a challenge for existing systems. In this paper, we introduce GRACE, a novel approach addressing this while generating socially appropriate robot actions. GRACE leverages common sense knowledge from LLMs, and it integrates this knowledge with human explanations through a generative network. The bidirectional structure of GRACE enables robots to refine and enhance LLM predictions by utilizing human explanations and makes robots capable of generating such explanations for human-specified actions. Our evaluations show that integrating human explanations boosts GRACE's performance, where it outperforms several baselines and provides sensible explanations.