🤖 AI Summary
To address the limitations of unimodal feedback in autonomous driving—where action-based feedback is physically grounded but semantically ambiguous, and language-based feedback is semantically precise yet lacks physical grounding—this paper proposes a multimodal Bayesian reward learning framework. Methodologically, natural language utterances are treated as probabilistic observations of latent user preferences; a large language model extracts semantic attention masks and preference shifts, which are jointly modeled with physical interaction signals (e.g., corrective actions) to enable closed-form online Bayesian posterior updates. The key contribution is the integration of interpretable linguistic parsing into a statistically principled, closed-loop learning architecture, bridging semantic abstraction and embodied action. Experiments in a driving simulator demonstrate over 70% reduction in reward learning error; a user study confirms significantly improved interpretability, collaborative fluency, and behavioral trustworthiness.
📝 Abstract
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
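The abstract's core idea, treating language as a probabilistic observation that gates which reward features a correction should update, can be illustrated with a generic Bayesian linear reward model. This is a minimal sketch under assumed modeling choices, not QuickLAP's actual formulation: the function names (`bayes_update`, `language_update`), the binary mask-plus-scalar-shift encoding of an utterance, and the Gaussian noise model are all illustrative assumptions.

```python
import numpy as np

def bayes_update(mu, Sigma, phi, y, obs_var):
    """One closed-form Gaussian posterior update for a linear reward
    r(s) = theta . phi(s), given a scalar observation y = theta . phi + noise."""
    S_phi = Sigma @ phi
    gain = S_phi / (phi @ S_phi + obs_var)      # Kalman-style gain
    mu_new = mu + gain * (y - mu @ phi)         # shift mean toward the observation
    Sigma_new = Sigma - np.outer(gain, S_phi)   # shrink uncertainty along phi
    return mu_new, Sigma_new

def language_update(mu, Sigma, mask, shift, obs_var):
    """Treat an utterance as a soft observation: a (hypothetical) LLM parse
    yields a binary attention mask over reward features plus a preference shift."""
    phi = mask.astype(float)   # only the masked features carry information
    y = mu @ phi + shift       # observed preference along those features
    return bayes_update(mu, Sigma, phi, y, obs_var)

# Example: an utterance whose parse selects feature 0 and pushes it upward.
mu, Sigma = np.zeros(3), np.eye(3)
mu, Sigma = language_update(mu, Sigma, np.array([1, 0, 0]), 1.0, 0.5)
```

Because both modalities reduce to linear-Gaussian observations over the same latent preference vector, physical corrections and utterances can be fused by applying the same conjugate update in sequence, which is what makes a fast, closed-form online rule possible in this style of model.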