🤖 AI Summary
This work addresses the challenge of evaluating social perception and collaboration in embodied agents when the human partner operates under physical constraints, such as wheelchair use or an inability to reach elevated objects. To this end, the authors introduce CHAIC, the first embodied human–AI collaboration benchmark explicitly designed around physically constrained partners. The benchmark provides a first-person, multimodal environment in which a helper agent must infer the human's intent and constraints from observed behavior (social perception) and make cooperative plans tailored to that partner (cooperative planning). Planning- and learning-based baselines are evaluated, alongside a new method that combines large language models with behavior modeling. The contributions are threefold: (1) an open-source benchmark featuring four agents with realistic physical constraints; (2) eight long-horizon indoor and outdoor tasks with emergency events and potential risks; and (3) empirical evaluations demonstrating that the benchmark enables systematic assessment of key aspects of machine social intelligence.
📝 Abstract
We introduce Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints -- e.g., unable to reach high places or confined to a wheelchair -- in performing common household or outdoor tasks as efficiently as possible. To achieve this, a successful helper must: (1) infer the human's intent and constraints by following the human and observing their behavior (social perception), and (2) make a cooperative plan tailored to the human partner, working together as a team to solve the task as quickly as possible (cooperative planning). To benchmark this challenge, we create four new agents with real physical constraints and eight long-horizon tasks featuring both indoor and outdoor scenes with various constraints, emergency events, and potential risks. We benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages large language models and behavior modeling. Empirical evaluations demonstrate the effectiveness of our benchmark in enabling systematic assessment of key aspects of machine social intelligence. Our benchmark and code are publicly available at https://github.com/UMass-Foundation-Model/CHAIC.
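The two-stage helper loop described above — inferring constraints from observed behavior, then dividing labor accordingly — can be sketched in a few lines. This is a minimal illustrative toy, not the CHAIC API: the class, constraint labels, and subtask names below are all hypothetical.

```python
# Hypothetical sketch of a CHAIC-style helper loop: (1) social perception
# (infer the partner's constraints from observed failures), then
# (2) cooperative planning (claim the subtasks the partner cannot do).
# All names here are illustrative, not part of the CHAIC codebase.

from dataclasses import dataclass, field


@dataclass
class HelperAgent:
    # Beliefs about the partner's physical constraints, built up over time.
    believed_constraints: set = field(default_factory=set)

    def observe(self, partner_action: str) -> None:
        # Social perception: a failed reach suggests the partner cannot
        # access high places; a failed climb suggests wheelchair use.
        if partner_action == "fail_reach_high":
            self.believed_constraints.add("no_high_reach")
        elif partner_action == "fail_climb_stairs":
            self.believed_constraints.add("wheelchair")

    def plan(self, subtasks: list) -> list:
        # Cooperative planning: the helper takes on the subtasks that are
        # infeasible for the partner under the inferred constraints.
        mine = []
        for task in subtasks:
            if task == "grab_high_object" and "no_high_reach" in self.believed_constraints:
                mine.append(task)
            elif task == "carry_upstairs" and "wheelchair" in self.believed_constraints:
                mine.append(task)
        return mine


helper = HelperAgent()
helper.observe("fail_reach_high")
print(helper.plan(["grab_high_object", "sweep_floor"]))
```

In the benchmark itself, perception is egocentric and multimodal and planning is handled by far richer planning-, learning-, or LLM-based policies; the sketch only shows how the two stages feed into each other.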