S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test

📅 2025-12-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing evaluation paradigms do not jointly assess an embodied agent's ability to reason about physical constraints and social norms: social reasoning benchmarks are mostly text-based and disembodied, while physical task benchmarks ignore social context. Method: The paper introduces S³IT, a benchmark for embodied social intelligence centered on a seat-ordering task in which an agent arranges seating in a 3D environment for LLM-driven NPCs with diverse identities, preferences, and interpersonal relationships. A procedurally extensible framework generates scenarios with controllable difficulty; the agent must elicit preferences through active dialogue, perceive the environment through autonomous exploration, and perform multi-objective optimization over the resulting constraint network. Results: State-of-the-art LLMs show a clear gap relative to the human baseline on S³IT, which points to deficiencies in spatial intelligence; however, they approach human-level competence in resolving conflicts that carry explicit textual cues.

📝 Abstract

The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints. However, existing evaluations fail to address this integration, as they are limited to either disembodied social reasoning (e.g., in text) or socially agnostic physical tasks. Neither approach assesses an agent's ability to integrate and trade off physical and social constraints within a realistic, embodied context. To address this challenge, we introduce the Spatially Situated Social Intelligence Test (S$^{3}$IT), a benchmark specifically designed to evaluate embodied social intelligence. It is centered on a novel and challenging seat-ordering task, requiring an agent to arrange seating in a 3D environment for a group of large language model-driven (LLM-driven) NPCs with diverse identities, preferences, and intricate interpersonal relationships. Our procedurally extensible framework generates a vast and diverse scenario space with controllable difficulty, compelling the agent to acquire preferences through active dialogue, perceive the environment via autonomous exploration, and perform multi-objective optimization within a complex constraint network. We evaluate state-of-the-art LLMs on S$^{3}$IT and find that they still struggle with this problem, with a clear gap relative to the human baseline. The results suggest that LLMs are deficient in spatial intelligence, yet can reach near human-level competence in resolving conflicts that carry explicit textual cues.
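
To make the seat-ordering task more concrete, the following is a minimal toy sketch of the optimization step only. It is not the benchmark's scoring function or data format: the guest names, seat layout, preference sets, and the unit weights are all hypothetical, and the preferences are simply given here, whereas in S$^{3}$IT the agent would first have to obtain them through dialogue and exploration.

```python
from itertools import permutations

# Hypothetical toy scenario: four seats in a row, one next to a window.
guests = ["Ana", "Ben", "Cy"]
seats = {
    "s1": {"pos": 0, "near_window": True},
    "s2": {"pos": 1, "near_window": False},
    "s3": {"pos": 2, "near_window": False},
    "s4": {"pos": 3, "near_window": False},
}

# Illustrative preferences (assumed, not taken from the paper).
wants_window = {"Ana"}          # physical preference
like_pairs = [("Ana", "Ben")]   # social: want to sit next to each other
avoid_pairs = [("Ben", "Cy")]   # social: should not sit next to each other


def adjacent(seat_a, seat_b):
    """Two seats are adjacent if their row positions differ by one."""
    return abs(seats[seat_a]["pos"] - seats[seat_b]["pos"]) == 1


def score(assignment):
    """Sum of satisfied preferences minus violated ones (unit weights)."""
    total = 0
    for guest in wants_window:
        if seats[assignment[guest]]["near_window"]:
            total += 1
    for a, b in like_pairs:
        if adjacent(assignment[a], assignment[b]):
            total += 1
    for a, b in avoid_pairs:
        if adjacent(assignment[a], assignment[b]):
            total -= 1
    return total


def best_assignment():
    """Exhaustively try every guest-to-seat assignment; keep the best one."""
    best, best_score = None, float("-inf")
    for perm in permutations(seats, len(guests)):
        assignment = dict(zip(guests, perm))
        current = score(assignment)
        if current > best_score:
            best, best_score = assignment, current
    return best, best_score


if __name__ == "__main__":
    print(best_assignment())
```

Exhaustive search is only workable for a handful of guests; it is used here because it keeps the trade-off between physical and social preferences easy to inspect.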

Problem

Research questions and friction points this paper is trying to address.

Evaluates embodied social intelligence in realistic 3D environments

Assesses integration of social norms with physical spatial constraints

Measures multi-objective optimization in complex social and spatial scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for embodied social intelligence testing

Procedurally extensible framework generating diverse scenarios

Seat-ordering task with multi-objective optimization constraints
