AI Summary
Service robots lack social situational understanding, hindering their ability to interpret human attributes, activities, and interpersonal or human-object interactions in 3D environments. Method: We propose Social 3D Scene Graphs (S3SG), an enhanced 3D scene graph representation supporting open-vocabulary semantics and multi-scale relational modeling. S3SG is the first to jointly encode human attributes, activities, and both local and long-range human-human and human-object interactions within a unified 3D spatial framework. It integrates multi-frame temporal information and synthetic data generation to construct SocialScene3D, the first large-scale synthetic benchmark for complex social reasoning, featuring fine-grained behavioral and relational annotations. Contribution/Results: Experiments demonstrate that S3SG significantly improves human activity prediction and social relation inference, achieving state-of-the-art performance on multiple open-vocabulary social query tasks. This work establishes a scalable, structured cognitive foundation for socially intelligent robots.
Abstract
Understanding how people interact with their surroundings and each other is essential for enabling robots to act in socially compliant and context-aware ways. While 3D Scene Graphs have emerged as a powerful semantic representation for scene understanding, existing approaches largely ignore humans in the scene, in part due to the lack of annotated human-environment relationships. Moreover, existing methods typically capture only open-vocabulary relations from single image frames, which limits their ability to model long-range interactions beyond the observed content. We introduce Social 3D Scene Graphs, an augmented 3D Scene Graph representation that captures humans, their attributes, their activities, and their relationships in the environment, both local and remote, using an open-vocabulary framework. Furthermore, we introduce a new benchmark consisting of synthetic environments with comprehensive human-scene relationship annotations and diverse types of queries for evaluating social scene understanding in 3D. Our experiments demonstrate that this representation improves human activity prediction and reasoning about human-environment relations, paving the way toward socially intelligent robots.
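To make the representation concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a social 3D scene graph: typed nodes for humans and objects carrying 3D positions and open-vocabulary attributes, and relation edges that can link a person to nearby objects (local interactions) or to distant ones (remote interactions, e.g. watching a TV across the room). All class and relation names here are illustrative assumptions.

```python
# Illustrative sketch of a social 3D scene graph; names and structure
# are assumptions, not the paper's actual data model.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                              # e.g. "human" or "object"
    position: tuple                        # 3D centroid (x, y, z)
    attributes: dict = field(default_factory=dict)  # open-vocabulary attributes

@dataclass
class Edge:
    source: str
    target: str
    relation: str                          # open-vocabulary relation phrase

class SocialSceneGraph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: str, target: str, relation: str) -> None:
        self.edges.append(Edge(source, target, relation))

    def relations_of(self, node_id: str) -> list[Edge]:
        # All relation edges in which the node participates, either direction.
        return [e for e in self.edges if node_id in (e.source, e.target)]

# Build a toy scene: a person sitting on a sofa (local) while watching
# a TV several meters away (remote / long-range interaction).
g = SocialSceneGraph()
g.add_node(Node("human_1", "human", (1.0, 0.5, 0.0), {"activity": "watching TV"}))
g.add_node(Node("sofa_1", "object", (1.2, 0.4, 0.0)))
g.add_node(Node("tv_1", "object", (4.0, 0.6, 1.5)))
g.add_edge("human_1", "sofa_1", "sitting on")   # local human-object interaction
g.add_edge("human_1", "tv_1", "watching")       # remote human-object interaction

print([e.relation for e in g.relations_of("human_1")])  # -> ['sitting on', 'watching']
```

A graph like this can be queried with free-form relation phrases rather than a fixed label set, which is the sense in which the representation is "open-vocabulary".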