🤖 AI Summary
Existing 3D human motion synthesis methods primarily focus on geometric constraints while lacking deep semantic understanding of surrounding scenes. To address this, we propose a semantics-aware motion synthesis framework. Our method introduces (1) a unified Scene Semantic Occupancy (SSO) representation that jointly encodes CLIP-derived semantic features and spatial occupancy via shared linear dimensionality reduction; and (2) a bidirectional tri-plane decomposition architecture with frame-level scene queries, enabling instruction-driven, fine-grained motion generation conditioned on scene semantics. Extensive experiments on cluttered scenes built from ShapeNet furniture, as well as scanned scenes from the PROX and Replica datasets, demonstrate significant improvements over state-of-the-art approaches in semantic fidelity, computational efficiency, and cross-scene generalization. Ablation studies further validate the effectiveness of both the SSO representation and the query mechanism in capturing scene-aware motion priors.
📝 Abstract
Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose a human motion synthesis framework that takes a unified Scene Semantic Occupancy (SSO) as its scene representation, termed SSOMotion. We design a bi-directional tri-plane decomposition to derive a compact version of the SSO, and scene semantics are mapped to a unified feature space via CLIP encoding and shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structure while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, to control motion via frame-wise scene queries. Extensive experiments and ablation studies conducted on cluttered scenes using ShapeNet furniture, as well as scanned scenes from the PROX and Replica datasets, demonstrate its cutting-edge performance while validating its effectiveness and generalization ability. Code will be publicly available at https://github.com/jingyugong/SSOMotion.
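To make the pipeline concrete, here is a minimal numpy sketch of the ideas the abstract describes: per-voxel semantic features reduced through a shared linear projection and masked by occupancy, a simplified tri-plane decomposition (here approximated by averaging the volume along each axis; the paper's bi-directional decomposition is more elaborate), and a frame-wise scene query at a motion frame's position. All dimensions, the pooling scheme, and the `query_scene` helper are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper)
D, H, W = 8, 8, 8          # voxel grid resolution
C_CLIP, C_SSO = 512, 16    # CLIP feature dim, reduced SSO feature dim

# Stand-ins for real inputs: binary occupancy and per-voxel CLIP-like features
occupancy = (rng.random((D, H, W)) > 0.7).astype(np.float32)
clip_feats = rng.standard_normal((D, H, W, C_CLIP)).astype(np.float32)

# Shared linear dimensionality reduction applied to every voxel's semantics
W_shared = rng.standard_normal((C_CLIP, C_SSO)).astype(np.float32) / np.sqrt(C_CLIP)
sso = (clip_feats @ W_shared) * occupancy[..., None]  # zero out empty voxels

# Simplified tri-plane decomposition: collapse the volume along each axis
plane_hw = sso.mean(axis=0)  # (H, W, C_SSO)
plane_dw = sso.mean(axis=1)  # (D, W, C_SSO)
plane_dh = sso.mean(axis=2)  # (D, H, C_SSO)

def query_scene(pos):
    """Frame-wise scene query: sum the three plane features at a voxel position."""
    d, h, w = (int(p) for p in pos)
    return plane_hw[h, w] + plane_dw[d, w] + plane_dh[d, h]

# One scene-feature lookup per motion frame; this vector would condition the generator
frame_feat = query_scene((3, 4, 5))
print(frame_feat.shape)  # (16,)
```

Note the storage saving that motivates the decomposition: three planes cost O(HW + DW + DH) features instead of O(DHW) for the full grid, which is the "compact version of the SSO" the abstract refers to.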