🤖 AI Summary
Dynamic 3D scene modeling requires jointly handling geometric evolution, motion representation, and interaction, yet the literature on 4D representations lacks a unified conceptual framework that addresses all three. Method: We propose a taxonomy of 4D representations organized around three pillars: geometry, motion, and interaction. Rather than enumerating every work, we survey representative approaches, including neural radiance fields, 3D Gaussian splatting, structured models, and long-range motion representations. We analyze their strengths and limitations in generation and reconstruction tasks under different computation, application, and data scenarios, and we provide guidance on selecting and customizing representations for specific tasks. We further examine the role of large language models and video foundation models in 4D applications, along with their current limitations. Contribution/Results: Our work delivers a representation-centric taxonomy, a practical guide to representation selection, a review of available 4D datasets and what is still missing, and an identification of open research problems, offering both conceptual foundations and actionable directions for 4D generative AI.
📝 Abstract
We present a survey on 4D generation and reconstruction, a fast-evolving subfield of computer graphics whose developments have been propelled by recent advances in neural fields, geometric and motion deep learning, as well as 3D generative artificial intelligence (GenAI). While our survey is not the first of its kind, we build our coverage of the domain from a distinctive perspective: that of 4D representations, which model 3D geometry evolving over time while exhibiting motion and interaction. Rather than exhaustively enumerating prior work, we take a selective approach, focusing on representative works to highlight both the desirable properties and the ensuing challenges of each representation under different computation, application, and data scenarios. The main takeaway we aim to convey is how to select and then customize the appropriate 4D representation for a given task. We organize 4D representations around three key pillars: geometry, motion, and interaction. Our discussion covers not only today's most popular representations, such as neural radiance fields (NeRFs) and 3D Gaussian Splatting (3DGS), but also representations that remain relatively under-explored in the 4D context, such as structured models and long-range motions. Throughout the survey, we revisit the role of large language models (LLMs) and video foundation models (VFMs) in a variety of 4D applications, with an emphasis on their current limitations and how they can be addressed. We also dedicate coverage to the 4D datasets currently available, as well as to what is still lacking to drive the subfield forward. Project page: https://mingrui-zhao.github.io/4DRep-GMI/