Velox: Learning Representations of 4D Geometry and Appearance

📅 2026-05-06
📈 Citations: 0
Influential: 0

📝 Abstract
We introduce a framework for learning latent representations of 4D objects that are descriptive, faithfully capturing object geometry and appearance; compressive, aiding downstream efficiency; and accessible, requiring only minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution and thereby captures geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians and helps the tokens learn appearance. To demonstrate the utility of our representation, we evaluate it on three downstream tasks -- video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation -- and observe strong performance in all settings.
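
The abstract names an encoder and two decoders but gives no architectural detail, so the following is only a shape-level sketch of how data might flow through such a pipeline. Everything concrete in it is an assumption made for illustration and is not taken from Velox: the cross-attention pooling in `encode`, the sizes `T`, `N`, `K`, `D`, `G`, the 14-value Gaussian parameter layout, and the scalar score returned by `decode_surface` as a stand-in for the surface distribution. Random weights take the place of trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
T, N = 4, 256                    # frames, points per frame
K, D = 16, 32                    # number of latent tokens, token width
G = 8                            # Gaussians emitted per token
GAUSS_DIM = 3 + 3 + 4 + 1 + 3    # mean, scale, quaternion, opacity, RGB


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode(points, queries, w_in):
    """Assumed encoder: K query vectors attend over every point in the clip."""
    feats = points @ w_in                                     # (T*N, D)
    attn = softmax(queries @ feats.T / np.sqrt(D), axis=-1)   # (K, T*N)
    return attn @ feats                                       # (K, D)


def decode_gaussians(tokens, w_gauss):
    """Assumed appearance head: each token emits G raw Gaussian parameter rows."""
    return (tokens @ w_gauss).reshape(K * G, GAUSS_DIM)


def decode_surface(tokens, xyzt, w_q, w_out):
    """Assumed geometry head: one scalar score per 4D query point (x, y, z, t)."""
    q = xyzt @ w_q                                            # (M, D)
    attn = softmax(q @ tokens.T / np.sqrt(D), axis=-1)        # (M, K)
    return (attn @ tokens) @ w_out                            # (M,)


# Unstructured dynamic point cloud: xyz + rgb per point, plus a time channel.
xyz_rgb = rng.normal(size=(T, N, 6))
t = np.broadcast_to(np.linspace(0.0, 1.0, T)[:, None, None], (T, N, 1))
cloud = np.concatenate([xyz_rgb, t], axis=-1).reshape(T * N, 7)

# Random stand-ins for learned parameters.
queries = rng.normal(size=(K, D))
w_in = rng.normal(size=(7, D)) * 0.1
w_gauss = rng.normal(size=(D, G * GAUSS_DIM)) * 0.1
w_q = rng.normal(size=(4, D)) * 0.1
w_out = rng.normal(size=(D,)) * 0.1

tokens = encode(cloud, queries, w_in)
gaussians = decode_gaussians(tokens, w_gauss)
surface_scores = decode_surface(tokens, rng.normal(size=(5, 4)), w_q, w_out)

print(tokens.shape, gaussians.shape, surface_scores.shape)
```

In this toy setup the 1,024 input points are reduced to 16 tokens, and both decoders read only those tokens, which is the sense in which the representation is compressive: downstream models would operate on the token set rather than on the raw point cloud.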
Problem

Research questions and friction points this paper is trying to address.

4D representation
dynamic point cloud
geometry and appearance
latent representation
spatiotemporal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D representation learning
dynamic point clouds
shape tokens
Gaussian splatting
geometry and appearance modeling