Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints

📅 2026-03-12

🤖 AI Summary

Existing approaches rely on multi-view captures of an object in discrete, static states, so they cannot readily build high-fidelity digital twins of articulated objects from a single monocular video. This work proposes a motion-prior-driven framework that jointly enforces 3D geometric and motion constraints in two stages. First, 3D point trajectories are used to model the low-dimensional structure of articulated motion with a compact set of motion bases, which softly decomposes the scene into rigidly moving groups and initializes the reconstruction. Second, learnable kinematic primitives, each consisting of a joint axis, a pivot point, and per-frame motion scalars, enforce physically plausible articulation, yielding reconstructions that are both geometrically accurate and temporally coherent. The method reports state-of-the-art performance on synthetic benchmarks and on casually captured real-world monocular videos, making digital twin creation more practical under uncontrolled conditions.
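To make "a compact set of motion bases" concrete, here is a minimal sketch, assuming each basis is a per-frame rigid transform (rotation plus translation) and each point carries softmax weights over the bases. This blended-rigid-transform formulation is a common choice in dynamic reconstruction and is an assumption here, not the paper's confirmed parameterization; the function names `blend_motion_bases` and `hard_groups` are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def blend_motion_bases(points, logits, rotations, translations):
    """Move canonical points with a soft blend of K rigid motion bases.

    Hypothetical parameterization (assumption, not from the paper):
    points:       (N, 3) canonical 3D positions
    logits:       (N, K) per-point basis logits
    rotations:    (T, K, 3, 3) per-frame rotation of each basis
    translations: (T, K, 3) per-frame translation of each basis
    returns:      (T, N, 3) blended point positions for every frame
    """
    weights = softmax(logits, axis=-1)  # (N, K); each row sums to 1
    # Apply every basis transform to every point -> (T, K, N, 3)
    moved = np.einsum("tkij,nj->tkni", rotations, points) + translations[:, :, None, :]
    # Weighted sum over the K bases -> (T, N, 3)
    return np.einsum("nk,tkni->tni", weights, moved)

def hard_groups(logits):
    """Read off one rigid-group label per point from the soft weights."""
    return softmax(logits, axis=-1).argmax(axis=-1)
```

Because the weights are soft, a point near a part boundary can follow a mixture of bases during optimization, while `hard_groups` recovers a discrete part segmentation once the weights become peaked.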

📝 Abstract

Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches depend on multi-view captures of the object in discrete, static states, which severely constrains their real-world scalability. In this paper, we introduce Articulat3D, a novel framework that constructs such digital twins from casually captured monocular videos by jointly enforcing explicit 3D geometric and motion constraints. We first propose Motion Prior-Driven Initialization, which leverages 3D point tracks to exploit the low-dimensional structure of articulated motion. By modeling scene dynamics with a compact set of motion bases, we facilitate soft decomposition of the scene into multiple rigidly-moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through learnable kinematic primitives parameterized by a joint axis, a pivot point, and per-frame motion scalars, yielding reconstructions that are both geometrically accurate and temporally coherent. Extensive experiments demonstrate that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and real-world casually captured monocular videos, significantly advancing the feasibility of digital twin creation under uncontrolled real-world conditions. Our project page is at https://maxwell-zhao.github.io/Articulat3D.
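A kinematic primitive "parameterized by a joint axis, a pivot point, and per-frame motion scalars" can be read as a one-degree-of-freedom joint. The sketch below assumes the scalar is a rotation angle for a revolute joint (rotation about the line through the pivot, via Rodrigues' formula) or a displacement along the axis for a prismatic joint. That interpretation and the helper name `articulate` are assumptions for illustration; in the paper these quantities are learnable and optimized, which this NumPy sketch does not show.

```python
import numpy as np

def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about the unit vector `axis`."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def articulate(points, axis, pivot, scalars, joint_type="revolute"):
    """Pose a rigid part at every frame with a hypothetical one-DoF primitive.

    points:  (N, 3) part points in the rest state
    axis:    (3,) joint axis direction (normalized internally)
    pivot:   (3,) point on the axis; only used by revolute joints
    scalars: (T,) per-frame motion scalar (angle in radians, or displacement)
    returns: (T, N, 3) part points for every frame
    """
    points = np.asarray(points, dtype=float)
    unit = np.asarray(axis, dtype=float)
    unit = unit / np.linalg.norm(unit)
    pivot = np.asarray(pivot, dtype=float)
    frames = []
    for s in scalars:
        if joint_type == "revolute":
            R = axis_angle_to_matrix(unit, s)
            # Rotate about the line through `pivot`, not about the origin
            frames.append((points - pivot) @ R.T + pivot)
        elif joint_type == "prismatic":
            # Slide the whole part along the joint axis
            frames.append(points + s * unit)
        else:
            raise ValueError(f"unknown joint_type: {joint_type}")
    return np.stack(frames)
```

For example, a door hinged on the vertical line through (1, 0, 0) swings a point at (2, 0, 0) to (1, 1, 0) after a quarter turn. Since every frame shares the same axis and pivot and only one scalar varies, the part is forced to move rigidly along a single degree of freedom, which is the kind of physical plausibility constraint the abstract describes.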

Problem

Research questions and friction points this paper is trying to address.

digital twins

articulated objects

monocular video

3D reconstruction

motion constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

articulated digital twins

monocular video reconstruction

motion prior