4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of low computational efficiency and sensitivity to missing observations in 4D reconstruction of equine subjects from monocular videos by proposing the first efficient framework that decouples dynamic motion from static appearance. The method employs a spatio-temporal Transformer to regress smooth pose sequences and integrates a feed-forward network to generate high-fidelity, animatable 3D Gaussian avatars from single images. Trained exclusively on synthetic data, the approach leverages two newly curated high-quality datasets—VarenPoser for poses and VarenTex for textures—and incorporates multi-view diffusion generation with post-optimization strategies. Evaluated on the real-world APT36K and AiM benchmarks, the method achieves state-of-the-art performance, significantly improving geometric consistency, motion smoothness, and pixel-level appearance fidelity.

📝 Abstract
4D reconstruction of the equine family (e.g., horses) from monocular video is important for animal welfare. Previous mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over an entire video, which is time-consuming and sensitive to incomplete observations. In this work, we propose a novel framework, 4DEquine, that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage to regress smooth, pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To assist training, we create a large-scale synthetic motion dataset, VarenPoser, which features high-quality surface motions and diverse camera trajectories, as well as a synthetic appearance dataset, VarenTex, comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the superiority of 4DEquine and our new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/.
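The abstract's motion branch regresses a pose sequence for the whole clip at once, so every frame's estimate can attend to the rest of the video. The paper's actual network is not released here; the following is only a hypothetical, minimal numpy sketch of that idea — a single temporal self-attention layer over per-frame features followed by a linear pose head. All names (`temporal_attention_pose`, the weight matrices, the toy dimensions) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_pose(features, Wq, Wk, Wv, Wpose):
    """Toy spatio-temporal regressor.

    features: (T, D) per-frame image features.
    Returns:  (T, P) per-frame pose parameters, where each frame's
              prediction mixes information from the whole clip.
    """
    q, k, v = features @ Wq, features @ Wk, features @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (T, T) frame-to-frame weights
    mixed = attn @ v                                # each frame attends to all frames
    return mixed @ Wpose                            # linear pose head

rng = np.random.default_rng(0)
T, D, P = 8, 16, 6  # frames, feature dim, toy pose dim
feats = rng.normal(size=(T, D))
poses = temporal_attention_pose(
    feats,
    rng.normal(size=(D, D)), rng.normal(size=(D, D)),
    rng.normal(size=(D, D)), rng.normal(size=(D, P)),
)
print(poses.shape)  # (8, 6)
```

Because the attention weights couple all frames, the output sequence varies smoothly compared to per-frame regression; the paper additionally applies a post-optimization stage, which this sketch omits.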
Problem

Research questions and friction points this paper is trying to address.

4D reconstruction
equine
monocular video
motion-appearance disentanglement
animal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D reconstruction
motion-appearance disentanglement
spatio-temporal transformer
3D Gaussian avatar
synthetic dataset
Jin Lyu
Southern University of Science and Technology
Liang An
Tsinghua University
3D vision, human motion capture, animal motion capture
Pujin Cheng
Southern University of Science and Technology, The University of Hong Kong
Yebin Liu
Professor, Tsinghua University
Computer Graphics, Computational Photography, 3D Vision, Digital Humans
Xiaoying Tang
Southern University of Science and Technology