MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing layout-aware 3D generation methods typically support only single-view inputs, which limits their ability to leverage complementary multi-view information and often yields physically implausible layouts (such as interpenetrating or floating objects) because object poses are estimated independently. To address these limitations, this work proposes a training-free multi-view fusion framework that enforces view consistency in 3D latent space through multiple diffusion processes. It introduces a novel confidence-aware adaptive fusion mechanism based on attention entropy and visibility weighting, and jointly optimizes collision and contact constraints both during and after generation to enhance physical plausibility. Experiments demonstrate that the proposed method significantly improves reconstruction fidelity and layout plausibility on standard benchmarks and real-world multi-object scenes, all without requiring additional training.
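The confidence-aware fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the exponential entropy-to-confidence mapping, and the plain weighted average of per-view latents are all assumptions for the sketch.

```python
import numpy as np

def fusion_weights(attn_maps, visibility, beta=1.0):
    """Per-view fusion weights from attention entropy and visibility.

    attn_maps:  (V, N) attention distributions per view (rows sum to 1).
    visibility: (V,)   fraction of the latent region each view observes.
    """
    eps = 1e-8
    # A diffuse (high-entropy) attention map signals an uncertain view,
    # so it should contribute less to the fused latent.
    entropy = -np.sum(attn_maps * np.log(attn_maps + eps), axis=1)
    confidence = np.exp(-beta * entropy)
    w = confidence * visibility
    return w / (w.sum() + eps)

def fuse_latents(latents, weights):
    """Weighted average of per-view denoised latents (Multi-Diffusion-style).

    latents: (V, ...) one latent per view; weights: (V,) summing to 1.
    """
    return np.tensordot(weights, latents, axes=1)
```

A sharply focused view thus receives a larger weight than a uniform-attention view with the same visibility, which is the "each viewpoint contributes according to its local observation reliability" behavior the summary describes.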

📝 Abstract
Recent unified 3D generation models have made remarkable progress in producing high-quality 3D assets from a single image. Notably, layout-aware approaches such as SAM3D can reconstruct multiple objects while preserving their spatial arrangement, opening the door to practical scene-level 3D generation. However, current methods are limited to single-view input and cannot leverage complementary multi-view observations, while independently estimated object poses often lead to physically implausible layouts such as interpenetration and floating artifacts. We present MV-SAM3D, a training-free framework that extends layout-aware 3D generation with multi-view consistency and physical plausibility. We formulate multi-view fusion as a Multi-Diffusion process in 3D latent space and propose two adaptive weighting strategies -- attention-entropy weighting and visibility weighting -- that enable confidence-aware fusion, ensuring each viewpoint contributes according to its local observation reliability. For multi-object composition, we introduce physics-aware optimization that injects collision and contact constraints both during and after generation, yielding physically plausible object arrangements. Experiments on standard benchmarks and real-world multi-object scenes demonstrate significant improvements in reconstruction fidelity and layout plausibility, all without any additional training. Code is available at https://github.com/devinli123/MV-SAM3D.
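As an illustration of the physics-aware optimization idea from the abstract, the toy energy below penalizes interpenetration between objects and deviation from ground contact. The sphere proxies, quadratic penalties, and weight parameters are assumptions made for this sketch; the paper's actual constraint formulation is not specified on this page.

```python
import numpy as np

def layout_energy(centers, radii, ground_z=0.0, k_col=1.0, k_con=1.0):
    """Toy layout energy: sphere-proxy collision + ground-contact terms.

    centers: (K, 3) object centers; radii: (K,) bounding-sphere radii.
    Returns a scalar that is zero for non-overlapping, resting objects.
    """
    K = len(centers)
    energy = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            d = np.linalg.norm(centers[i] - centers[j])
            # Interpenetration depth: positive only when spheres overlap.
            pen = max(0.0, radii[i] + radii[j] - d)
            energy += k_col * pen ** 2
        # Signed gap to the ground plane: positive = floating,
        # negative = sinking; both are penalized toward resting contact.
        gap = centers[i][2] - radii[i] - ground_z
        energy += k_con * gap ** 2
    return energy
```

Minimizing such an energy over object poses (e.g., by gradient descent during and after generation) would push overlapping objects apart and floating objects down, matching the interpenetration and floating artifacts the abstract targets.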
Problem

Research questions and friction points this paper is trying to address.

multi-view 3D generation
layout-aware 3D reconstruction
physical plausibility
object interpenetration
floating artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-View Fusion
Layout-Aware 3D Generation
Multi-Diffusion
Physics-Aware Optimization
Training-Free Framework
Baicheng Li
Peking University
3D Vision · Embodied AI
Dong Wu
Peking University
Jun Li
JD Explore Academy
Shunkai Zhou
Peking University
Zecui Zeng
JD Explore Academy
Lusong Li
JD Explore Academy
Hongbin Zha
Peking University
computer vision · robot vision