🤖 AI Summary
Single-image multi-view synthesis often suffers from spatial inconsistency, degrading downstream 3D reconstruction quality. To address this, we propose a diffusion-based multi-view generation framework centered on a novel “latent-space weaving mechanism”: orthogonal plane projections align multi-view features, enabling aggregation and interpolation of view-specific encodings within a shared latent space for implicit cross-view scene modeling and collaborative reasoning. Our method enables fast, geometrically consistent novel-view synthesis—generating 16 high-fidelity, geometry-aligned views in just 15 seconds. Quantitatively, it surpasses state-of-the-art methods across image fidelity metrics (FID, LPIPS) and 3D reconstruction benchmarks (Chamfer distance, mIoU). Notably, it significantly improves single-image-driven neural radiance field (NeRF) and mesh reconstruction performance, demonstrating superior implicit 3D consistency and generalization.
📝 Abstract
Generating consistent multi-view images from a single image remains challenging: the lack of spatial consistency across generated views often degrades 3D mesh quality in downstream surface reconstruction. To address this, we propose LoomNet, a novel multi-view diffusion architecture that produces coherent images by applying the same diffusion model multiple times in parallel to collaboratively build and leverage a shared latent space for view consistency. Each viewpoint-specific inference generates an encoding representing its own hypothesis of the novel view from a given camera pose, which is projected onto three orthogonal planes. For each plane, the encodings from all views are fused into a single aggregated plane. These aggregated planes are then processed to propagate information and interpolate missing regions, combining the per-view hypotheses into a unified, coherent interpretation. The final latent space is then used to render consistent multi-view images. LoomNet generates 16 high-quality, coherent views in just 15 seconds. In our experiments, LoomNet outperforms state-of-the-art methods on both image quality and reconstruction metrics, while also exhibiting creativity by producing diverse, plausible novel views from the same input.
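The aggregation step described above — fusing per-view plane encodings and then interpolating regions no view covers — can be sketched as follows. This is a minimal NumPy illustration under assumed shapes: plane encodings as a `(V, C, H, W)` array with per-view visibility masks, and the paper's learned propagation/interpolation module replaced by simple iterative neighbour averaging. None of these names or shapes come from the paper itself.

```python
import numpy as np

def fuse_planes(planes, masks, eps=1e-8):
    """Fuse per-view encodings of one orthogonal plane into a single plane.

    planes: (V, C, H, W) view-specific encodings (hypothetical layout).
    masks:  (V, 1, H, W) visibility weights in [0, 1]; 0 marks regions a
            view contributes nothing to.
    Returns the visibility-weighted mean plane (C, H, W) and a coverage
    map (1, H, W) telling which regions still need interpolation.
    """
    weighted = (planes * masks).sum(axis=0)       # (C, H, W)
    coverage = masks.sum(axis=0)                  # (1, H, W)
    fused = weighted / np.maximum(coverage, eps)  # uncovered cells stay 0
    return fused, coverage

def interpolate_missing(fused, coverage, iters=8):
    """Propagate information into uncovered regions by repeated
    4-neighbour averaging (a crude stand-in for the learned step)."""
    filled = fused.copy()
    known = (coverage > 0).astype(fused.dtype)    # (1, H, W)
    for _ in range(iters):
        pad = np.pad(filled, ((0, 0), (1, 1), (1, 1)))
        kpad = np.pad(known, ((0, 0), (1, 1), (1, 1)))
        # Sum of the four neighbours; unknown cells contribute 0.
        nsum = (pad[:, :-2, 1:-1] + pad[:, 2:, 1:-1]
                + pad[:, 1:-1, :-2] + pad[:, 1:-1, 2:])
        ksum = (kpad[:, :-2, 1:-1] + kpad[:, 2:, 1:-1]
                + kpad[:, 1:-1, :-2] + kpad[:, 1:-1, 2:])
        avg = nsum / np.maximum(ksum, 1e-8)
        grow = (known == 0) & (ksum > 0)          # frontier of unknowns
        filled = np.where(grow, avg, filled)
        known = np.where(grow, 1.0, known)
    return filled
```

For example, if one view only covers the left edge of a plane and another only the right edge, `fuse_planes` keeps each view's encoding where it is visible and `interpolate_missing` grows those values inward until the plane is dense — a rough analogue of combining view hypotheses into one coherent latent.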