3D Object Manipulation in a Single Image using Generative Models

๐Ÿ“… 2025-01-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the challenge of real-time, photorealistic 3D object editing and animation generation from a single input image. We propose OMG3D, an end-to-end framework integrating neural radiance field (NeRF) reconstruction, diffusion-based texture refinement via a CustomRefiner module, and learnable illumination modeling through an IllumiCombiner moduleโ€”enabling geometry-controllable editing, photorealistic texture enhancement, and illumination-consistent compositing. Our key contribution is the first unified training paradigm jointly optimizing texture refinement and illumination estimation, supporting both static edits and dynamic motion generation. Given only one image, OMG3D achieves high-fidelity 3D reconstruction and animation synthesis on a single RTX 3090 GPU. Quantitative and qualitative evaluations demonstrate significant improvements over prior methods in appearance consistency, shadow realism, and motion naturalness, establishing new state-of-the-art performance.

Technology Category

Application Category

๐Ÿ“ Abstract
Object manipulation in images aims to not only edit the object's presentation but also gift objects with motion. Previous methods encountered challenges in concurrently handling static editing and dynamic generation, while also struggling to achieve fidelity in object appearance and scene lighting. In this work, we introduce extbf{OMG3D}, a novel framework that integrates the precise geometric control with the generative power of diffusion models, thus achieving significant enhancements in visual performance. Our framework first converts 2D objects into 3D, enabling user-directed modifications and lifelike motions at the geometric level. To address texture realism, we propose CustomRefiner, a texture refinement module that pre-train a customized diffusion model, aligning the details and style of coarse renderings of 3D rough model with the original image, further refine the texture. Additionally, we introduce IllumiCombiner, a lighting processing module that estimates and corrects background lighting to match human visual perception, resulting in more realistic shadow effects. Extensive experiments demonstrate the outstanding visual performance of our approach in both static and dynamic scenarios. Remarkably, all these steps can be done using one NVIDIA 3090. Project page is at https://whalesong-zrs.github.io/OMG3D-projectpage/
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D Object Manipulation
Photorealistic Rendering
Dynamic Presentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

OMG3D Model
Diffusion Model and CustomRefiner
IllumiCombiner Technology
๐Ÿ”Ž Similar Papers
No similar papers found.
R
Ruisi Zhao
ReLER, CCAI, Zhejiang University
Zechuan Zhang
Zechuan Zhang
PhD student in Zhejiang University
3D VisionImage GenerationAI4Sci
Z
Zongxin Yang
DBMI, HMS, Harvard University
Y
Yi Yang
ReLER, CCAI, Zhejiang University