From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos

📅 2025-07-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address temporal inconsistency and illumination distortion when inserting 3D bracelets into dynamic videos, this paper introduces the first video-level insertion framework to integrate 3D Gaussian Splatting (3DGS) rendering with 2D diffusion models. Methodologically, it proposes a shading-decoupled rendering pipeline: multi-frame weighted optimization of the 3DGS model reconstructs geometry and albedo priors, after which a diffusion model jointly refines the albedo, shading, and sRGB outputs, yielding photorealistic illumination and spatiotemporal coherence. The approach substantially improves the visual realism and inter-frame stability of inserted bracelets under challenging conditions, including complex wrist motion, large viewpoint changes, and non-stationary lighting, and outperforms existing state-of-the-art methods. The framework is directly applicable to real-time AR interaction and virtual try-on systems.
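
The shading-decoupled pipeline follows the classic intrinsic image model, in which an image factors into albedo and shading. As a minimal sketch of how refined layers recombine into an sRGB frame (the function below and its simple gamma curve are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def composite_srgb(albedo: np.ndarray, shading: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Recombine intrinsic layers into a display-ready sRGB frame.

    albedo  : (H, W, 3) base color in linear RGB, values in [0, 1]
    shading : (H, W, 1) or (H, W, 3) per-pixel illumination multiplier
    """
    linear = np.clip(albedo * shading, 0.0, 1.0)  # intrinsic image model: I = A * S
    return linear ** (1.0 / gamma)                # gamma curve as a stand-in for the sRGB transfer function

# Hypothetical usage: a uniform-albedo patch under a left-to-right lighting gradient.
albedo = np.full((4, 4, 3), 0.8)
shading = np.linspace(0.2, 1.0, 4).reshape(1, 4, 1) * np.ones((4, 4, 1))
frame = composite_srgb(albedo, shading)
```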

📝 Abstract
Inserting 3D objects into videos is a longstanding challenge in computer graphics with applications in augmented reality, virtual try-on, and video composition. Achieving both temporal consistency and realistic lighting remains difficult, particularly in dynamic scenarios with complex object motion, perspective changes, and varying illumination. While 2D diffusion models have shown promise for producing photorealistic edits, they often struggle with maintaining temporal coherence across frames. Conversely, traditional 3D rendering methods excel in spatial and temporal consistency but fall short in achieving photorealistic lighting. In this work, we propose a hybrid object insertion pipeline that combines the strengths of both paradigms. Specifically, we focus on inserting bracelets into dynamic wrist scenes, leveraging the high temporal consistency of 3D Gaussian Splatting (3DGS) for initial rendering and refining the results using a 2D diffusion-based enhancement model to ensure realistic lighting interactions. Our method introduces a shading-driven pipeline that separates intrinsic object properties (albedo, shading, reflectance) and refines both shading and sRGB images for photorealism. To maintain temporal coherence, we optimize the 3DGS model with multi-frame weighted adjustments. This is the first approach to synergize 3D rendering and 2D diffusion for video object insertion, offering a robust solution for realistic and consistent video editing. Project Page: https://cjeen.github.io/BraceletPaper/
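
The two-stage design described in the abstract reduces to a simple control flow: a temporally stable 3DGS render per frame, followed by a 2D diffusion pass that corrects lighting. The Python skeleton below is a hedged sketch of that flow; `render_3dgs`, `refine_diffusion`, and the dummy stand-ins are hypothetical placeholders, not the authors' released interface:

```python
from typing import Callable, Iterable, List, Tuple

import numpy as np

Frame = np.ndarray  # (H, W, 3) image array

def insert_object(
    frames: Iterable[Frame],
    poses: Iterable[np.ndarray],
    render_3dgs: Callable[[np.ndarray], Tuple[Frame, Frame]],
    refine_diffusion: Callable[[Frame, Frame, Frame], Frame],
) -> List[Frame]:
    """Stage 1: temporally consistent 3DGS render; stage 2: diffusion-based lighting refinement."""
    outputs = []
    for frame, pose in zip(frames, poses):
        albedo, shading = render_3dgs(pose)                       # stable geometry/albedo from 3DGS
        outputs.append(refine_diffusion(frame, albedo, shading))  # photorealistic lighting from diffusion
    return outputs

# Trivial stand-ins so the sketch runs end to end.
def dummy_render(pose: np.ndarray) -> Tuple[Frame, Frame]:
    return np.full((2, 2, 3), 0.5), np.ones((2, 2, 3))

def dummy_refine(frame: Frame, albedo: Frame, shading: Frame) -> Frame:
    return np.clip(albedo * shading, 0.0, 1.0)  # naive composite in place of a diffusion model

edited = insert_object([np.zeros((2, 2, 3))] * 3, [np.eye(4)] * 3, dummy_render, dummy_refine)
```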
Problem

Research questions and friction points this paper is trying to address.

Achieving temporal consistency in 3D object insertion in videos
Reconciling the temporal stability of 3D rendering with the photorealistic lighting of 2D diffusion
Maintaining photorealism in dynamic scenes with complex motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid pipeline combining 3DGS and 2D diffusion
Shading-driven refinement for photorealistic lighting
Multi-frame weighted 3DGS optimization for coherence (see the sketch after this list)
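
The paper states that the 3DGS model is optimized with multi-frame weighted adjustments but does not spell out the objective here. A plausible reading is a per-frame-weighted photometric loss over a window of frames; the sketch below, including the weighting scheme, is an assumption:

```python
import torch

def multiframe_weighted_loss(renders: torch.Tensor,
                             targets: torch.Tensor,
                             weights: torch.Tensor) -> torch.Tensor:
    """Weighted L1 photometric loss over a window of frames.

    renders, targets : (T, H, W, 3) rendered vs. observed frames
    weights          : (T,) per-frame weights, e.g. down-weighting blurred
                       or poorly tracked frames (hypothetical scheme)
    """
    per_frame = (renders - targets).abs().mean(dim=(1, 2, 3))  # L1 error per frame
    return (weights * per_frame).sum() / weights.sum()         # weighted average across the window
```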