RealMaster: Lifting Rendered Scenes into Photorealistic Video

Date: 2026-03-24
Citations: 0
Influential: 0
AI Summary
Existing video generation methods struggle to achieve 3D consistency and photorealism at the same time, while conventional 3D rendering, though geometrically accurate, often lacks fine-grained realistic detail. This work proposes a video diffusion-based framework that lifts rendered video into photorealistic video while staying aligned with the 3D engine's output. Paired training data is built with an anchor-based propagation strategy: the first and last frames are enhanced for realism and propagated across the intermediate frames using geometric conditioning cues. An IC-LoRA trained on these pairs then generalizes beyond the pipeline, handling objects and characters that appear mid-sequence and running inference without anchor frames. Evaluated on complex GTA-V sequences, the method significantly outperforms existing video editing baselines, improving material, lighting, and texture realism while preserving the original geometry, motion dynamics, and identity.

๐Ÿ“ Abstract
State-of-the-art video generation models produce remarkable photorealism, but they lack the precise control required to align generated content with specific scene requirements. Furthermore, without an underlying explicit geometry, these models cannot guarantee 3D consistency. Conversely, 3D engines offer granular control over every scene element and provide native 3D consistency by design, yet their output often remains trapped in the "uncanny valley". Bridging this sim-to-real gap requires both structural precision, where the output must exactly preserve the geometry and dynamics of the input, and global semantic transformation, where materials, lighting, and textures must be holistically transformed to achieve photorealism. We present RealMaster, a method that leverages video diffusion models to lift rendered video into photorealistic video while maintaining full alignment with the output of the 3D engine. To train this model, we generate a paired dataset via an anchor-based propagation strategy, where the first and last frames are enhanced for realism and propagated across the intermediate frames using geometric conditioning cues. We then train an IC-LoRA on these paired videos to distill the high-quality outputs of the pipeline into a model that generalizes beyond the pipeline's constraints, handling objects and characters that appear mid-sequence and enabling inference without requiring anchor frames. Evaluated on complex GTA-V sequences, RealMaster significantly outperforms existing video editing baselines, improving photorealism while preserving the geometry, dynamics, and identity specified by the original 3D control.
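The data-generation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_paired_clip`, `enhance_anchor`, `propagate`, and the toy stand-ins are hypothetical names, frames are plain lists of floats rather than images, and the rendered intermediate frames stand in for the geometric conditioning cues, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # toy stand-in for an image; a real pipeline would use tensors


def build_paired_clip(
    rendered: Sequence[Frame],
    enhance_anchor: Callable[[Frame], Frame],
    propagate: Callable[[Frame, Frame, Sequence[Frame]], List[Frame]],
) -> Tuple[List[Frame], List[Frame]]:
    """Return (rendered clip, pseudo-photorealistic target clip) for training.

    Both callables are hypothetical placeholders for models that the
    abstract mentions but does not name.
    """
    if len(rendered) < 2:
        raise ValueError("need at least a first and a last frame as anchors")

    # Step 1: enhance only the two anchor frames for realism.
    first = enhance_anchor(rendered[0])
    last = enhance_anchor(rendered[-1])

    # Step 2: propagate the anchors' appearance across the intermediate
    # frames. Assumption: the rendered frames themselves act as the cues.
    cues = list(rendered[1:-1])
    middle = propagate(first, last, cues)
    if len(middle) != len(cues):
        raise ValueError("propagation must return one frame per cue")

    return list(rendered), [first, *middle, last]


def _mean(frame: Frame) -> float:
    return sum(frame) / len(frame)


def toy_enhance(frame: Frame) -> Frame:
    """Toy 'realism' enhancement: brighten every pixel by 1.0."""
    return [p + 1.0 for p in frame]


def toy_propagate(first: Frame, last: Frame, cues: Sequence[Frame]) -> List[Frame]:
    """Toy propagation: keep each cue's structure, blend the anchors' brightness."""
    out = []
    for i, cue in enumerate(cues):
        t = (i + 1) / (len(cues) + 1)  # position between the two anchors
        target_mean = (1 - t) * _mean(first) + t * _mean(last)
        shift = target_mean - _mean(cue)
        out.append([p + shift for p in cue])
    return out


rendered_clip = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
source, target = build_paired_clip(rendered_clip, toy_enhance, toy_propagate)
```

Each resulting `(source, target)` pair plays the role of one training example for the IC-LoRA stage. Per the abstract, the trained model no longer needs anchor frames at inference, so only the rendered clip would be supplied then.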
Problem

Research questions and friction points this paper is trying to address.

photorealism
3D consistency
sim-to-real gap
video generation
rendered scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

video diffusion models
3D consistency
photorealistic video synthesis
IC-LoRA
anchor-based propagation