DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis

📅 2026-02-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the significant degradation of traditional optical flow methods under occlusion and illumination changes, which severely limits the accuracy of downstream tasks. To overcome this limitation, we propose a novel view synthesis paradigm for image alignment by introducing diffusion models. Our approach employs a Dynamics-aware Mask Producing (DMP) module to distinguish dynamic foreground from static background and leverages a conditional diffusion model to generate high-quality aligned images. To facilitate training and evaluation, we construct DSIA, a large-scale dynamic scene alignment dataset comprising over 30K synthetic image pairs rendered with Blender. Extensive experiments demonstrate that our method substantially outperforms existing approaches on both DSIA and multiple mainstream video benchmarks, achieving superior visual alignment quality and consistently enhancing performance in downstream tasks.

📝 Abstract
Image alignment is a fundamental task in computer vision with broad applications. Existing methods predominantly employ optical flow-based image warping. However, this technique is susceptible to common challenges such as occlusions and illumination variations, leading to degraded visual alignment quality and compromised accuracy in downstream tasks. In this paper, we present DMAligner, a diffusion-based framework for image alignment through alignment-oriented view synthesis. DMAligner tackles the challenges in image alignment from a new perspective, employing a generation-based solution that avoids the problems associated with flow-based image warping. Specifically, we propose a Dynamics-aware Diffusion Training approach for learning conditional image generation, synthesizing a novel view for image alignment. This incorporates a Dynamics-aware Mask Producing (DMP) module to adaptively distinguish dynamic foreground regions from static backgrounds, enabling the diffusion model to handle challenges that classical methods struggle to solve. Furthermore, we develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment. Extensive experimental results demonstrate the superiority of the proposed approach on DSIA benchmarks, as well as on a series of widely-used video datasets for qualitative comparisons. Our code is available at https://github.com/boomluo02/DMAligner.
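To make the conditioning idea concrete, here is a minimal sketch of how a dynamics-aware mask might be combined with an image pair into the input of a conditional diffusion denoiser. Everything here is an assumption for illustration: the paper's DMP module is learned, whereas `dynamics_mask` below is a crude frame-difference stand-in, and `build_condition` simply channel-concatenates the frames and mask, which is one common conditioning scheme, not necessarily the authors' design.

```python
import numpy as np

def dynamics_mask(src, tgt, thresh=0.1):
    # Crude stand-in for the paper's learned DMP module:
    # flag pixels whose appearance changes strongly as dynamic foreground.
    diff = np.abs(src - tgt).mean(axis=-1, keepdims=True)  # per-pixel change
    return (diff > thresh).astype(src.dtype)               # 1 = dynamic, 0 = static

def build_condition(src, tgt, mask):
    # One common conditioning scheme (assumed, not from the paper):
    # concatenate source frame, target frame, and mask along channels,
    # yielding the conditioning tensor fed to the diffusion denoiser.
    return np.concatenate([src, tgt, mask], axis=-1)

# Toy image pair: identical except for a moving 3x3 patch.
rng = np.random.default_rng(0)
src = rng.random((8, 8, 3))
tgt = src.copy()
tgt[2:5, 2:5] += 0.5  # simulated dynamic region

mask = dynamics_mask(src, tgt)
cond = build_condition(src, tgt, mask)
print(cond.shape)                      # (8, 8, 7): 3 + 3 + 1 channels
print(mask[3, 3, 0], mask[0, 0, 0])    # 1.0 0.0
```

In the actual framework, the mask lets the model treat dynamic foreground and static background differently during training, rather than warping pixels as flow-based methods do.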
Problem

Research questions and friction points this paper is trying to address.

image alignment
occlusions
illumination variations
optical flow
dynamic scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
view synthesis
image alignment
dynamics-aware mask
dynamic scene