Hierarchical Attention Diffusion Networks with Object Priors for Video Change Detection

📅 2024-08-20
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the challenges of multi-class discrimination, interpretability, and perceptual consistency in remote sensing video change detection. We propose the first end-to-end framework integrating instance-level prior guidance, hierarchical cross-attention diffusion modeling, and pixel-wise multi-class semantic classification. Methodologically: (1) Mask R-CNN extracts temporal instance masks of newly emerged objects as structural priors; (2) a hierarchical cross-attention mechanism guides the denoising process of a Denoising Diffusion Probabilistic Model (DDPM), jointly capturing local object details and global contextual dependencies; (3) an SSIM-based loss is introduced to explicitly enforce perceptual consistency in generated change maps. Evaluated on both synthetic and real-world remote sensing video datasets, our method achieves F1 and IoU improvements of 10–25 percentage points over state-of-the-art baselines (including traditional differencing, Siamese CNNs, and GAN-based approaches), establishing new SOTA performance for multi-class video change detection.
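
As a rough illustration of item (3), the sketch below shows one way an SSIM-based loss could be formed. It is not the authors' implementation: it computes SSIM over a single global window instead of the usual sliding window, works on flat lists of pixel intensities in [0, 1], and the names `global_ssim` and `ssim_loss` are hypothetical. The constants `k1 = 0.01` and `k2 = 0.03` are the commonly used SSIM defaults, assumed here because the summary does not state them.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over one global window (simplified; assumed, not from the paper).

    x, y: equal-length flat lists of pixel intensities.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilising constants, scaled by the dynamic range of the data.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(pred, target):
    """Loss that falls to 0 as the predicted map matches the target."""
    return 1.0 - global_ssim(pred, target)


if __name__ == "__main__":
    target = [0.0, 0.0, 1.0, 1.0]
    print(ssim_loss(target, target))                # identical maps
    print(ssim_loss([1.0, 1.0, 0.0, 0.0], target))  # inverted map
```

In a real training setup this term would typically be added to the diffusion objective with a weighting factor; the summary does not give that weight, so none is assumed here.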

📝 Abstract
We present a unified change detection pipeline that combines instance-level masking, multi-scale attention within a denoising diffusion model, and per-pixel semantic classification, all refined via SSIM to match human perception. By first isolating only temporally novel objects with Mask R-CNN, then guiding diffusion updates through hierarchical cross-attention to object and global contexts, and finally categorizing each pixel into one of C change types, our method delivers detailed, interpretable multi-class maps. It outperforms traditional differencing, Siamese CNNs, and GAN-based detectors by 10 to 25 points in F1 and IoU on both synthetic and real-world benchmarks, marking a new state of the art in remote sensing change detection.
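
For readers unfamiliar with the reported metrics, the following minimal sketch shows how per-class F1 and IoU could be computed for a map with C change types. The function name `per_class_f1_iou` and the flat-list input format are assumptions made for illustration; the abstract does not say how the scores are averaged across classes.

```python
def per_class_f1_iou(pred, truth, num_classes):
    """Return {class_id: (f1, iou)} for flat lists of per-pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    scores = {}
    for c in range(num_classes):
        pairs = list(zip(pred, truth))
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # A class absent from both maps gets 0.0 by convention here.
        f1 = 2 * tp / f1_den if f1_den else 0.0
        iou = tp / iou_den if iou_den else 0.0
        scores[c] = (f1, iou)
    return scores


if __name__ == "__main__":
    pred = [0, 1, 1, 2]
    truth = [0, 1, 2, 2]
    for cls, (f1, iou) in per_class_f1_iou(pred, truth, 3).items():
        print(cls, round(f1, 3), round(iou, 3))
```

Because F1 equals 2·IoU / (1 + IoU) for a single class, the two scores always move together, which is consistent with the gains being reported jointly for both.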
Problem

Research questions and friction points this paper is trying to address.

Detect video changes using hierarchical attention and object priors
Combine instance masking and diffusion models for accurate detection
Improve multi-class change maps with semantic classification and SSIM
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines instance masking with diffusion models
Uses hierarchical cross-attention for context
Refines results via SSIM for human perception
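
The idea of attending to both object-level and global context can be sketched as follows. This is a toy, single-head, pure-Python illustration under stated assumptions, not the paper's architecture: the names `cross_attention` and `hierarchical_cross_attention`, the use of tokens as both keys and values, and the fixed mixing weight `alpha` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def hierarchical_cross_attention(query, object_tokens, global_tokens, alpha=0.5):
    """Blend attention over object tokens with attention over global tokens.

    alpha is an assumed mixing weight; the source does not specify how the
    two levels are combined.
    """
    local = cross_attention(query, object_tokens, object_tokens)
    context = cross_attention(query, global_tokens, global_tokens)
    return [alpha * l + (1 - alpha) * c for l, c in zip(local, context)]


if __name__ == "__main__":
    q = [1.0, 1.0]
    objects = [[1.0, 0.0]]   # e.g. features from instance-mask regions
    scene = [[0.0, 1.0]]     # e.g. features pooled over the whole frame
    print(hierarchical_cross_attention(q, objects, scene))
```

In a diffusion model, the query would come from the denoiser's intermediate features at each step, so the object and scene tokens steer every denoising update.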
Andrew Kiruluta
School of Information, University of California, Berkeley
Eric Lundy
School of Information, University of California, Berkeley
Andreas Lemos
School of Information, University of California, Berkeley