DiffMOD: Progressive Diffusion Point Denoising for Moving Object Detection in Remote Sensing

📅 2025-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Low resolution, extremely small targets, and heavy noise in remote sensing imagery degrade both detection accuracy and temporal consistency in moving object detection (MOD). Existing methods based on probability density estimation struggle to model high-order spatiotemporal dependencies. To address this, the paper proposes a point-based progressive diffusion denoising framework for MOD. It formulates detection as the iterative recovery of moving object centers from sparse noisy points; introduces a Spatial Relation Aggregation Attention mechanism and a temporal propagation module driven by implicit memory reasoning for cross-frame feature fusion; and adds a progressive MinK optimal transport assignment strategy together with a missing loss that counteracts the tendency of denoised points to cluster around salient objects. Experiments on the RsData benchmark show improved detection capability and temporal consistency.
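The progressive MinK strategy mentioned above can be pictured with a small sketch. The summary does not give the actual formulation, so everything here is an assumption made for illustration: the function names `min_k_assign` and `progressive_k`, the Euclidean cost, and the linear schedule for k are hypothetical, and a plain nearest-distance ranking stands in for a real optimal transport solver.

```python
import math

def min_k_assign(points, centers, k):
    """Mark the k nearest points of each ground-truth center as positives.

    Returns a dict mapping point index -> center index. A point claimed by
    several centers is given to the closest one. This is a simplified
    stand-in for optimal transport, not the paper's assignment rule.
    """
    best = {}  # point index -> (distance, center index)
    for c_idx, center in enumerate(centers):
        ranked = sorted(range(len(points)),
                        key=lambda i: math.dist(points[i], center))
        for p_idx in ranked[:k]:
            d = math.dist(points[p_idx], center)
            if p_idx not in best or d < best[p_idx][0]:
                best[p_idx] = (d, c_idx)
    return {p_idx: c_idx for p_idx, (_, c_idx) in best.items()}

def progressive_k(level, num_levels, k_max=4, k_min=1):
    """Hypothetical schedule: more positives at early (noisy) levels,
    fewer at the final level."""
    if num_levels == 1:
        return k_min
    frac = level / (num_levels - 1)
    return max(k_min, round(k_max - frac * (k_max - k_min)))

points = [(0, 0), (1, 0), (5, 5), (6, 5), (10, 10)]
centers = [(0, 0), (5, 5)]
loose = min_k_assign(points, centers, k=2)   # early level: 2 positives per center
strict = min_k_assign(points, centers, k=1)  # final level: 1 positive per center
```

Shrinking k gives each denoising level its own learning target: early levels tolerate several candidate points per object, while the last level keeps only the single best point.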

📝 Abstract

Moving object detection (MOD) in remote sensing is significantly challenged by low resolution, extremely small object sizes, and complex noise interference. Current deep learning-based MOD methods rely on probability density estimation, which restricts flexible information interaction between objects and across temporal frames. To flexibly capture high-order inter-object and temporal relationships, we propose a point-based MOD method for remote sensing. Inspired by diffusion models, we formulate network optimization as a progressive denoising process that iteratively recovers moving object centers from sparse noisy points. Specifically, we sample scattered features from the backbone outputs as atomic units for subsequent processing, and aggregate global feature embeddings to compensate for the limited coverage of sparse point features. By modeling spatial relative positions and semantic affinities, Spatial Relation Aggregation Attention enables high-order interactions among point-level features for enhanced object representation. To enhance temporal consistency, we design the Temporal Propagation and Global Fusion module, which leverages an implicit memory reasoning mechanism for robust cross-frame feature integration. To align with the progressive denoising process, we propose a progressive MinK optimal transport assignment strategy that establishes specialized learning objectives at each denoising level. Additionally, we introduce a missing loss function to counteract the clustering tendency of denoised points around salient objects. Experiments on the RsData remote sensing MOD dataset show that our MOD method based on scattered point denoising explores potential relationships between sparse moving objects more effectively and improves both detection capability and temporal consistency.
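To make the idea of recovering object centers from sparse noisy points more concrete, here is a toy sketch of the refinement loop only. It is not the paper's network: `toy_denoiser` is a hypothetical stand-in that reads the ground-truth centers directly, whereas the real model would predict offsets from image features. The number of levels and the step size are likewise assumptions.

```python
import math
import random

def toy_denoiser(point, centers):
    """Stand-in for the learned network: offset from a point to its nearest center."""
    nearest = min(centers, key=lambda c: math.dist(point, c))
    return (nearest[0] - point[0], nearest[1] - point[1])

def progressive_denoise(points, centers, num_levels=4, step=0.5):
    """Refine points level by level; each level removes part of the remaining offset.

    Returns the point set at every level, starting with the noisy input.
    """
    history = [list(points)]
    for _ in range(num_levels):
        refined = []
        for p in history[-1]:
            dx, dy = toy_denoiser(p, centers)
            refined.append((p[0] + step * dx, p[1] + step * dy))
        history.append(refined)
    return history

def mean_error(points, centers):
    """Average distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)

random.seed(0)
centers = [(10.0, 10.0), (40.0, 25.0)]  # hypothetical moving-object centers
noisy = [(random.uniform(0, 50), random.uniform(0, 50)) for _ in range(8)]
levels = progressive_denoise(noisy, centers)
errors = [mean_error(pts, centers) for pts in levels]
print([round(e, 3) for e in errors])  # error shrinks at every level
```

With a step of 0.5, each level at least halves the mean distance to the nearest center, which mirrors the coarse-to-fine behaviour a progressive denoiser is meant to learn.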

Problem

Research questions and friction points this paper is trying to address.

Detect moving objects in low-resolution remote sensing images

Improve inter-object and temporal feature interaction flexibility

Reduce noise interference for small moving object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive diffusion point denoising for MOD

Spatial Relation Aggregation Attention mechanism

Temporal Propagation and Global Fusion module
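As a rough illustration of how attention over point features can combine spatial relative positions with semantic affinities, the sketch below adds a distance penalty to scaled dot-product attention logits. This is an assumed, simplified form: the function name `relation_attention`, the `distance_scale` parameter, and the use of raw features as queries, keys, and values are all hypothetical and not taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relation_attention(positions, features, distance_scale=1.0):
    """One attention pass where logit = semantic affinity - scaled distance.

    positions: list of (x, y) point coordinates.
    features:  list of equal-length feature vectors, one per point.
    Returns one aggregated feature vector per point.
    """
    dim = len(features[0])
    out = []
    for p_i, f_i in zip(positions, features):
        logits = []
        for p_j, f_j in zip(positions, features):
            affinity = sum(a * b for a, b in zip(f_i, f_j)) / math.sqrt(dim)
            spatial_bias = -distance_scale * math.dist(p_i, p_j)
            logits.append(affinity + spatial_bias)
        weights = softmax(logits)
        out.append([sum(w * f[d] for w, f in zip(weights, features))
                    for d in range(dim)])
    return out

positions = [(0.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
aggregated = relation_attention(positions, features)
```

In this example the two nearby points with matching features reinforce each other, while the distant third point contributes almost nothing to them and keeps its own feature.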

🔎 Similar Papers

Hierarchical Attention Diffusion Networks with Object Priors for Video Change Detection