🤖 AI Summary
To address the large modality gap and severe geometric misalignment in unsupervised cross-modal optical flow estimation, this paper proposes DCFlow. Methodologically, it introduces (1) a decoupled optimization strategy that separates modality translation from flow estimation, with task-specific supervision for each; (2) a geometry-aware data synthesis pipeline coupled with an outlier-robust loss to provide reliable motion supervision without ground-truth flow labels; and (3) a cross-modal consistency constraint that jointly optimizes the two networks to improve inter-modal geometric alignment. Evaluated on a newly constructed, comprehensive cross-modal optical flow benchmark, DCFlow is compatible with various optical flow backbones and consistently outperforms existing unsupervised methods, achieving state-of-the-art performance.
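As a rough illustration of the decoupled scheme described above, the minimal PyTorch sketch below runs one joint training step for a toy modality-transfer network and a toy flow network: each receives its own task-specific loss, and a warping-based consistency term couples the two. All module names, loss choices, and the assumption that synthesized flow labels are already available are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): one training step of a decoupled
# scheme in which a modality-transfer network and a flow network each receive a
# task-specific loss, plus a shared cross-modal consistency term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTranslator(nn.Module):
    """Stand-in modality-transfer network: maps modality-B images toward modality-A appearance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, x):
        return self.net(x)


class TinyFlowNet(nn.Module):
    """Stand-in flow network: predicts a 2-channel flow field from a concatenated image pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, img1, img2):
        return self.net(torch.cat([img1, img2], dim=1))


def warp(img, flow):
    """Backward-warp img with the given flow (channel 0 = x, channel 1 = y) via grid_sample."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().to(img.device)      # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                # (B, 2, H, W)
    # normalize sampling coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack([coords_x, coords_y], dim=-1)            # (B, H, W, 2)
    return F.grid_sample(img, grid_norm, align_corners=True)


translator, flownet = TinyTranslator(), TinyFlowNet()
opt = torch.optim.Adam(list(translator.parameters()) + list(flownet.parameters()), lr=1e-4)

# Toy batch: a modality-A frame pair, the modality-B frame paired with the second view,
# and a synthetic flow label (assumed to come from a geometry-aware synthesis step).
img_a1, img_a2 = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
img_b2 = torch.rand(2, 3, 64, 64)
flow_synth = torch.randn(2, 2, 64, 64)

opt.zero_grad()
fake_a2 = translator(img_b2)                          # modality translation
flow_pred = flownet(img_a1, fake_a2)                  # flow estimation on the translated pair

loss_translate = F.l1_loss(fake_a2, img_a2)           # task-specific supervision for translation
loss_flow = F.smooth_l1_loss(flow_pred, flow_synth)   # motion supervision from synthesized labels
loss_consistency = F.l1_loss(warp(fake_a2, flow_pred), img_a1)  # cross-modal consistency (joint term)

(loss_translate + loss_flow + loss_consistency).backward()
opt.step()
```

The key structural point this sketch illustrates is that translation quality and motion accuracy are supervised by separate losses, while the consistency term is the only one whose gradient flows through both networks at once.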
📝 Abstract
This work presents DCFlow, a novel unsupervised cross-modal flow estimation framework that integrates a decoupled optimization strategy and a cross-modal consistency constraint. Unlike previous approaches that implicitly learn flow estimation solely from appearance similarity, we introduce a decoupled optimization strategy with task-specific supervision to address modality discrepancy and geometric misalignment separately. This is achieved by collaboratively training a modality transfer network and a flow estimation network. To enable reliable motion supervision without ground-truth flow, we propose a geometry-aware data synthesis pipeline combined with an outlier-robust loss. Additionally, we introduce a cross-modal consistency constraint to jointly optimize both networks, significantly improving flow prediction accuracy. For evaluation, we construct a comprehensive cross-modal flow benchmark by repurposing public datasets. Experimental results demonstrate that DCFlow can be integrated with various flow estimation networks and achieves state-of-the-art performance among unsupervised approaches.
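For intuition on the outlier-robust loss mentioned above, the short sketch below penalizes the residual between predicted and synthesized flow with a generalized Charbonnier function. This sub-quadratic form is a common robust choice that downweights large residuals (likely outliers in synthesized supervision); it is only an assumption here, since the abstract does not specify the exact loss used.

```python
# Hypothetical sketch of an outlier-robust flow penalty (generalized Charbonnier),
# not necessarily the loss used in the paper.
import torch


def robust_flow_loss(flow_pred, flow_synth, eps=1e-3, alpha=0.5):
    """(||r||^2 + eps^2)^alpha per pixel: sub-quadratic, so outlier residuals
    contribute less gradient than under an L2 loss."""
    residual = flow_pred - flow_synth                 # (B, 2, H, W)
    sq_norm = residual.pow(2).sum(dim=1)              # per-pixel squared flow error
    return (sq_norm + eps ** 2).pow(alpha).mean()


# Toy usage
flow_pred = torch.randn(2, 2, 64, 64, requires_grad=True)
flow_synth = torch.randn(2, 2, 64, 64)
loss = robust_flow_loss(flow_pred, flow_synth)
loss.backward()
```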