🤖 AI Summary
This work addresses the challenges of SAR-to-Optical image translation, where the speckle noise and geometric distortions inherent to SAR data often lead to semantic errors, texture blurring, and structural hallucinations. To overcome these issues, the authors propose OSCAR, a novel framework that achieves high-quality cross-modal translation through optical-aware semantic alignment, semantics-guided generative control, and uncertainty-aware optimization. Key innovations include an optical teacher–guided SAR encoder, a ControlNet-style semantic control module integrating class-aware textual prompts with hierarchical visual cues, and a heteroscedastic uncertainty–based dynamic reconstruction focusing mechanism. Extensive experiments demonstrate that OSCAR significantly outperforms existing methods in both perceptual quality and semantic consistency, effectively suppressing noise artifacts and structural distortions.
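The optical teacher–guided SAR encoder described above is a form of cross-modal feature distillation. The paper does not publish its exact loss, but a common realization is to align student features on SAR inputs with frozen teacher features on co-registered optical inputs; a minimal sketch under that assumption (function and variable names are hypothetical, not from the paper):

```python
import numpy as np

def feature_alignment_loss(student_feats, teacher_feats):
    """Mean-squared distance between SAR-student features and frozen
    optical-teacher features; minimizing it transfers the teacher's
    semantic priors to the SAR encoder."""
    return np.mean((student_feats - teacher_feats) ** 2)

# Toy example: (channels, height, width) feature maps from the two encoders.
teacher = np.ones((8, 4, 4))          # frozen optical-teacher features
student = np.ones((8, 4, 4)) * 0.5    # SAR-student features before alignment
loss = feature_alignment_loss(student, teacher)  # -> 0.25
```

In practice the teacher is kept frozen and only the student encoder receives gradients, so the SAR branch inherits optical semantics without drifting the teacher.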
📝 Abstract
Synthetic Aperture Radar (SAR) provides robust all-weather imaging capabilities; however, translating SAR observations into photo-realistic optical images remains a fundamentally ill-posed problem. Current approaches are often hindered by the inherent speckle noise and geometric distortions of SAR data, which frequently result in semantic misinterpretation, ambiguous texture synthesis, and structural hallucinations. To address these limitations, a novel SAR-to-Optical (S2O) translation framework is proposed, integrating three core technical contributions: (i) Cross-Modal Semantic Alignment, which establishes an Optical-Aware SAR Encoder by distilling robust semantic priors from an Optical Teacher into a SAR Student; (ii) Semantically-Grounded Generative Guidance, realized by a Semantically-Grounded ControlNet that integrates class-aware text prompts for global context with hierarchical visual prompts for local spatial guidance; and (iii) an Uncertainty-Aware Objective, which explicitly models aleatoric uncertainty to dynamically modulate the reconstruction focus, effectively mitigating artifacts caused by speckle-induced ambiguity. Extensive experiments demonstrate that the proposed method achieves superior perceptual quality and semantic consistency compared to state-of-the-art approaches.
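The Uncertainty-Aware Objective in (iii) follows the standard heteroscedastic aleatoric-uncertainty formulation: the network predicts a per-pixel log-variance that down-weights the reconstruction error in ambiguous (e.g. speckle-corrupted) regions, while a log-variance term discourages blanket uncertainty. A minimal numpy sketch of that generic formulation, assuming the paper uses the usual uncertainty-weighted squared error (function names are illustrative, not the authors' code):

```python
import numpy as np

def heteroscedastic_loss(pred, target, log_var):
    """Uncertainty-weighted reconstruction loss:
    exp(-log_var) scales down the squared error where predicted
    aleatoric variance is high; + log_var penalizes being uncertain
    everywhere, so the model stays confident on clean regions."""
    sq_err = (pred - target) ** 2
    return np.mean(np.exp(-log_var) * sq_err + log_var)

# Toy example with a fixed residual of 2 (squared error 4) per pixel:
pred = np.zeros((4, 4))
target = np.full((4, 4), 2.0)
confident = heteroscedastic_loss(pred, target, np.zeros((4, 4)))  # -> 4.0
uncertain = heteroscedastic_loss(pred, target, np.ones((4, 4)))
# uncertain < confident: high predicted variance softens the large residual.
```

During training, the log-variance map is a second network output optimized jointly with the reconstruction, which is what lets the objective "dynamically modulate the reconstruction focus" pixel by pixel.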