🤖 AI Summary
To address the "representation trilemma" in complex-valued SAR image recognition (the difficulty of simultaneously achieving generalizability, interpretability, and computational efficiency under data scarcity and domain shift), this paper proposes a knowledge-driven lightweight neural network framework. Methodologically, it adopts a "compression-aggregation-compression" architecture that integrates electromagnetic scattering priors via a dictionary processor, couples a compact unfolding network with a hybrid ViT/CNN backbone, and employs a self-distillation classification head, enabling physics-guided sparse feature disentanglement and semantic compression. Evaluated on five SAR benchmarks, the model achieves state-of-the-art performance with only 0.7M–0.95M parameters; it demonstrates superior generalization in few-shot and out-of-distribution settings, offers interpretable physical reasoning, and remains light enough for edge deployment.
📝 Abstract
Deep learning models for complex-valued Synthetic Aperture Radar (CV-SAR) image recognition are fundamentally constrained by a representation trilemma under data-limited and domain-shift scenarios: the concurrent, yet conflicting, optimization of generalization, interpretability, and efficiency. Our work is motivated by the premise that the rich electromagnetic scattering features inherent in CV-SAR data hold the key to resolving this trilemma, yet conventional data-driven models leave them insufficiently harnessed. To this end, we introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. The first stage performs a physics-guided compression, wherein a novel dictionary processor adaptively embeds physical priors, enabling a compact unfolding network to efficiently extract sparse, physically grounded signatures. A subsequent aggregation module enriches these representations, followed by a final semantic compression stage that uses a compact classification head with self-distillation to learn maximally task-relevant and discriminative embeddings. We instantiate KINN in both CNN (0.7M parameters) and Vision Transformer (0.95M parameters) variants. Extensive evaluations on five SAR benchmarks confirm that KINN establishes a new state of the art in parameter-efficient recognition, with exceptional generalization in data-scarce and out-of-distribution scenarios and tangible interpretability, thereby providing an effective solution to the representation trilemma and a new path toward trustworthy AI in SAR image analysis.
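The first stage described above, a compact unfolding network extracting sparse codes over a physics-informed dictionary, follows the general pattern of unrolled sparse coding. As a hedged illustration only (not KINN's actual implementation, and with a generic random dictionary standing in for the electromagnetic scattering priors), the sketch below unrolls plain ISTA iterations, treating each iteration as one "layer" of the unfolding network:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm: shrinks coefficients toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unfolded_sparse_code(y, D, n_layers=30, reg=0.1):
    """Sketch of an unfolding network's forward pass: each 'layer' performs
    one ISTA iteration of sparse coding y ~ D @ x with L1 penalty `reg`.
    In KINN the dictionary would encode scattering priors supplied by the
    dictionary processor; here D is a hypothetical generic dictionary."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x + step * D.T @ (y - D @ x), step * reg)
    return x

# Toy demo: recover a sparse code for a signal synthesized from 3 dictionary atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
x_true = np.zeros(64)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ x_true
x_hat = unfolded_sparse_code(y, D)
print(np.linalg.norm(y - D @ x_hat) < np.linalg.norm(y))  # → True: data fit improves from x = 0
```

In a learned unfolding network the step size, threshold, and dictionary would be trained end-to-end rather than fixed, which is what makes the stage both compact and adaptable.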