🤖 AI Summary
State-of-the-art multimodal remote sensing models achieve strong performance but suffer from poor interpretability, reflecting an inherent trade-off between architectural complexity and transparency.
Method: We propose a multitask learning framework grounded in the principle of “modality-as-auxiliary-task”: certain modalities (e.g., SAR, hyperspectral) are excluded from model inputs and instead treated as auxiliary prediction targets; a shared encoder is jointly optimized for both the primary task (e.g., land-cover classification or semantic segmentation) and multiple modality reconstruction tasks.
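The shared-encoder setup described above can be sketched as follows. This is a minimal, hypothetical PyTorch-style illustration, not the paper's actual architecture: the class, head names, and loss weight are assumptions, and the held-out modality (here a 1-channel "SAR" target) is predicted by an auxiliary head rather than fed as input.

```python
import torch
import torch.nn as nn

class ModalityAsAuxiliaryTask(nn.Module):
    """Illustrative sketch: a shared encoder feeds a primary
    segmentation head and an auxiliary head that reconstructs a
    withheld modality (e.g. SAR) from the optical input alone."""

    def __init__(self, in_ch=3, n_classes=10, aux_ch=1):
        super().__init__()
        # Shared encoder over the input modality (e.g. optical bands).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Primary-task head (here: per-pixel classification).
        self.primary_head = nn.Conv2d(64, n_classes, 1)
        # Auxiliary head predicting the held-out modality.
        self.aux_head = nn.Conv2d(64, aux_ch, 1)

    def forward(self, x):
        z = self.encoder(x)
        return self.primary_head(z), self.aux_head(z)

model = ModalityAsAuxiliaryTask()
optical = torch.randn(2, 3, 64, 64)         # input modality
sar_target = torch.randn(2, 1, 64, 64)      # auxiliary target (never an input)
labels = torch.randint(0, 10, (2, 64, 64))  # primary-task labels

logits, sar_pred = model(optical)
# Joint objective: primary loss plus a weighted reconstruction loss
# (the 0.5 weight is an arbitrary choice for this sketch).
loss = nn.functional.cross_entropy(logits, labels) \
     + 0.5 * nn.functional.mse_loss(sar_pred, sar_target)
loss.backward()
```

At inference time only `self.primary_head` is needed, which is why the auxiliary modality does not have to be acquired at deployment.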
Contribution: Without requiring additional annotations or data acquisition, this approach enables intrinsic interpretability: auxiliary-task reconstruction errors and their spatial distributions, coupled with sensitivity analysis, reveal the model's decision rationale in a quantifiable and spatially localized manner. Evaluated on three benchmark remote sensing datasets, our method matches or surpasses multimodal baselines in primary-task accuracy while delivering principled, interpretable insights into model behavior.
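The spatially localized part of this interpretability signal can be illustrated with a toy NumPy helper. The function name and the squared-error choice are assumptions for this sketch; the point is only that comparing the predicted held-out modality against ground truth yields a per-pixel error map whose peaks flag regions where the shared features, and hence likely the primary prediction, are unreliable.

```python
import numpy as np

def reconstruction_error_map(pred, target):
    """Per-pixel squared error between the predicted and the true
    held-out modality, averaged over channels. High values flag
    regions where the encoder's features are unreliable.
    (Illustrative helper, not taken from the paper.)"""
    return ((pred - target) ** 2).mean(axis=0)

# Toy example: a 1-channel 4x4 "SAR" prediction vs. ground truth.
pred = np.zeros((1, 4, 4))
target = np.zeros((1, 4, 4))
target[0, 2, 3] = 2.0  # the model fails to reconstruct this pixel
err = reconstruction_error_map(pred, target)
suspect = np.unravel_index(err.argmax(), err.shape)  # → (2, 3)
```

In practice such maps would be overlaid on the primary-task output, so that mis-segmented regions can be cross-checked against where the auxiliary reconstruction also failed.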
📝 Abstract
Remote sensing provides satellite data in diverse types and formats. Multimodal learning networks exploit this diversity to improve model performance, but their complexity comes at the expense of interpretability. In this study, we explore how modalities can be leveraged through multitask learning to intrinsically explain model behavior. In particular, instead of using certain modalities as additional inputs, we use them as additional targets to be predicted alongside the main task. The success of this approach relies on the rich information content of the satellite data that remain as input modalities. We show that this modeling setting provides numerous benefits: (1) under data scarcity, the auxiliary modalities need not be collected for model inference at deployment; (2) model performance remains comparable to the multimodal baseline, and in some cases surpasses it; (3) prediction errors in the main task can be explained via the model's behavior in the auxiliary task(s). We demonstrate the effectiveness of our approach on three datasets covering segmentation, classification, and regression tasks. Code available at git.opendfki.de/hiba.najjar/mtl_explainability/.