🤖 AI Summary
This work addresses the problem that physical signals in multi-sensor observations are entangled with instrument-specific artifacts, which hinders both information extraction and cross-instrument data fusion. The authors propose a self-supervised pretraining paradigm that builds training pairs from overlapping observations of the same object and uses a dual-encoder architecture. By treating sensor effects as a form of data augmentation and training with a counterfactual generation objective, the method separates instrument-invariant physical signals from sensor-induced artifacts. Demonstrated on galaxy images from the DESI Legacy Imaging Survey and the Hyper Suprime-Cam (HSC) Survey, the learned representations support counterfactual view generation, physical parameter inference unconfounded by measurement distortions, and instrument-independent similarity search.
📝 Abstract
Data collected from the physical world is always a combination of multiple sources: an underlying signal from the physical process of interest and a second signal made up of measurement-dependent artifacts introduced by the sensor or instrument. This secondary signal acts as a confounding factor, limiting our ability to extract information about the physics underlying the phenomena we observe. Furthermore, it complicates the combination of observations in heterogeneous or multi-instrument settings. We propose a deep learning framework that leverages overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to disentangle these factors of variation. The resulting representations explicitly separate intrinsic signals from sensor-specific distortions and noise, and can be used for counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search. We demonstrate the effectiveness of our approach on astrophysical galaxy images from the DESI Legacy Imaging Survey (Legacy) and the Hyper Suprime-Cam (HSC) Survey as a representative multi-instrument setting. This framework provides a general recipe for scientific and multi-modal self-supervised pretraining: construct training pairs from overlapping observations of the same physical system, treat sensor- or modality-specific effects as augmentations, and learn invariant representations through counterfactual generation.
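The three-step recipe can be illustrated with a dependency-free toy. This is only a sketch under stated assumptions, not the authors' implementation: the class `DualEncoderToy`, the linear encoders and decoder, the latent sizes, and the mean-squared-error swap loss are all hypothetical choices made for illustration. The idea shown is that one encoder reads the "physics" code from one view, a second encoder reads the "instrument" code from the other view of the same object, and a decoder is asked to reproduce that other view.

```python
import random


def linear(weights, vector):
    """Apply a dense linear map (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would be trained, not just initialized."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class DualEncoderToy:
    """Hypothetical linear stand-in for a dual-encoder model (illustration only)."""

    def __init__(self, n_pixels, n_physics, n_instrument, seed=0):
        rng = random.Random(seed)
        self.physics_encoder = random_matrix(n_physics, n_pixels, rng)
        self.instrument_encoder = random_matrix(n_instrument, n_pixels, rng)
        self.decoder = random_matrix(n_pixels, n_physics + n_instrument, rng)

    def counterfactual(self, source_view, style_view):
        """Render the object seen in source_view as the instrument of style_view would."""
        z_physics = linear(self.physics_encoder, source_view)
        z_instrument = linear(self.instrument_encoder, style_view)
        # List concatenation joins the two latent codes before decoding.
        return linear(self.decoder, z_physics + z_instrument)

    def loss(self, view_a, view_b):
        """Symmetric swap loss on a pair of overlapping observations (assumed form)."""
        predicted_b = self.counterfactual(view_a, view_b)
        predicted_a = self.counterfactual(view_b, view_a)
        return 0.5 * (mse(predicted_b, view_b) + mse(predicted_a, view_a))


# Toy pair: the same 16-pixel "object" under two made-up instrument responses.
rng = random.Random(1)
view_a = [rng.random() for _ in range(16)]
view_b = [0.8 * pixel + 0.05 for pixel in view_a]

model = DualEncoderToy(n_pixels=16, n_physics=4, n_instrument=2)
print("swap loss before any training:", model.loss(view_a, view_b))
```

In this sketch the instrument code is kept much smaller than the physics code so that it cannot simply copy the target view; whether and how the actual framework enforces such a bottleneck is not stated in the abstract.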