Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics

📅 2026-04-10

🤖 AI Summary

This work addresses a common problem in multi-sensor observations: physical signals are often entangled with instrument-specific artifacts, which hampers accurate information extraction and the fusion of data across instruments. To address this, the authors propose a self-supervised pretraining approach that builds training pairs from overlapping observations of the same objects and uses a dual-encoder architecture. By treating sensor effects as a form of data augmentation and introducing a counterfactual generation objective, the method disentangles instrument-invariant physical signals from sensor-induced artifacts. Evaluated on galaxy images from the DESI Legacy Imaging Survey and the Hyper Suprime-Cam (HSC) Survey, the approach supports counterfactual view synthesis, estimation of physical parameters unconfounded by measurement distortions, and cross-instrument similarity retrieval, improving the interpretability and consistency of multi-source astronomical data.
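
The pair-construction step (overlapping observations of the same object by two surveys) can be illustrated with a positional cross-match. This is a minimal, hypothetical sketch, not the authors' pipeline: the catalog format (dicts with `id`, `ra`, `dec` in degrees), the 1-arcsecond matching radius, and the function names are assumptions for illustration. It uses a brute-force nearest-neighbor search with the haversine formula; for survey-scale catalogs, a spatial index such as astropy's `SkyCoord.match_to_catalog_sky` would be the practical choice.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical catalog row: {"id": ..., "ra": degrees, "dec": degrees}
Source = Dict[str, Any]


def angular_separation_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation (haversine formula) between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp h to guard against rounding slightly above 1 before asin.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def build_overlap_pairs(survey_a: List[Source], survey_b: List[Source],
                        radius_arcsec: float = 1.0) -> List[Tuple[Any, Any, float]]:
    """Pair each survey_a source with its nearest survey_b source within radius_arcsec.

    Brute force, O(len(survey_a) * len(survey_b)): fine for a toy, not for full surveys.
    A survey_b source may be matched by several survey_a sources; a stricter
    pipeline would enforce one-to-one matches.
    """
    pairs = []
    for src_a in survey_a:
        best_id, best_sep = None, radius_arcsec
        for src_b in survey_b:
            sep = angular_separation_arcsec(src_a["ra"], src_a["dec"], src_b["ra"], src_b["dec"])
            if sep <= best_sep:
                best_id, best_sep = src_b["id"], sep
        if best_id is not None:
            pairs.append((src_a["id"], best_id, best_sep))
    return pairs


# Toy usage with made-up coordinates (not real survey data).
legacy = [{"id": "legacy-1", "ra": 150.0, "dec": 2.0},
          {"id": "legacy-2", "ra": 150.01, "dec": 2.0}]
hsc = [{"id": "hsc-1", "ra": 150.0 + 0.5 / 3600, "dec": 2.0},
       {"id": "hsc-2", "ra": 151.0, "dec": 2.0}]
pairs = build_overlap_pairs(legacy, hsc)
print(pairs)  # one pair, (legacy-1, hsc-1), separated by about 0.5 arcsec
```

Each matched pair gives two views of the same physical object that differ mainly in instrument effects, which is what lets those effects play the role of augmentations during pretraining.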

📝 Abstract

Data collected from the physical world is always a combination of multiple sources: an underlying signal from the physical process of interest and a signal arising from measurement-dependent artifacts of the sensor or instrument. This secondary signal acts as a confounding factor, limiting our ability to extract information about the physics underlying the phenomena we observe. Furthermore, it complicates the combination of observations in heterogeneous or multi-instrument settings. We propose a deep learning framework that leverages overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to disentangle these factors of variation. The resulting representations explicitly separate intrinsic signals from sensor-specific distortions and noise, and can be used for counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search. We demonstrate the effectiveness of our approach on astrophysical galaxy images from the DESI Legacy Imaging Survey (Legacy) and the Hyper Suprime-Cam (HSC) Survey as a representative multi-instrument setting. This framework provides a general recipe for scientific and multi-modal self-supervised pretraining: construct training pairs from overlapping observations of the same physical system, treat sensor- or modality-specific effects as augmentations, and learn invariant representations through counterfactual generation.
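
The counterfactual generation objective can be illustrated with a toy model. In this sketch, which is an assumption rather than the authors' architecture, each observation is a short vector whose signal components are scaled by a per-instrument gain and whose last entry is a calibration channel recording that gain. A signal encoder reads the object from instrument A, an instrument encoder reads the instrument "style" from a different object observed by instrument B, and a decoder renders the counterfactual B view, which is scored against B's real observation of the same object. All names and the data model are hypothetical; in practice the encoders and decoder would be neural networks trained to minimize such a loss.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def counterfactual_loss(
    signal_encoder: Callable[[Vector], Vector],
    instrument_encoder: Callable[[Vector], float],
    decoder: Callable[[Vector, float], Vector],
    x_a: Vector,
    x_b: Vector,
    x_b_ref: Vector,
) -> float:
    """Render the object seen by instrument A as instrument B would see it,
    then compare against B's real observation of the same object."""
    z_signal = signal_encoder(x_a)              # content taken from A's view
    z_instrument = instrument_encoder(x_b_ref)  # instrument style from another B object
    x_b_hat = decoder(z_signal, z_instrument)   # counterfactual B view
    return squared_error(x_b_hat, x_b)


# Hypothetical toy data model: [gain * s1, gain * s2, gain].
def observe(signal: Vector, gain: float) -> Vector:
    return [gain * s for s in signal] + [gain]


def oracle_signal_encoder(x: Vector) -> Vector:
    # Undo the gain using the calibration channel (last entry).
    return [v / x[-1] for v in x[:-1]]


def naive_signal_encoder(x: Vector) -> Vector:
    # Keeps instrument-scaled values, so the gain leaks into the "signal".
    return list(x[:-1])


def gain_instrument_encoder(x: Vector) -> float:
    return x[-1]


def gain_decoder(z_signal: Vector, gain: float) -> Vector:
    return observe(z_signal, gain)


galaxy = [1.0, 3.0]        # intrinsic signal of the paired object
other_galaxy = [4.0, 2.0]  # different object, used only for instrument style
x_a = observe(galaxy, gain=2.0)            # instrument A
x_b = observe(galaxy, gain=0.5)            # instrument B, same object
x_b_ref = observe(other_galaxy, gain=0.5)  # instrument B, other object

oracle_loss = counterfactual_loss(oracle_signal_encoder, gain_instrument_encoder,
                                  gain_decoder, x_a, x_b, x_b_ref)
naive_loss = counterfactual_loss(naive_signal_encoder, gain_instrument_encoder,
                                 gain_decoder, x_a, x_b, x_b_ref)
print(oracle_loss, naive_loss)  # 0.0 2.5
```

A signal encoder that leaves the instrument gain in its output is penalized, while one that recovers the instrument-invariant signal reaches zero loss. Taking the instrument embedding from a different object is one way to keep object-specific information from leaking through the instrument branch.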

Problem

Research questions and friction points this paper is trying to address.

multi-sensor data

measurement artifacts

signal disentanglement

confounding factors

instrument heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

disentanglement

counterfactual generation

multi-sensor learning