🤖 AI Summary
In multi-sensor IoT systems, heterogeneous sensor placements distort learned representations of the environment. Method: This paper proposes a location-aware self-supervised representation learning framework. Its core innovation is the first introduction of the "signal–location duality" design principle, which integrates information-theoretic principles and occlusion-invariance theory to jointly perform geometric layout encoding, multi-view multimodal contrastive learning, and theory-driven representation disentanglement, enabling universal spatial representation without reliance on fixed deployment topologies. Contribution/Results: Evaluated on three real-world applications (vehicle monitoring, human activity recognition, and earthquake localization), the method significantly improves cross-modal, cross-deployment, and cross-scale generalization and robustness, overcoming a key limitation of conventional IoT pretraining approaches: their neglect of spatial geometric structure.
📝 Abstract
This work develops the underpinnings of self-supervised placement-aware representation learning from spatially distributed (multi-view and multimodal) sensor observations. It is motivated by the need, in multi-sensor IoT systems, to represent external environmental state in a way that correctly distills spatial phenomena from distributed multi-vantage observations. The objective of sensing in IoT systems is, in general, to collectively represent an externally observed environment from the multiple vantage points at which sensory observations occur. Pretrained models that interpret sensor data must therefore encode the relation between observed signals and the observers' vantage points, so that the resulting representation captures the observed spatial phenomena in a manner informed by the specific placement of the measuring instruments while still allowing arbitrary placement. The work significantly advances self-supervised model pretraining on IoT signals beyond current solutions, which often overlook the distinctive spatial nature of IoT data. Our framework explicitly learns the dependencies between measurements and the geometric layout and structural characteristics of the observers, guided by a core design principle: the duality between signals and observer positions. We further provide theoretical analyses from the perspectives of information theory and occlusion-invariant representation learning to explain the rationale behind our design. Experiments on three real-world datasets--covering vehicle monitoring, human activity recognition, and earthquake localization--demonstrate the superior generalizability and robustness of our method across diverse modalities, sensor placements, application-level inference tasks, and spatial scales.
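To make the abstract's "placement-aware contrastive pretraining" idea concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: each sensor's signal is fused with its geometric position before encoding, and two vantage points observing the same event form a positive pair under a standard InfoNCE objective. All names and shapes here (`encode`, `info_nce`, the 16-dim signals, 2-D coordinates, and the random linear encoder) are assumptions made for illustration only.

```python
# Hypothetical sketch of location-conditioned contrastive pretraining.
# Nothing here is taken from the paper; it only illustrates the general idea
# of fusing a signal with its observer's position before contrastive learning.
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (assumed)

# Toy "encoder": a fixed random linear map over [signal ; sensor position],
# standing in for a learned signal/location encoder.
W = rng.normal(size=(D, 16 + 2))  # 16-dim signal + 2-D sensor coordinates

def encode(signal, position):
    """Fuse a sensor's signal with its geometric placement, then L2-normalize."""
    z = W @ np.concatenate([signal, position])
    return z / np.linalg.norm(z)

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss: each anchor's positive is the same-index row."""
    logits = anchors @ positives.T / temperature        # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Two vantage points observe the same batch of events: correlated signals,
# different sensor positions. Same-event views form positive pairs.
events = rng.normal(size=(4, 16))
view_a = np.stack([encode(e + 0.05 * rng.normal(size=16),
                          np.array([0.0, 0.0])) for e in events])
view_b = np.stack([encode(e + 0.05 * rng.normal(size=16),
                          np.array([3.0, 1.0])) for e in events])

loss = info_nce(view_a, view_b)
print(float(loss))
```

In a real system the random linear map would be a trained network and the positions would come from the actual deployment, but the structure (position-conditioned encoding plus cross-view contrastive alignment) matches the components the abstract names.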