🤖 AI Summary
The heterogeneity of remote sensing satellite image modalities limits the generalization capability of existing models, which rely on fixed input formats and modality-specific encoders—necessitating full retraining for newly introduced sensors.
Method: We propose an atomic scalar representation framework that decomposes images into a set of scalars encoding spatiotemporal-spectral metadata, enabling unified processing of arbitrary sensor configurations via a single encoder. Key innovations include structured tokenization (combining Fourier features with non-uniform radial basis functions), context-enhanced scalar embedding, and cross-attention mapping into a shared latent space.
Contribution/Results: The framework achieves zero-shot cross-modal generalization without interpolation, resampling, or fine-tuning. In modality-exclusive evaluation benchmarks, it significantly outperforms state-of-the-art methods, demonstrates robustness to resolution and spatial-scale variations, and substantially improves overall generalization performance.
📝 Abstract
The growing number of Earth observation satellites has led to increasingly diverse remote sensing data, with varying spatial, spectral, and temporal configurations. Most existing models rely on fixed input formats and modality-specific encoders, which require retraining when new configurations are introduced, limiting their ability to generalize across modalities. We introduce Atomizer, a flexible architecture that represents remote sensing images as sets of scalars, each corresponding to a spectral band value of a pixel. Each scalar is enriched with contextual metadata (acquisition time, spatial resolution, wavelength, and bandwidth), producing an atomic representation that allows a single encoder to process arbitrary modalities without interpolation or resampling. Atomizer uses structured tokenization with Fourier features and non-uniform radial basis functions to encode content and context, and maps tokens into a latent space via cross-attention. Under modality-disjoint evaluations, Atomizer outperforms standard models and demonstrates robust performance across varying resolutions and spatial sizes.