Atomizer: Generalizing to new modalities by breaking satellite images down to a set of scalars

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The heterogeneity of remote sensing satellite image modalities limits the generalization capability of existing models, which rely on fixed input formats and modality-specific encoders—necessitating full retraining for newly introduced sensors. Method: We propose an atomic scalar representation framework that decomposes images into a set of scalars encoding spatiotemporal-spectral metadata, enabling unified processing of arbitrary sensor configurations via a single encoder. Key innovations include structured tokenization (combining Fourier features with non-uniform radial basis functions), context-enhanced scalar embedding, and cross-attention mapping into a shared latent space. Contribution/Results: The framework achieves zero-shot cross-modal generalization without interpolation, resampling, or fine-tuning. In modality-exclusive evaluation benchmarks, it significantly outperforms state-of-the-art methods, demonstrates robustness to resolution and spatial-scale variations, and substantially improves overall generalization performance.

Technology Category

Application Category

📝 Abstract
The growing number of Earth observation satellites has led to increasingly diverse remote sensing data, with varying spatial, spectral, and temporal configurations. Most existing models rely on fixed input formats and modality-specific encoders, which require retraining when new configurations are introduced, limiting their ability to generalize across modalities. We introduce Atomizer, a flexible architecture that represents remote sensing images as sets of scalars, each corresponding to a spectral band value of a pixel. Each scalar is enriched with contextual metadata (acquisition time, spatial resolution, wavelength, and bandwidth), producing an atomic representation that allows a single encoder to process arbitrary modalities without interpolation or resampling. Atomizer uses structured tokenization with Fourier features and non-uniform radial basis functions to encode content and context, and maps tokens into a latent space via cross-attention. Under modality-disjoint evaluations, Atomizer outperforms standard models and demonstrates robust performance across varying resolutions and spatial sizes.
Problem

Research questions and friction points this paper is trying to address.

Generalizing models for diverse remote sensing data modalities
Overcoming fixed input formats and modality-specific encoders
Enabling single encoder processing for arbitrary modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Represents images as scalar sets with metadata
Uses Fourier features for structured tokenization
Maps tokens via cross-attention for generalization
🔎 Similar Papers
No similar papers found.