Problem
Research questions and friction points this paper is trying to address.
Fuses multi-modal Earth observation imagery
Learns a single representation from input bands of mixed resolution
Demonstrates interpretability and applicability to downstream tasks
Innovation
Methods, ideas, or system contributions that make the work stand out.
Fuses multi-modal imagery via an attention mechanism
Uses a pyramidal vision transformer architecture
Trains with self-supervision using the SwAV algorithm
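To make the attention-based fusion idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes each modality has already been encoded into a fixed-length embedding, and it uses a single query vector with scaled dot-product attention to weight and sum those embeddings. The function names, the three modalities, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modality_embeddings):
    """Fuse per-modality embeddings into one vector.

    Assumption: every embedding has the same length as the query.
    Returns the fused vector and the attention weight per modality.
    """
    dim = len(query)
    # Scaled dot-product score between the query and each modality embedding.
    scores = [
        sum(q * k for q, k in zip(query, emb)) / math.sqrt(dim)
        for emb in modality_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of the embeddings, one output value per dimension.
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, modality_embeddings))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical embeddings for three modalities (illustrative values only).
modality_a = [1.0, 0.0, 0.0, 0.0]
modality_b = [0.0, 1.0, 0.0, 0.0]
modality_c = [0.0, 0.0, 1.0, 0.0]
query = [2.0, 0.0, 0.0, 0.0]  # hypothetical query aligned with modality_a

fused, weights = attention_fuse(query, [modality_a, modality_b, modality_c])
```

In this toy setup the query aligns with the first modality, so that modality receives the largest weight and dominates the fused vector. The weights also give a simple per-modality view of what the fusion attended to, which is one way attention can support the interpretability goal listed above.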