🤖 AI Summary
This work addresses the geometric mismatch between user-defined regions and precomputed embedding grids in Earth observation. Conventional latent-space interpolation is ineffective in this setting because the embedding manifold is highly non-convex. To overcome this, the authors propose the LEPA architecture, which introduces, for the first time, a geometry-equivariant conditional prediction network: it takes the geometric transformation as a conditioning input and directly predicts the aligned embedding, avoiding unreliable interpolation in a non-convex space. The method requires no re-encoding and uses geometric data augmentation together with pretrained Earth observation foundation models (e.g., Prithvi-EO-2.0) to model the embedding space. Evaluated on HLS and ImageNet-1k, LEPA raises mean reciprocal rank (MRR) from below 0.2 for standard interpolation to over 0.8.
📝 Abstract
Geospatial foundation models provide precomputed embeddings that serve as compact feature vectors for large-scale satellite remote sensing data. While these embeddings can reduce data-transfer bottlenecks and computational costs, Earth observation (EO) applications can still face geometric mismatches between user-defined areas of interest and the fixed precomputed embedding grid. Standard latent-space interpolation is unreliable in this setting because the embedding manifold is highly non-convex, yielding representations that do not correspond to realistic inputs. We verify this with Prithvi-EO-2.0, examining where interpolation applied to patch embeddings falls short. As an alternative, we propose a Learned Equivariance-Predicting Architecture (LEPA). Instead of averaging vectors, LEPA conditions a predictor on geometric augmentations to directly predict the transformed embedding. We evaluate LEPA on NASA/USGS Harmonized Landsat-Sentinel (HLS) imagery and ImageNet-1k. Experiments show that standard interpolation achieves a mean reciprocal rank (MRR) below 0.2, whereas LEPA increases MRR to over 0.8, enabling accurate geometric adjustment without re-encoding.
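To make the reported metric concrete, the following is a minimal sketch of how a mean reciprocal rank could be computed for embedding retrieval. The abstract does not state the retrieval protocol, so the use of cosine similarity, the candidate pool, and the names `mean_reciprocal_rank`, `predicted`, `candidates`, and `target_idx` are all assumptions made for illustration, not the authors' evaluation code.

```python
import numpy as np


def mean_reciprocal_rank(predicted, candidates, target_idx):
    """MRR for retrieval by cosine similarity (assumed protocol).

    predicted:  (n, d) array of predicted or interpolated embeddings.
    candidates: (m, d) array of reference embeddings to search over.
    target_idx: (n,) index into `candidates` of each query's true match.
    """
    predicted = np.asarray(predicted, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    target_idx = np.asarray(target_idx)

    # Normalize rows so that the dot product equals cosine similarity.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = p @ c.T  # shape (n, m)

    # Similarity of each query to its true match.
    target_sims = sims[np.arange(len(p)), target_idx]

    # Rank = 1 + number of candidates strictly more similar than the target.
    ranks = 1 + (sims > target_sims[:, None]).sum(axis=1)
    return float(np.mean(1.0 / ranks))


if __name__ == "__main__":
    # Toy data: three orthogonal reference embeddings.
    candidates = np.eye(3)
    # The second query is closer to candidate 1 than to its true match 0.
    predicted = np.array([[1.0, 0.0, 0.0],
                          [0.6, 0.8, 0.0],
                          [0.0, 0.0, 1.0]])
    print(mean_reciprocal_rank(predicted, candidates, [0, 0, 2]))
```

Under this definition, an MRR of 1.0 means every predicted embedding retrieves its true match first, while a value below 0.2 means the true match typically sits at rank five or worse.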