🤖 AI Summary
This study addresses the challenge of aggregating pixel-level geospatial embeddings into patch representations that preserve class discriminability and generalize across regions. We systematically evaluate 13 pooling strategies (11 parameter-free and 2 parameterized) on a newly constructed EuroSAT-Embed benchmark. Our analysis reveals, for the first time, that mean pooling suffers significant performance degradation under spatial shift, and we propose Generalized Mean (GeM) pooling as a plug-and-play alternative. We further find that statistical pooling, which concatenates per-dimension min, max, mean, and standard deviation, performs best with high-dimensional embeddings. Relative to mean pooling, the stronger strategies reduce the geographic generalization gap by up to 40% and improve accuracy by as much as 5% under spatial partitioning, underscoring the pivotal role of distributional statistics in embedding aggregation.
📝 Abstract
As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can drop accuracy by more than 10% under spatial shift. To study this effect, we introduce EuroSAT-Embed, a benchmark of 81,000 embedding GeoTIFFs derived from three foundation models (AlphaEarth, OlmoEarth, and Tessera). We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Our results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. We recommend Generalized Mean (GeM) pooling as a drop-in replacement for mean pooling: it improves accuracy without increasing embedding dimensionality. For maximum accuracy, Stats pooling (concatenation of min/max/mean/std statistics) performs best, at 4x the embedding size. We further find that pooling effectiveness varies across embedding sources and that higher-dimensional embeddings benefit most from distributional statistics.
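The two recommended aggregators can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the exponent `p=3.0`, the epsilon clamp, and the toy patch shape are assumptions, and GeM is applied after clipping to positive values since the power mean is only defined for non-negative inputs.

```python
import numpy as np

def gem_pool(pixels, p=3.0, eps=1e-6):
    """Generalized Mean (GeM) pooling over the pixel axis.

    pixels: array of shape (n_pixels, dim).
    p=1 recovers mean pooling; large p approaches max pooling.
    p=3.0 is a common default, not a value taken from the paper.
    """
    clipped = np.clip(pixels, eps, None)  # power mean needs non-negative inputs
    return np.mean(clipped ** p, axis=0) ** (1.0 / p)

def stats_pool(pixels):
    """Stats pooling: concatenate per-dimension min/max/mean/std.

    Output is 4x the input embedding dimension.
    """
    return np.concatenate([
        pixels.min(axis=0),
        pixels.max(axis=0),
        pixels.mean(axis=0),
        pixels.std(axis=0),
    ])

# Toy patch: 64 pixels with 16-dimensional embeddings (shapes are illustrative).
patch = np.random.rand(64, 16) + 1.0  # shifted positive so clipping is a no-op
print(gem_pool(patch).shape)    # (16,)
print(stats_pool(patch).shape)  # (64,)
```

GeM keeps the output at the original embedding dimension, which is why it works as a drop-in replacement for mean pooling, while stats pooling trades a 4x larger representation for the extra distributional signal.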