From Pixels to Patches: Pooling Strategies for Earth Embeddings

📅 2026-03-02
🤖 AI Summary
This study addresses how to aggregate pixel-level geospatial embeddings into patch representations that preserve class discriminability and generalize across regions. We systematically evaluate 13 pooling strategies (11 training-free and 2 parametric) on a newly constructed EuroSAT-Embed benchmark. Our analysis shows that mean pooling can lose more than 10% accuracy under spatial shift, and we recommend Generalized Mean (GeM) pooling as a drop-in alternative that does not increase embedding dimensionality. Stats pooling, which concatenates min, max, mean, and standard deviation, gives the highest accuracy at 4x the embedding size, and higher-dimensional embeddings benefit most from such distributional statistics. Compared to mean pooling, richer strategies reduce the geographic generalization gap by up to 40% and improve accuracy by up to 5% on spatial splits.

📝 Abstract
As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can drop accuracy by more than 10% under spatial shift. To evaluate this effect, we introduce EuroSAT-Embed, a set of 81,000 embedding GeoTIFFs derived from three foundation models: AlphaEarth, OlmoEarth, and Tessera. We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Our results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. We recommend Generalized Mean Pooling (GeM) as a drop-in replacement for mean pooling: it improves accuracy without increasing embedding dimensionality. For maximum accuracy, Stats pooling (concatenation of min/max/mean/std pooling) performs best at 4x the embedding size. We further find that pooling effectiveness varies across embedding sources and that higher-dimensional embeddings benefit most from distributional statistics.
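
The three pooling schemes named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the exponent `p=3.0`, the `eps` clamp (borrowed from the common image-retrieval form of GeM, which assumes non-negative inputs), and the patch shape are all assumptions for illustration. How the paper handles signed embedding values is not stated here.

```python
import numpy as np


def mean_pool(pixels: np.ndarray) -> np.ndarray:
    """Average over pixels. pixels: (n_pixels, dim) -> (dim,)."""
    return pixels.mean(axis=0)


def gem_pool(pixels: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean over pixels: (mean(x^p))^(1/p).

    Assumption: values are clamped to at least eps so the power is
    well defined; p=1 recovers mean pooling, large p approaches max.
    Output dimensionality equals the input dimensionality.
    """
    clamped = np.clip(pixels, eps, None)
    return np.mean(clamped ** p, axis=0) ** (1.0 / p)


def stats_pool(pixels: np.ndarray) -> np.ndarray:
    """Concatenate min, max, mean, std over pixels -> (4 * dim,)."""
    return np.concatenate([
        pixels.min(axis=0),
        pixels.max(axis=0),
        pixels.mean(axis=0),
        pixels.std(axis=0),
    ])


def pool_patch(patch: np.ndarray, pool_fn) -> np.ndarray:
    """Flatten an (H, W, D) embedding patch to (H*W, D) and pool it."""
    height, width, dim = patch.shape
    return pool_fn(patch.reshape(height * width, dim))


# Hypothetical patch: 64 x 64 pixels, 64-dimensional embeddings,
# drawn from a positive range so the eps clamp has no effect.
rng = np.random.default_rng(0)
patch = rng.uniform(0.1, 1.0, size=(64, 64, 64))

mean_vec = pool_patch(patch, mean_pool)    # shape (64,)
gem_vec = pool_patch(patch, gem_pool)      # shape (64,)
stats_vec = pool_patch(patch, stats_pool)  # shape (256,)
```

In this sketch GeM keeps the output at `dim` values, which is what makes it a drop-in replacement for mean pooling, while Stats pooling returns `4 * dim` values, matching the 4x embedding size noted above.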
Problem

Research questions and friction points this paper is trying to address.

pixel-level embeddings
patch representation
pooling strategies
geographic generalization
spatial shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

pooling strategies
geospatial embeddings
geographic generalization
Generalized Mean Pooling (GeM)
Stats pooling