Better, Not Just More: Data-centric machine learning for Earth observation

📅 2023-12-08
🏛️ IEEE Geoscience and Remote Sensing Magazine
📈 Citations: 4
Influential: 0
🤖 AI Summary
Problem: Machine learning research in Earth observation has long been hindered by benchmark datasets that are disconnected from real-world applications and exhibit saturated performance, resulting in poor model generalizability and deployment reliability. Method: This paper proposes a data-centric paradigm tailored to geospatial data, systematically formalizing its methodological framework and clarifying its complementary relationship with the prevailing model-centric paradigm. Emphasizing data quality, representativeness, and closed-loop iteration, it integrates key techniques (data cleaning, annotation optimization, synthetic augmentation, distribution alignment, active learning, and feedback-driven refinement) into an end-to-end pipeline spanning problem formulation, data optimization, modeling, deployment, and feedback integration. Contribution/Results: Experiments demonstrate that this paradigm significantly improves model accuracy and robustness on unseen scenarios, breaks through benchmark performance ceilings, and substantially enhances practical applicability in operational Earth observation systems.
📝 Abstract
Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been developed solely on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle (from problem definition to model deployment with feedback) is crucial for building machine learning models that remain reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.
Problem

Research questions and friction points this paper is trying to address.

Shift from model-centric to data-centric machine learning for geospatial data.
Enhance accuracy and generalization in real-world Earth observation applications.
Integrate data-centric approaches across the entire machine learning cycle.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shift from model-centric to data-centric learning
Automated data-centric approaches for geospatial data
Integration of entire machine learning cycle
R. Roscher
Data Science for Crop Systems Group, Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Straße, Jülich, 52428, Germany; Remote Sensing Group, University of Bonn, Niebuhrstr. 1a, Bonn, 53113, Germany
M. Rußwurm
Laboratory of Geo-information Science and Remote Sensing, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708 PB, Gelderland, Netherlands
Caroline Gevaert
Department of Earth Observation Science, Faculty ITC, University of Twente, Drienerlolaan 5, Enschede, 7522 NB, Overijssel, the Netherlands
Michael Kampffmeyer
Department of Physics and Technology, UiT The Arctic University of Norway, Klokkargårdsbakken 35, Tromsø, 9019, Norway
J. A. D. Santos
Department of Computer Science, University of Sheffield, 211 Portobello, Sheffield City Centre, Sheffield, S1 4DP, United Kingdom
Maria Vakalopoulou
Assistant Professor at CentraleSupélec
Medical Imaging, Remote Sensing, Computer Vision, Machine Learning, Artificial Intelligence
Ronny Hänsch
Postdoctoral research fellow, German Aerospace Center, Oberpfaffenhofen
Machine Learning, Remote Sensing, Computer Vision, Image Processing, 3D-Reconstruction
Stine Hansen
UiT The Arctic University of Norway
Machine Learning, Deep Learning, Medical Image Analysis, Computer Vision
Keiller Nogueira
University of Liverpool
Machine/Deep Learning, Image Processing, Pattern Recognition
Jonathan Prexl
Department of Aerospace Engineering, University of the Bundeswehr Munich, Werner-Heisenberg-Weg 39, Neubiberg, 85579, Bavaria, Germany
D. Tuia
Environmental Computational Science and Earth Observation Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL), Ronquos 86, Sion, 1951, Switzerland