CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

📅 2024-06-07
🏛️ arXiv.org
🤖 AI Summary
Current data-driven carbon flux modeling is hindered by the absence of standardized, multimodal, machine learning–ready benchmark datasets. To address this, we introduce the first global multimodal dataset specifically designed for carbon flux estimation, integrating in-situ carbon flux measurements from 385 eddy-covariance flux towers with high-resolution meteorological variables and Sentinel/Landsat satellite imagery—rigorously aligned in space and time and normalized for ML readiness. We propose a Transformer-based multimodal fusion architecture that enables end-to-end joint modeling of meteorological and remote sensing features—a novel approach not previously demonstrated for this task. Experiments show our model reduces mean absolute error (MAE) by 12.3% over state-of-the-art methods and significantly improves cross-regional prediction robustness. This work establishes a foundational benchmark dataset and provides both critical infrastructure and a methodological paradigm for reproducible, comparable deep learning research on carbon flux estimation.

📝 Abstract
Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer-based model. Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain. By providing these resources, we aim to lower the barrier to entry for other deep learning researchers to develop new models and drive new advances in carbon flux modelling.
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized dataset for carbon flux modeling comparisons
Need for integrating diverse data sources for robust model training
Challenges in applying multimodal deep learning to carbon flux prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

First machine learning-ready dataset for DDCFM
Integrates carbon fluxes, meteorological data, satellite imagery
Baseline using a current state-of-the-art DDCFM approach, plus a novel transformer-based model
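
To make the idea of integrating the three data sources concrete, here is a minimal sketch of joining flux measurements, meteorological predictors, and satellite image references on site and timestamp. It is not the CarbonSense pipeline: the `Sample` class, the `align` function, the site name, the variable names, and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One aligned training example (hypothetical layout)."""
    site: str
    time: datetime
    flux: float                 # measured carbon flux (prediction target)
    meteo: dict                 # meteorological predictors at this time
    image_id: Optional[str]     # satellite image for that day, if any


def align(fluxes, meteo, images):
    """Join the three sources on (site, timestamp).

    fluxes: {(site, datetime): float}
    meteo:  {(site, datetime): dict}
    images: {(site, date): str}, assumed to hold at most one image per day
    """
    samples = []
    for (site, time), flux in sorted(fluxes.items()):
        predictors = meteo.get((site, time))
        if predictors is None:
            continue  # drop targets that have no matching predictors
        image_id = images.get((site, time.date()))  # may be None
        samples.append(Sample(site, time, flux, predictors, image_id))
    return samples


# Made-up values, for illustration only.
t0 = datetime(2020, 6, 1, 12, 0)
t1 = datetime(2020, 6, 1, 12, 30)
fluxes = {("SITE-A", t0): -3.2, ("SITE-A", t1): -2.9}
meteo = {("SITE-A", t0): {"air_temp_c": 18.5}}
images = {("SITE-A", t0.date()): "img-0001"}

samples = align(fluxes, meteo, images)
print(len(samples), samples[0].image_id)
```

In this sketch the second flux record is dropped because it has no matching meteorological entry. A real dataset would need an explicit policy for such gaps, such as dropping, masking, or imputing.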
Matthew Fortier
Mila Quebec AI Institute & Polytechnique Montréal
Mats L. Richter
ServiceNow
O. Sonnentag
Université de Montréal
Chris Pal
Professor, Polytechnique Montréal & Mila, ServiceNow Research, Canada CIFAR AI Chair
Deep Learning, Computer Vision & Pattern Recognition, Natural Language Processing, Data Mining, AI