GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Entity-level carbon emission prediction lacks a unified, open, multitask benchmark: existing data are fragmented in accessibility, scale, granularity, and evaluation protocol. This work proposes GHGbench, an open multitask greenhouse gas emission prediction benchmark spanning both the company and building levels. It integrates heterogeneous multisource data, establishes standardized data splits, and defines in-distribution prediction and cross-region/city transfer as primary tasks, with temporal hold-out and short-horizon forecasting as supplementary tasks. The study evaluates diverse baselines (gradient-boosted trees, MLPs, FT-Transformers, a tabular foundation model, and multimodal fusion with remote-sensing embeddings) and finds that building-level emissions are substantially harder to predict than company-level emissions. The performance gap between in-distribution and out-of-distribution evaluation far exceeds the gap attributable to model choice, and multimodal remote-sensing embeddings help most where tabular generalization breaks down. On a multi-city building-level task, a tabular foundation model also opens a statistically significant gap over tuned tree-based methods.
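
The cross-city transfer task mentioned above can be pictured as a leave-one-city-out split: train on every city except one, then test on the held-out city. The following is a minimal sketch under stated assumptions, not GHGbench's actual split code. The function name `leave_one_city_out`, the field names `city`, `floor_area`, and `emissions`, and the toy values are all hypothetical.

```python
from collections import defaultdict


def leave_one_city_out(records, city_key="city"):
    """Yield (held_out_city, train_rows, test_rows) once per city.

    Hypothetical helper: the benchmark's canonical splits may be
    defined differently.
    """
    by_city = defaultdict(list)
    for row in records:
        by_city[row[city_key]].append(row)
    for held_out in sorted(by_city):
        # Train on every city except the held-out one.
        train = [r for city, rows in by_city.items()
                 if city != held_out for r in rows]
        test = list(by_city[held_out])
        yield held_out, train, test


# Toy building-year records with made-up values (illustration only).
records = [
    {"city": "A", "floor_area": 1200.0, "emissions": 35.0},
    {"city": "A", "floor_area": 800.0, "emissions": 21.0},
    {"city": "B", "floor_area": 1500.0, "emissions": 60.0},
    {"city": "C", "floor_area": 950.0, "emissions": 18.0},
    {"city": "C", "floor_area": 400.0, "emissions": 9.0},
]

splits = list(leave_one_city_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

An in-distribution split would instead sample test rows from the same cities used for training, which is why the two settings can produce very different error levels.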

📝 Abstract

Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals; the building track harmonises 491,591 building-year records from 13 open sources into a single schema across 26 metropolitan areas (10 U.S., 15 Australian, 1 Singaporean), with climate covariates and multimodal remote-sensing embeddings. GHGbench defines canonical splits with in-distribution and cross-region/city transfer as primary tasks and temporal hold-out plus short-horizon forecasting as supplementary appendix evidence; headline baselines span gradient-boosted trees, a tabular foundation model, MLP, FT-Transformer, and multimodal fusion, with an LLM panel as auxiliary, all evaluated under multi-seed paired-bootstrap tests. Three benchmark-level findings emerge: (i) building emissions are structurally harder than company emissions; (ii) the in-distribution to out-of-distribution gap dwarfs any within-model gap across both the company track and the building track, and a tabular foundation model is, to our knowledge, the first baseline to open a paired-bootstrap-significant gap over tuned trees on a multi-city building-emissions task; (iii) multimodal remote-sensing embeddings help precisely where tabular generalisation breaks. GHGbench also exposes catastrophic city transfer and the sector-factor lookup ceiling as systematic failure modes. Code and reconstruction recipes are available at GHGbench.
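
The abstract says baselines are compared under multi-seed paired-bootstrap tests but does not spell out the procedure here. As a rough illustration of the general idea only, the sketch below resamples per-example error differences between two models with replacement and reports a percentile interval for the mean difference. The function name `paired_bootstrap`, the resample count, the 95% level, and the toy error values are assumptions, not details taken from the paper.

```python
import random


def paired_bootstrap(errors_a, errors_b, n_resamples=2000, seed=0):
    """Return (observed mean difference, lower bound, upper bound).

    errors_a and errors_b are per-example errors of two models on the
    SAME test examples, so each difference is paired. Illustrative
    sketch; the paper's exact test may differ.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("paired test needs equal-length error lists")
    rng = random.Random(seed)
    n = len(errors_a)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = sum(diffs) / n
    means = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


# Toy absolute errors for two hypothetical models (made-up numbers).
errors_trees = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]
errors_tfm = [0.6, 0.7, 0.5, 0.8, 0.7, 0.6, 0.9, 0.6]

observed, lower, upper = paired_bootstrap(errors_trees, errors_tfm)
# An interval that excludes zero suggests a consistent gap between models.
gap_is_significant = lower > 0 or upper < 0
```

A multi-seed variant would repeat this for models trained with different random seeds and examine whether the interval excludes zero consistently across them.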

Problem

Research questions and friction points this paper is trying to address.

carbon emission prediction

benchmark

multi-entity

multi-task

greenhouse gas

Innovation

Methods, ideas, or system contributions that make the work stand out.

carbon emission prediction

tabular foundation model

multimodal remote sensing