🤖 AI Summary
Standard concept bottleneck models generalize poorly in scientific settings with sparse supervision, largely because they ignore domain-specific causal mechanisms and require complete concept annotations. This work proposes a concept bottleneck model that integrates domain-defined causal structure by embedding physically interpretable intermediate concepts into the learning process, enforcing causally consistent and explainable modeling. The approach accommodates multi-source, heterogeneous weak supervision and helps identify and suppress spurious correlations. Evaluated on above-ground biomass density estimation from Earth observation data, the model outperforms multiple baselines, achieving lower prediction error and bias while producing scientifically meaningful, interpretable intermediate outputs that offer new insights for Earth observation analysis.
📝 Abstract
Concept Bottleneck Models (CBMs) improve the explainability of black-box deep learning (DL) models by introducing intermediate semantic concepts. However, standard CBMs often overlook domain-specific relationships and causal mechanisms, and their dependence on complete concept labels limits their applicability in scientific domains where supervision is sparse but processes are well defined. To address this, we propose the Process-Guided Concept Bottleneck Model (PG-CBM), an extension of CBMs that constrains learning to follow domain-defined causal mechanisms through biophysically meaningful intermediate concepts. Using above-ground biomass density estimation from Earth observation data as a case study, we show that PG-CBM reduces error and bias compared to multiple benchmarks, whilst leveraging multi-source heterogeneous training data and producing interpretable intermediate outputs. Beyond improved accuracy, PG-CBM enhances transparency, enables detection of spurious learning, and provides scientific insights, representing a step toward more trustworthy AI systems in scientific applications.
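To make the bottleneck idea concrete, below is a minimal sketch (assuming a PyTorch setup) of a generic concept bottleneck model in the spirit the abstract describes: the network is forced to predict the target only through intermediate concepts, and a per-sample concept mask lets heterogeneous sources that label only some concepts still contribute to training. The class names, layer sizes, and loss weighting here are illustrative assumptions, not the authors' PG-CBM implementation.

```python
import torch
import torch.nn as nn


class ConceptBottleneckModel(nn.Module):
    """Generic concept bottleneck: inputs -> concepts -> target.

    The two-stage structure is the standard CBM pattern; the layer
    sizes are placeholders, not the paper's architecture.
    """

    def __init__(self, in_dim: int, n_concepts: int):
        super().__init__()
        # Stage 1: predict interpretable intermediate concepts.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Stage 2: map the concepts (and nothing else) to the final
        # target, so every prediction passes through the bottleneck.
        self.head = nn.Sequential(
            nn.Linear(n_concepts, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, x):
        concepts = self.concept_net(x)
        return concepts, self.head(concepts)


def masked_loss(concepts, target_pred, concept_labels, concept_mask, target):
    """Joint loss with a per-sample, per-concept mask, so data sources
    that annotate only a subset of concepts still provide supervision."""
    concept_err = ((concepts - concept_labels) ** 2) * concept_mask
    concept_loss = concept_err.sum() / concept_mask.sum().clamp(min=1)
    target_loss = nn.functional.mse_loss(target_pred.squeeze(-1), target)
    return target_loss + concept_loss


# Usage with dummy shapes (e.g. per-pixel Earth observation features):
model = ConceptBottleneckModel(in_dim=12, n_concepts=3)
x = torch.randn(8, 12)
concepts, y_hat = model(x)  # concepts: (8, 3), y_hat: (8, 1)
```

Routing the prediction through `concepts` alone is what makes the intermediate layer inspectable; in the paper's setting those concepts would be biophysically meaningful quantities tied to the domain's causal structure rather than the anonymous placeholders used here.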