🤖 AI Summary
This work addresses the “perception–physics paradox” in vision foundation models (VFMs): predictions can look perceptually accurate while the underlying physical reasoning is flawed, because the models rely on visual correlations rather than physical laws. To bridge this gap, the study introduces “scientific alignment” as an implicit objective for representation learning and formalizes one testable aspect of it through structural isomorphism, which yields a hierarchy of necessary conditions and a systematic probing protocol. The authors also release TC-Bench, a global benchmark dataset for tropical cyclones, to probe models’ physical and causal interpretability. Experiments show that existing models degrade under intense storm conditions, indicating that scaling alone does not produce scientific alignment and that the models depend on visual shortcuts rather than genuine physical reasoning.
📝 Abstract
While Vision Foundation Models (VFMs) excel at predictive tasks on satellite imagery, their performance can arise from visual correlations rather than underlying structural invariants, making even perception-based out-of-distribution accuracy a poor proxy for scientific utility. As a result, models may look correct without reasoning correctly, a discrepancy we term the Perception-Physics Paradox. To address this gap, we introduce scientific alignment as an implicit objective for representation learning in scientific domains. We study a principled, testable aspect of scientific alignment through structural isomorphism, which requires latent representations to uniquely identify physical systems up to a linear reparameterization. This perspective induces a hierarchy of necessary conditions and yields a systematic probing protocol for physical and causal interpretability. To operationalize this framework, we release TC-Bench, a global, reproducible benchmark dataset with an automated construction pipeline for tropical cyclone research, and show that current VFMs rely on visual shortcuts that collapse in intense regimes, indicating that scientific alignment does not arise as a natural byproduct of scaling alone.
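The structural-isomorphism condition above (latents identify the physical system up to a linear reparameterization) suggests a simple necessary-condition check: a linear probe should recover the physical state from the latents, including in extreme regimes. The following is a minimal sketch of that idea on synthetic data, not the paper's protocol or TC-Bench itself. The function names, the two-variable "physical state", the saturating `tanh` encoder standing in for a visual shortcut, and the `1.5` intensity threshold are all illustrative assumptions.

```python
import numpy as np

def fit_linear_probe(z, s):
    """Least-squares affine map from latents z (n, d) to physical state s (n, k)."""
    a = np.hstack([z, np.ones((len(z), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(a, s, rcond=None)
    return w

def probe_predict(w, z):
    """Apply a fitted affine probe to latents z."""
    return np.hstack([z, np.ones((len(z), 1))]) @ w

def r2(s_true, s_pred):
    """Coefficient of determination, one value per physical variable."""
    ss_res = ((s_true - s_pred) ** 2).sum(axis=0)
    ss_tot = ((s_true - s_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def rmse(s_true, s_pred):
    """Root-mean-square error over all entries."""
    return float(np.sqrt(((s_true - s_pred) ** 2).mean()))

# Synthetic stand-in data (assumption): s is a standardized physical state,
# with column 0 playing the role of storm intensity.
rng = np.random.default_rng(0)
s = rng.normal(size=(2000, 2))
mix = rng.normal(size=(2, 8))  # hypothetical linear embedding into 8 latent dims

latents = {
    "aligned": s @ mix,                  # exact linear reparameterization of s
    "shortcut": np.tanh(3.0 * s) @ mix,  # saturates, so extremes look alike
}

train, test = slice(0, 1500), slice(1500, 2000)
intense = np.abs(s[test, 0]) > 1.5  # hypothetical "intense regime" mask

results = {}
for name, z in latents.items():
    w = fit_linear_probe(z[train], s[train])
    pred = probe_predict(w, z[test])
    results[name] = {
        "r2": r2(s[test], pred),
        # Regime-stratified error on the intensity variable only.
        "rmse_intense": rmse(s[test][intense, :1], pred[intense, :1]),
        "rmse_moderate": rmse(s[test][~intense, :1], pred[~intense, :1]),
    }

for name, metrics in results.items():
    print(name, metrics)
```

In this toy setup the aligned latents are recovered almost exactly by the probe, while the saturating encoder loses information at the extremes, so its probe error is larger in the intense stratum than in the moderate one. Reporting probe error per regime rather than as a single average is the design choice that exposes this kind of failure.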