🤖 AI Summary
In virtual testing for autonomous driving, a "behavioral credibility gap" separates synthetic from real-world images: current fidelity metrics assess only pixel-level or output-level similarity, neglecting whether the causal reasoning underlying the system's decisions remains consistent across domains. Method: This paper proposes a fidelity evaluation paradigm grounded in the decision-making mechanism of the System Under Test (SUT). We introduce the SUT-specific Decisive Feature Fidelity (DFF) metric, the first to elevate fidelity assessment from perceptual appearance to causal-mechanism alignment. The framework combines explainable AI, counterfactual reasoning, feature attribution, and cross-domain mechanistic alignment to guide simulator calibration. Results: Evaluated on the matched KITTI-VirtualKITTI2 dataset, DFF uncovers decision biases invisible to conventional metrics; after DFF-guided calibration, DFF improves by 37.2%, with input-level fidelity also improving and output-level fidelity preserved.
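To make the metric concrete, here is a minimal sketch of one way a decisive-feature comparison could be computed for a matched real-synthetic pair. The summary does not specify the exact estimator, so the gradient-saliency attribution, the top-decile mask threshold, the IoU aggregation, and the `model`/image-tensor conventions below are all illustrative assumptions.

```python
import torch

def saliency_map(model, image, target_idx):
    """Gradient attribution: |d output / d input|, summed over channels -> (H, W)."""
    image = image.detach().clone().requires_grad_(True)
    output = model(image.unsqueeze(0)).squeeze(0)
    output[target_idx].backward()
    return image.grad.abs().sum(dim=0)

def decisive_mask(saliency, quantile=0.9):
    """Binarize: keep the most influential pixels as the 'decisive features'."""
    threshold = torch.quantile(saliency.flatten(), quantile)
    return saliency >= threshold

def dff_score(model, real_imgs, synth_imgs, target_idx=0, quantile=0.9):
    """Mean IoU of decisive-feature masks over matched real/synthetic pairs."""
    scores = []
    for real, synth in zip(real_imgs, synth_imgs):
        m_real = decisive_mask(saliency_map(model, real, target_idx), quantile)
        m_synth = decisive_mask(saliency_map(model, synth, target_idx), quantile)
        inter = (m_real & m_synth).sum().item()
        union = (m_real | m_synth).sum().item()
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)
```

Under these assumptions, a score near 1 would indicate that the SUT attends to the same image regions in both domains; the comparison only makes sense for spatially aligned pairs such as KITTI-VirtualKITTI2.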
📝 Abstract
Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system under test (SUT) bases its decisions on the same causal evidence in both real and simulated environments, not merely whether images "look real" to humans. This paper addresses the lack of such a behavior-grounded fidelity measure by introducing Decisive Feature Fidelity (DFF), a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity: the agreement in the causal evidence underlying the SUT's decisions across domains. DFF leverages explainable-AI (XAI) methods to identify and compare the decisive features driving the SUT's outputs for matched real-synthetic pairs. We further propose practical estimators based on counterfactual explanations, along with a DFF-guided calibration scheme to enhance simulator fidelity. Experiments on 2126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Furthermore, DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output-value fidelity across diverse SUTs.
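The abstract mentions practical estimators based on counterfactual explanations. Below is a hedged sketch of one such estimator that uses patch occlusion as a crude counterfactual probe; the `predict` callable, the patch size, the occlusion fill value, and the Spearman rank-agreement score are all assumptions for illustration, not the paper's stated method.

```python
import numpy as np
from scipy.stats import spearmanr

def occlusion_importance(predict, image, patch=16, fill=0.0):
    """Counterfactual-style attribution: score each patch by how much
    occluding it (a minimal intervention) changes the SUT's scalar output."""
    base = predict(image)
    H, W = image.shape[-2:]
    rows, cols = H // patch, W // patch
    scores = np.zeros(rows * cols)
    for r in range(rows):
        for c in range(cols):
            occluded = image.copy()
            occluded[..., r*patch:(r+1)*patch, c*patch:(c+1)*patch] = fill
            scores[r * cols + c] = abs(base - predict(occluded))
    return scores

def dff_counterfactual(predict, real_img, synth_img, patch=16):
    """Agreement of counterfactual importances across one matched pair,
    scored as Spearman rank correlation in [-1, 1]."""
    rho, _ = spearmanr(occlusion_importance(predict, real_img, patch),
                       occlusion_importance(predict, synth_img, patch))
    return rho
```

Averaging this score over all matched pairs would yield a dataset-level estimate; a calibration loop could then, in the spirit of the paper's DFF-guided scheme, adjust simulator parameters to raise it.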