🤖 AI Summary
Testing of autonomous driving systems (ADS) currently lacks standardized evaluation criteria; most existing methods suffer from low validity and poor reproducibility because they neglect ADS decision-making mechanisms. This paper identifies, for the first time, that test validity fundamentally depends on the rationality and determinacy inherent in ADS design. Accordingly, we propose a test evaluation framework grounded in interpretable and reproducible ADS behavior. Methodologically, the framework integrates control-policy analysis with quantitative assessment of scenario validity, and it is empirically validated across eight open-source ADS implementations. The results demonstrate that mainstream ADS exhibit widespread deficiencies in rationality and determinacy, undermining the ability of conventional testing to guarantee critical safety properties. Our framework delineates the applicability boundaries and limitations of existing testing methods, thereby providing both theoretical foundations and practical guidance for establishing trustworthy ADS evaluation.
📝 Abstract
Despite extensive research, the landscape of autonomous driving system (ADS) testing remains fragmented, and there is currently no basis for an informed technical assessment of the importance and contribution of the state of the art. This paper addresses this problem by exploring two complementary aspects.
First, it proposes a framework for comparing existing test methods in terms of their intrinsic effectiveness and validity. It shows that many methods fail to meet both requirements: either they rely on criteria that do not allow rapid, inexpensive, and comprehensive detection of failures, or the degree of validity of the properties tested cannot be accurately estimated. In particular, it is shown that most methods for generating critical test scenarios do not take into account the nominal operational capabilities of autopilots and produce scenarios that the tested vehicles cannot possibly handle, resulting in unjustified rejections.
Second, the paper shows that test effectiveness and validity depend heavily on how autopilots are designed, in particular on how they choose between different control policies to perform maneuvers, as well as on the reproducibility of their results. In fact, most test methods take for granted two principles, rationality and determinacy, that underlie traditional testing but do not generally apply to ADS. We maintain that the absence of rationality and determinacy significantly impairs the effectiveness and validity of test methods, and we support this claim with test results on eight open autopilots, most of which do not satisfy these properties.
We conclude that, given the current state of the art, it is impossible to obtain sufficiently strong guarantees for essential autopilot properties, and we recommend that autopilots be developed with both rationality and determinacy in mind.