🤖 AI Summary
This study addresses the challenge of quantitatively assessing how well autonomous driving test scenario sets represent the Target Operational Domain (TOD), whose true distribution is unknown and must be inferred from limited samples. We propose an uncertainty-aware interval estimation framework that uses imprecise Bayesian modeling to handle two sources of uncertainty jointly: scarce prior knowledge and sparse data. The framework encodes operational features (e.g., weather, road type, time of day) and compares their distribution in the scenario set with the inferred TOD distribution, quantifying representativeness both locally and globally. A numerical example illustrates that the method yields interval-valued estimates of representativeness, offering interpretable, uncertainty-aware evaluation metrics for safety validation over complex Operational Design Domains (ODDs).
📝 Abstract
Assuring the trustworthiness and safety of AI systems, e.g., autonomous vehicles (AVs), depends critically on data-related safety properties, such as representativeness and completeness, of the datasets used for their training and testing. Among these properties, this paper focuses on representativeness: the extent to which the scenario-based data used for training and testing reflect the operational conditions that the system is designed to operate safely in, i.e., the Operational Design Domain (ODD), or is expected to encounter, i.e., the Target Operational Domain (TOD). We propose a probabilistic method that quantifies representativeness by comparing the statistical distribution of features encoded by the scenario suites with the corresponding distribution of features representing the TOD, acknowledging that the true TOD distribution is unknown and can only be inferred from limited data. We apply an imprecise Bayesian method to handle limited data and uncertain priors. The imprecise Bayesian formulation produces interval-valued, uncertainty-aware estimates of representativeness rather than a single value. We present a numerical example comparing the distributions of the scenario suite and the inferred TOD across operational categories (weather, road type, time of day, etc.) under dependencies and prior uncertainty. We estimate representativeness locally (between categories) and globally as an interval.
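To make the idea of interval-valued representativeness concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes a single categorical feature (weather), uses Walley's imprecise Dirichlet model as one possible imprecise Bayesian method, and scores global representativeness as one minus the total variation distance; the abstract specifies none of these choices. The counts, suite shares, prior strength `s`, and all function names are hypothetical, and dependencies between features are ignored.

```python
from typing import Dict, Tuple


def idm_intervals(counts: Dict[str, int], s: float = 2.0) -> Dict[str, Tuple[float, float]]:
    """Lower/upper posterior mean of each TOD category probability.

    Assumption: imprecise Dirichlet model with prior strength s, whose
    posterior means are (n_k + s * t_k) / (N + s) for any t on the simplex.
    """
    denom = sum(counts.values()) + s
    return {k: (n / denom, (n + s) / denom) for k, n in counts.items()}


def global_interval(counts: Dict[str, int], suite: Dict[str, float],
                    s: float = 2.0) -> Tuple[float, float]:
    """Interval for 1 - TV(suite, TOD) over the set of posterior means."""
    denom = sum(counts.values()) + s
    cats = list(counts)
    # Smallest distance: only the mass by which the lower TOD bound already
    # exceeds the suite share cannot be removed by any choice of prior.
    tv_min = sum(max(0.0, counts[k] / denom - suite[k]) for k in cats)
    # Largest distance: TV is convex in t, so the maximum is attained when
    # all prior mass s is placed on a single category.
    tv_max = max(
        0.5 * sum(abs(suite[k] - (counts[k] + (s if k == j else 0.0)) / denom)
                  for k in cats)
        for j in cats
    )
    return 1.0 - tv_max, 1.0 - tv_min


# Hypothetical data: 20 observed TOD drives and the suite's category shares.
tod_counts = {"clear": 14, "rain": 5, "snow": 1}
suite_share = {"clear": 0.5, "rain": 0.3, "snow": 0.2}

local = idm_intervals(tod_counts)
for cat, (lo, hi) in local.items():
    inside = lo <= suite_share[cat] <= hi
    print(f"{cat}: TOD in [{lo:.3f}, {hi:.3f}], suite {suite_share[cat]:.2f}, inside={inside}")

global_lo, global_hi = global_interval(tod_counts, suite_share)
print(f"global representativeness in [{global_lo:.3f}, {global_hi:.3f}]")
```

In this sketch the interval width shrinks as more TOD data are observed, because the prior strength `s` is fixed while the sample size grows; a larger `s` expresses weaker trust in the data and widens every interval.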