🤖 AI Summary
Current methods for evaluating scenario coverage in autonomous driving rely on either labor-intensive manual annotation or computationally expensive large vision-language models (LVLMs), which hinders scalable deployment. This paper proposes SCOUT, a lightweight framework that, for the first time, enables efficient coverage assessment from precomputed, sensor-derived latent representations. SCOUT uses distillation to train a compact surrogate network on coverage labels originally generated by LVLMs; the surrogate then predicts coverage directly from perception features, eliminating both manual annotation and LVLM inference at evaluation time. The method combines latent representation extraction, an optimized neural architecture, and validation on large-scale real-world driving data. Experiments show that SCOUT maintains high accuracy while accelerating inference by two orders of magnitude and substantially reducing computational cost. SCOUT thus offers a scalable, low-cost, automated approach to evaluating scenario coverage in large-scale autonomous driving systems.
📝 Abstract
Assessing scenario coverage is crucial for evaluating the robustness of autonomous agents, yet existing methods rely on expensive human annotations or computationally intensive Large Vision-Language Models (LVLMs). These approaches are impractical for large-scale deployment due to cost and efficiency constraints. To address these shortcomings, we propose SCOUT (Scenario Coverage Oversight and Understanding Tool), a lightweight surrogate model designed to predict scenario coverage labels directly from an agent's latent sensor representations. SCOUT is trained through a distillation process, learning to approximate LVLM-generated coverage labels while eliminating the need for continuous LVLM inference or human annotation. By leveraging precomputed perception features, SCOUT avoids redundant computations and enables fast, scalable scenario coverage estimation. We evaluate our method across a large dataset of real-life autonomous navigation scenarios, demonstrating that it maintains high accuracy while significantly reducing computational cost. Our results show that SCOUT provides an effective and practical alternative for large-scale coverage analysis. While its performance depends on the quality of LVLM-generated training labels, SCOUT represents a major step toward efficient scenario coverage oversight in autonomous systems.
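The distillation idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the 4-dimensional feature vectors, the two coverage labels, the single-layer `SurrogateCoverageModel`, and the synthetic rule that stands in for LVLM-generated labels are hypothetical and are not taken from the paper, which does not specify its architecture in this section. The sketch only shows the general pattern: fit a small student on teacher labels once, then predict coverage from precomputed features without calling the teacher again.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SurrogateCoverageModel:
    """Hypothetical student: one linear layer with a sigmoid per coverage label."""

    def __init__(self, n_features, n_labels, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_labels)]
        self.b = [0.0] * n_labels

    def predict_proba(self, x):
        # Probability that each coverage label applies to feature vector x.
        return [sigmoid(sum(wj * xj for wj, xj in zip(row, x)) + b)
                for row, b in zip(self.w, self.b)]

    def fit(self, features, teacher_labels, lr=0.1, epochs=50):
        # Plain SGD on binary cross-entropy against the teacher's labels.
        for _ in range(epochs):
            for x, y in zip(features, teacher_labels):
                probs = self.predict_proba(x)
                for k, (pk, yk) in enumerate(zip(probs, y)):
                    grad = pk - yk  # d(BCE)/d(logit) for label k
                    for j, xj in enumerate(x):
                        self.w[k][j] -= lr * grad * xj
                    self.b[k] -= lr * grad
        return self


def make_dataset(n, rng):
    # Synthetic stand-in: random "latent features" plus labels from a fixed
    # rule that plays the role of the LVLM teacher (an assumption, not real data).
    feats, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        y = [1.0 if x[0] > 0 else 0.0,
             1.0 if x[1] + x[2] > 0 else 0.0]
        feats.append(x)
        labels.append(y)
    return feats, labels


rng = random.Random(42)
train_x, train_y = make_dataset(300, rng)
test_x, test_y = make_dataset(100, rng)

model = SurrogateCoverageModel(n_features=4, n_labels=2).fit(train_x, train_y)

# Agreement between student predictions and teacher labels on held-out samples.
correct = 0
total = 0
for x, y in zip(test_x, test_y):
    for pk, yk in zip(model.predict_proba(x), y):
        correct += int((pk > 0.5) == (yk == 1.0))
        total += 1
accuracy = correct / total
print(f"student/teacher agreement: {accuracy:.2f}")
```

In this toy setup the teacher is queried only to build the training set; after `fit`, coverage prediction is a single pass over cached features. As the abstract notes, a student trained this way can be no better than the labels it imitates.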