SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autonomous driving systems are not robust to rare semantic anomalies (out-of-distribution scenarios), and existing zero-shot prompting of vision-language models (VLMs) is unreliable and depends on proprietary large models, which hinders practical deployment. This paper proposes SAVANT, a structured framework that decomposes VLM reasoning into a layered analysis of the scene across four semantic layers (Street, Infrastructure, Movable Objects, and Environment) and a two-phase detection pipeline: structured scene description extraction followed by multi-modal evaluation. On real-world driving scenarios, SAVANT achieves 89.6% recall and 88.0% accuracy, and a fine-tuned 7B-parameter open-source model (Qwen2.5VL) using the framework reaches 90.8% recall and 93.8% accuracy, outperforming unstructured baselines while allowing low-cost local deployment. SAVANT also automatically labels over 9,640 real-world images, which helps address the scarcity of labeled anomaly data.

📝 Abstract
Autonomous driving systems remain critically vulnerable to the long tail of rare, out-of-distribution scenarios with semantic anomalies. While Vision Language Models (VLMs) offer promising reasoning capabilities, naive prompting approaches yield unreliable performance and depend on expensive proprietary models, limiting practical deployment. We introduce SAVANT (Semantic Analysis with Vision-Augmented Anomaly deTection), a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios from input images through layered scene analysis and a two-phase pipeline: structured scene description extraction followed by multi-modal evaluation. Our approach transforms VLM reasoning from ad-hoc prompting to systematic analysis across four semantic layers: Street, Infrastructure, Movable Objects, and Environment. SAVANT achieves 89.6% recall and 88.0% accuracy on real-world driving scenarios, significantly outperforming unstructured baselines. More importantly, we demonstrate that our structured framework enables a fine-tuned 7B parameter open-source model (Qwen2.5VL) to achieve 90.8% recall and 93.8% accuracy, surpassing all models evaluated while enabling local deployment at near-zero cost. By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection and provides a practical path toward reliable, accessible semantic monitoring for autonomous systems.
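
The two-phase pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `VLM` callable, the prompt wording, the `SceneDescription` class, the one-query-per-layer design, and the yes/no parsing are all assumptions made for the example. Only the four layer names and the order of the two phases come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The four semantic layers named in the abstract.
LAYERS = ("Street", "Infrastructure", "Movable Objects", "Environment")

# Hypothetical interface: any callable mapping (image_path, prompt) -> text.
# A real system would wrap a VLM such as Qwen2.5VL behind this signature.
VLM = Callable[[str, str], str]


@dataclass
class SceneDescription:
    """Structured scene description: one free-text entry per layer."""
    layers: Dict[str, str]


def describe_scene(image_path: str, vlm: VLM) -> SceneDescription:
    """Phase 1 (assumed form): query the VLM once per semantic layer."""
    return SceneDescription({
        layer: vlm(image_path, f"Describe the {layer} layer of this driving scene.")
        for layer in LAYERS
    })


def evaluate_scene(image_path: str, desc: SceneDescription, vlm: VLM) -> bool:
    """Phase 2 (assumed form): judge the image together with its description."""
    summary = "\n".join(f"{name}: {text}" for name, text in desc.layers.items())
    prompt = (
        f"Scene description:\n{summary}\n"
        "Is this driving scene anomalous? Answer yes or no."
    )
    return vlm(image_path, prompt).strip().lower().startswith("yes")


def detect_anomaly(image_path: str, vlm: VLM) -> bool:
    """Run both phases and return True if the scene is flagged as anomalous."""
    return evaluate_scene(image_path, describe_scene(image_path, vlm), vlm)


def stub_vlm(image_path: str, prompt: str) -> str:
    """Toy stand-in for a VLM so the sketch runs without a model."""
    if "anomalous" in prompt:
        return "Yes" if "sofa" in prompt else "No"
    return "a sofa lying in the lane" if "Street layer" in prompt else "nothing unusual"


if __name__ == "__main__":
    print(detect_anomaly("frame.jpg", stub_vlm))
```

Separating description from evaluation means the second call reasons over explicit per-layer text as well as the image, which is one plausible reading of "multi-modal evaluation" in the abstract.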
Problem

Research questions and friction points this paper is trying to address.

Detecting rare semantic anomalies in autonomous driving scenarios
Overcoming unreliable VLM prompting for practical anomaly detection
Addressing data scarcity in autonomous system anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured reasoning framework for anomaly detection
Layered scene analysis across four semantic categories
Open-source model achieving high accuracy and recall
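
For reference, the two metrics reported above follow the standard definitions: recall is the share of truly anomalous scenes that are flagged, and accuracy is the share of all scenes classified correctly. The sketch below computes both from binary labels. The function name and the toy labels are illustrative assumptions and are not data from the paper.

```python
from typing import Sequence, Tuple


def recall_and_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (recall, accuracy) for binary labels where True means anomalous."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)          # anomalies caught
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)      # anomalies missed
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)      # TP + TN
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(y_true)
    return recall, accuracy


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [True, True, True, False, False]
    preds = [True, True, False, False, True]
    print(recall_and_accuracy(truth, preds))
```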
Roberto Brusnicki
Professorship of Autonomous Vehicle Systems, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany; Munich Institute of Robotics and Machine Intelligence (MIRMI)
David Pop
Professorship of Autonomous Vehicle Systems, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany; Munich Institute of Robotics and Machine Intelligence (MIRMI)
Yuan Gao
Professorship of Autonomous Vehicle Systems, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany; Munich Institute of Robotics and Machine Intelligence (MIRMI)
Mattia Piccinini
TUM Global Post-doc Researcher, Technical University of Munich
Autonomous Vehicles, Artificial Intelligence, Robotics, Trajectory Planning, Motion Control
Johannes Betz
Professor, Autonomous Vehicle Systems, Technical University of Munich (TUM)
Autonomous Systems, Motion Planning, Control, Robots