Exploration of VLMs for Driver Monitoring Systems Applications

📅 2025-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Driver Monitoring Systems (DMS) lack systematic investigation into Vision-Language Models (VLMs), particularly zero-shot applications.
Method: This work pioneers the zero-shot adaptation of large multimodal models, including LLaVA and Qwen-VL, to DMS via driving-specific prompt engineering, eliminating the need for fine-tuning or task-specific architectural modifications.
Contribution/Results: Evaluated on the Driver Monitoring Dataset, our approach achieves superior zero-shot performance over conventional supervised models in critical tasks such as fatigue and distraction detection. Notably, VLMs demonstrate exceptional semantic comprehension and cross-scenario generalization, capabilities inherently limited in traditional DMS pipelines. By circumventing reliance on large-scale annotated data and handcrafted feature engineering, this study establishes a new paradigm for lightweight, interpretable, and open-domain in-cabin perception. It bridges a key gap between foundation models and real-world automotive vision applications, offering a scalable framework for next-generation intelligent cockpit systems.

📝 Abstract
In recent years, we have witnessed significant progress in emerging deep learning models, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs). These models have demonstrated promising results, indicating a new era of Artificial Intelligence (AI) that surpasses previous methodologies. Their extensive knowledge and zero-shot capabilities suggest a paradigm shift in developing deep learning solutions, moving from data capturing and algorithm training to just writing appropriate prompts. While the application of these technologies has been explored across various industries, including automotive, there is a notable gap in the scientific literature regarding their use in Driver Monitoring Systems (DMS). This paper presents our initial approach to implementing VLMs in this domain, utilising the Driver Monitoring Dataset to evaluate their performance and discussing their advantages and challenges when implemented in real-world scenarios.
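
The shift the abstract describes, from collecting data and training a model to writing a prompt, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration rather than the authors' method: the label set, the prompt wording, and the `query_vlm` callable (a placeholder for whatever VLM client is used, taking an image path and a prompt and returning the model's text reply) are all hypothetical.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical label set for illustration; not taken from the paper.
LABELS = ("safe driving", "texting", "phone call", "drinking", "drowsy")


def build_prompt(labels: Sequence[str]) -> str:
    """Build a closed-set, zero-shot classification prompt (assumed wording)."""
    options = "\n".join(f"{i}. {label}" for i, label in enumerate(labels, start=1))
    return (
        "You are monitoring a driver through an in-cabin camera.\n"
        "Which option best describes what the driver is doing?\n"
        f"{options}\n"
        "Answer with the number of one option only."
    )


def parse_answer(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Map a free-text model reply back to one of the labels, or None."""
    # Prefer the first integer in the reply, if it is a valid option number.
    match = re.search(r"\d+", reply)
    if match:
        index = int(match.group())
        if 1 <= index <= len(labels):
            return labels[index - 1]
    # Otherwise fall back to looking for a label mentioned verbatim.
    lowered = reply.lower()
    for label in labels:
        if label in lowered:
            return label
    return None


def classify_frame(
    image_path: str,
    query_vlm: Callable[[str, str], str],  # hypothetical: (image_path, prompt) -> reply
    labels: Sequence[str] = LABELS,
) -> Optional[str]:
    """Classify one frame with no training: prompt the model, parse its reply."""
    reply = query_vlm(image_path, build_prompt(labels))
    return parse_answer(reply, labels)
```

Passing the model call in as a function keeps the sketch independent of any particular VLM API, and returning `None` for an unparseable reply makes it explicit that free-text output may not map cleanly onto a fixed label set.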

Problem

Research questions and friction points this paper is trying to address.

Explores the use of VLMs in Driver Monitoring Systems applications.
Evaluates VLM performance on the Driver Monitoring Dataset.
Discusses the advantages and challenges of deploying VLMs in real-world scenarios.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies Vision-Language Models to driver monitoring.
Evaluates VLMs on the Driver Monitoring Dataset.
Explores zero-shot capabilities for real-world DMS.
Paola Natalia Canas
Fundación Vicomtech, Basque Research and Technology Alliance, Spain; University of the Basque Country (UPV/EHU), Spain
Marcos Nieto
Principal Researcher, Vicomtech
Computer vision, Driver Assistance, Bayesian inference
Oihana Otaegui
Vicomtech
Igor Rodríguez
University of the Basque Country (UPV/EHU), Spain