From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?

πŸ“… 2026-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses a critical gap in the evaluation of vision-language models (VLMs) for autonomous driving, which has predominantly focused on vehicle-centric perspectives while neglecting the perceptual and decision-making needs of cyclists. To this end, we introduce CyclingVQA, the first diagnostic visual question answering benchmark tailored to the cyclist's viewpoint, systematically evaluating more than 31 VLMs across key capabilities including spatial perception, temporal reasoning, traffic rule comprehension, and lane-based inference. By benchmarking general-purpose, spatially enhanced, and driving-specialized models and conducting thorough error analysis, our work reveals significant deficiencies in current VLMs' ability to interpret cyclist-specific signals and reason about navigational lane relationships. Notably, some specialized driving models exhibit weaker generalization than general-purpose counterparts, offering clear guidance for the future development of cyclist-aware assistance systems.

πŸ“ Abstract
Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities but show clear weaknesses in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.
Problem

Research questions and friction points this paper is trying to address.

vision-language models
cyclist-assistive systems
spatial perception
traffic reasoning
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models
cyclist-centric perception
spatio-temporal reasoning
autonomous driving generalization
CyclingVQA benchmark