From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?

πŸ“… 2026-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses a critical gap in the evaluation of vision-language models (VLMs) for autonomous driving, which has predominantly focused on vehicle-centric perspectives while neglecting the perceptual and decision-making needs of cyclists. To this end, we introduce CyclingVQA, the first diagnostic visual question answering benchmark tailored to the cyclist's viewpoint, systematically evaluating more than 31 VLMs across key capabilities including spatial perception, temporal reasoning, traffic rule comprehension, and lane-based inference. By benchmarking general-purpose, spatially enhanced, and driving-specialized models and conducting thorough error analysis, our work reveals significant deficiencies in current VLMs' ability to interpret cyclist-specific signals and reason about navigational lane relationships. Notably, some specialized driving models exhibit weaker generalization than general-purpose counterparts, offering clear guidance for the future development of cyclist-aware assistance systems.

πŸ“ Abstract
Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities but show clear weaknesses in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.
Problem

Research questions and friction points this paper is trying to address.

vision-language models
cyclist-assistive systems
spatial perception
traffic reasoning
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models
cyclist-centric perception
spatio-temporal reasoning
autonomous driving generalization
CyclingVQA benchmark