Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large vision-language models (VLMs) incur prohibitive computational and memory overhead, hindering their deployment on mobile devices for blind and low-vision (BLV) users. Method: This work systematically evaluates lightweight VLMs, specifically the SmolVLM2 series, for fine-grained, context-aware video captioning in both indoor and outdoor settings. It introduces two novel accessibility-oriented evaluation frameworks, a multi-context BLV framework and a navigation-assistance framework, and investigates the impact of prompt engineering on caption quality. Experiments are conducted on smartphones using FP32 and INT8 quantized inference, validated on the AVCaps and Charades datasets. Contribution/Results: Lightweight VLMs achieve efficient, high-quality, task-adapted caption generation on resource-constrained mobile hardware. The proposed evaluation frameworks enable rigorous, scenario-specific assessment aligned with real-world BLV needs, improving the practicality and deployability of vision-assistance technologies on mobile platforms.

📝 Abstract
Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, evaluating spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, focusing on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, evaluating FP32 and INT8 precision variants to assess real-world performance constraints on resource-limited mobile devices.
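The paper does not detail its quantization pipeline, but the FP32-versus-INT8 comparison it describes rests on the standard idea of mapping floating-point weights onto 8-bit integers. The sketch below is an illustrative, dependency-free example of symmetric per-tensor INT8 weight quantization; the function names and sample values are ours, not from the paper.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero case
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes and the shared scale."""
    return [v * scale for v in q]

# Illustrative weight values standing in for one tensor of a VLM.
weights = [0.52, -1.30, 0.07, 0.99, -0.41]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step (scale / 2):
# the INT8 model trades a little accuracy for a 4x smaller weight footprint.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err <= scale / 2)  # prints True
```

Real deployments (e.g. the on-device inference runtimes used for smartphone experiments) typically apply this per tensor or per channel, with calibrated activation scales on top; the memory saving is what makes the 500M and 2.2B models practical on mobile hardware.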
Problem

Research questions and friction points this paper is trying to address.

Evaluating lightweight VLMs for blind users' accessibility with detailed descriptions
Developing novel evaluation frameworks for BLV spatial and navigational assistance
Assessing mobile deployment constraints of VLMs on resource-limited devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated lightweight VLM variants with 500M and 2.2B parameters
Introduced two novel BLV accessibility-focused evaluation frameworks
Deployed models on smartphones with different precision variants
👥 Authors
S. S. Baghel · Indian Institute of Technology Mandi
Yash Pratap Singh Rathore · Indian Institute of Technology Mandi
Sushovan Jena · Indian Institute of Technology Mandi
Anurag Pradhan · Vellore Institute of Technology
Amit Shukla · Chairperson, Center for Artificial Intelligence and Robotics (CAIR), IIT Mandi; Founder of Simahtel
Arnav Bhavsar · Indian Institute of Technology Mandi
Pawan Goyal · Indian Institute of Technology Kharagpur