🤖 AI Summary
This study investigates whether large vision-language models (LVLMs) and their standard automatic evaluation metrics accurately reflect the authentic preferences of blind and low-vision (BLV) users in navigation assistance. Method: We introduce Eye4B, the first BLV-oriented benchmark dataset comprising 1,100 real-world scenes, each paired with 5–10 navigation queries, and conduct multidimensional preference assessments—covering fear, operability, clarity, conciseness, and non-actionability—with eight BLV participants across six LVLMs (e.g., LLaVA, Qwen-VL). Contribution/Results: We find that conventional automatic metrics (e.g., CLIPScore, BLEU) exhibit significant misalignment with BLV preferences (confirmed via Spearman and Kendall correlation analyses). Crucially, conciseness and non-actionability emerge as dominant preference dimensions. This work delivers the first quantitative, multidimensional characterization of BLV preferences for navigation responses, bridging a critical gap in BLV-aligned evaluation and providing empirical foundations and design principles for accessible AI.
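As a rough illustration of the alignment analysis described above, the sketch below computes Spearman and Kendall rank correlations between automatic metric scores and averaged BLV preference ratings. It is a minimal sketch under assumptions, not the paper's code: the variable names (`metric_scores`, `blv_ratings`) and the example values are hypothetical stand-ins for per-response metric scores and human ratings.

```python
# Minimal sketch (not the paper's code): rank-correlation check between an
# automatic metric (e.g., CLIPScore) and averaged BLV preference ratings.
from scipy.stats import spearmanr, kendalltau

# Hypothetical per-response scores; in practice one automatic-metric score and
# one averaged BLV rating would be collected for each navigation response.
metric_scores = [0.31, 0.27, 0.44, 0.39, 0.22]  # e.g., CLIPScore per response
blv_ratings = [4.2, 2.8, 3.1, 4.5, 2.0]         # e.g., mean BLV rating per response

rho, rho_p = spearmanr(metric_scores, blv_ratings)
tau, tau_p = kendalltau(metric_scores, blv_ratings)

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3f})")
```

A weak or negative correlation in this kind of check is what the study reports as misalignment between the automatic metrics and BLV preferences.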
📝 Abstract
Vision is a primary means by which humans perceive their environment, but Blind and Low-Vision (BLV) people need assistance to understand their surroundings, especially in unfamiliar environments. The emergence of semantic-based systems as assistive tools for BLV users has motivated many researchers to explore responses from Large Vision-Language Models (LVLMs). However, the preferences of BLV users for diverse types and styles of LVLM responses, specifically for navigational aid, have yet to be studied. To fill this gap, we first construct the Eye4B dataset, consisting of 1.1k human-validated, curated outdoor/indoor scenes with 5-10 relevant requests per scene. Then, we conduct an in-depth user study with eight BLV users to evaluate their preferences for six LVLMs from five perspectives, including Afraidness, Nonactionability, Sufficiency, and Conciseness. Finally, we introduce the Eye4B benchmark for evaluating the alignment between widely used model-based image-text metrics and our collected BLV preferences. Our work can serve as a guideline for developing BLV-aware LVLMs toward a Barrier-Free AI system.
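For the benchmark side of this abstract, the sketch below shows how one widely used model-based image-text metric, CLIPScore, might be computed for a candidate LVLM response before being correlated with collected BLV ratings. It is a hedged example, not the paper's pipeline: the image tensor, the response string, and the choice of `torchmetrics` with the `openai/clip-vit-base-patch16` backbone are assumptions for illustration only.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): scoring an LVLM
# navigation response against its scene image with CLIPScore.
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Hypothetical inputs: a random stand-in image tensor and one candidate response.
image = torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8)
response = "Move two steps forward; a low bench is directly ahead on your right."

score = metric(image, response)
print(f"CLIPScore: {score.item():.2f}")  # this value would then be compared with BLV ratings
```

Scores like this, gathered over the benchmark's scene-request pairs, are what the rank-correlation check above compares against the BLV preference ratings.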