Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

📅 2026-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing scene-awareness tools for blind and low-vision users, which rely predominantly on verbal descriptions and struggle to convey the aesthetic qualities and spatial depth of vista scenes. The authors propose Scene2Audio, a framework that integrates generative artificial intelligence, psychoacoustic principles, and scenic audio composition techniques to translate visual landscapes into immersive, non-verbal auditory renderings that complement spoken descriptions. A user study demonstrates that this approach significantly improves participants' ability to mentally reconstruct distant scenes, and multi-day real-world field evaluations confirm its practicality and experiential benefits in mobile outdoor settings, offering visually impaired individuals a new auditory perception paradigm that balances intelligibility with aesthetic richness.
📝 Abstract
Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech, making the scene easier to imagine. A mobile app "in-the-wild" study with 7 BLV users over more than a week further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.
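The paper itself does not include code here, but the core idea it describes, layering sound sources for a scene and shaping each one with psychoacoustic distance cues, can be illustrated with a minimal sketch. Everything below is hypothetical (function names, parameters, and the specific cues are illustrative assumptions, not the authors' pipeline): nearer sources are louder, and farther sources are attenuated and lowpassed, since air absorbs high frequencies over distance.

```python
import numpy as np

SR = 16000  # sample rate in Hz (illustrative choice)

def tone(freq, dur, sr=SR):
    """A plain sine tone standing in for one generated sound source."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t)

def apply_distance(signal, distance_m):
    """Crude distance cues: inverse-distance attenuation plus a one-pole
    lowpass whose smoothing grows with distance (far = duller)."""
    gain = 1.0 / max(distance_m, 1.0)
    alpha = min(0.99, distance_m / (distance_m + 20.0))
    out = np.empty_like(signal)
    acc = 0.0
    for i, x in enumerate(signal):
        acc = alpha * acc + (1 - alpha) * x  # one-pole IIR lowpass
        out[i] = acc
    return gain * out

def compose_scene(sources, dur=2.0):
    """Mix (frequency, distance) source pairs into one mono rendering,
    peak-normalized so the result stays in [-1, 1]."""
    mix = np.zeros(int(dur * SR))
    for freq, dist in sources:
        mix += apply_distance(tone(freq, dur), dist)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix

# e.g. a near, bright birdsong-like tone plus a distant low rumble
scene = compose_scene([(880.0, 2.0), (110.0, 60.0)])
```

A real system would of course use generated or recorded environmental sounds rather than sine tones, and richer spatialization (binaural panning, reverberation) for depth; the sketch only shows how per-source distance shaping and mixing compose a scene.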
Problem

Research questions and friction points this paper is trying to address.

Blind and Low Vision
Vista landscapes
scene perception
auditory representation
aesthetic experience
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative audio
psychoacoustics
scene perception
blind and low-vision assistance
multimodal interaction
Chitralekha Gupta
Senior Research Fellow at National University of Singapore
Music Information Retrieval, Audio Signal Processing, Machine Learning, Deep Learning
Jing Peng
Augmented Human Lab, National University of Singapore
Ashwin Ram
Postdoctoral Researcher, Universität des Saarlandes
Human-Computer Interaction, Wearable Computing, Human-AI Interaction
Shreyas Sridhar
Augmented Human Lab, National University of Singapore
Christophe Jouffrais
IRIT, CNRS, Toulouse, France; IPAL, CNRS, Singapore
Suranga Nanayakkara
Augmented Human Lab, Dept. of Computer Science, National University of Singapore