🤖 AI Summary
Blind and low-vision people cannot access critical visual information, such as color and text, through touch alone, which limits their understanding of everyday objects. To bridge this gap, the authors propose a first-person-view hand-object interaction recognition model that pairs visually impaired users' characteristic exploratory gestures with a context-aware mechanism to generate adaptive, real-time visual descriptions. The system delivers object labels and fine-grained details that match the user's current interaction state with the object. User studies show that this approach substantially improves object comprehension and manipulation during non-visual hand-object interactions.
📝 Abstract
People who are blind or have low vision regularly use their hands to interact with the physical world to gain access to objects' shape, size, weight, and texture. However, many rich visual features remain inaccessible through touch alone, making it difficult to distinguish similar objects, interpret visual affordances, and form a complete understanding of objects. In this work, we present TouchScribe, a system that augments hand-object interactions with automated live visual descriptions. We trained a custom egocentric hand interaction model to recognize both common gestures (e.g., grab to inspect, hold side by side to compare) and gestures unique to blind people (e.g., point to explore color, swipe to read available text). Furthermore, TouchScribe provides real-time, adaptive feedback based on hand movement, escalating from hand interaction states to object labels to visual details. Our user study and technical evaluations demonstrate that TouchScribe provides rich, useful descriptions that support object understanding. Finally, we discuss the implications of making live visual descriptions responsive to users' physical reach.
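The abstract describes feedback that escalates from interaction states to object labels to visual details, gated by the recognized gesture. The sketch below illustrates one plausible way such gesture-conditioned layering could work; the gesture names, `Frame` fields, and `describe` function are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of gesture-conditioned, layered descriptions;
# all names and the control flow here are assumptions, not TouchScribe's code.
from dataclasses import dataclass
from enum import Enum, auto


class Gesture(Enum):
    GRAB = auto()          # grab to inspect an object
    SIDE_BY_SIDE = auto()  # hold two objects side by side to compare
    POINT = auto()         # point to explore color
    SWIPE = auto()         # swipe to read visible text
    NONE = auto()


@dataclass
class Frame:
    gesture: Gesture    # output of the egocentric hand-interaction model
    object_label: str   # e.g., from an object recognizer
    color: str          # dominant color near the fingertip
    text: str           # OCR result on the touched surface


def describe(frame: Frame, prev_label: str | None) -> str | None:
    """Return a spoken description for this frame, or None to stay quiet.

    Feedback escalates with the interaction: a newly touched object yields
    its coarse label first, and specific gestures unlock finer details.
    """
    if frame.gesture is Gesture.NONE:
        return None
    if frame.object_label != prev_label:
        return f"Holding: {frame.object_label}"   # coarse label first
    if frame.gesture is Gesture.POINT:
        return f"Color here: {frame.color}"       # fine-grained, on demand
    if frame.gesture is Gesture.SWIPE:
        return f"Text reads: {frame.text}"
    return None  # avoid repeating the same label on every frame
```

Returning `None` unless the state changes or a detail gesture fires reflects the abstract's emphasis on responsiveness: descriptions follow the hand rather than streaming continuously.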