VueBuds: Visual Intelligence with Wireless Earbuds

πŸ“… 2026-03-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the longstanding absence of visual perception in wireless earbuds, constrained by size and power limitations. For the first time, the authors integrate miniature binocular cameras into mainstream wireless earbuds, achieving effective forward-facing field-of-view coverage under stringent power (<5β€―mW) and form-factor constraints. Visual data is streamed over low-power Bluetooth to a host device, where an on-device vision-language model (VLM) enables real-time scene understanding, translation, text recognition, and visual reasoning. In a user study with 90 participants across 17 visual question-answering tasks, the system's response quality matched that of Ray-Ban Meta smart glasses, demonstrating the feasibility of the earbud form factor as a high-quality platform for visual intelligence in wearable computing.
πŸ“ Abstract
Despite their ubiquity, wireless earbuds remain audio-centric due to size and power constraints. We present VueBuds, the first camera-integrated wireless earbuds for egocentric vision, capable of operating within stringent power and form-factor limits. Each VueBud embeds a camera into a Sony WF-1000XM3 to stream visual data over Bluetooth to a host device for on-device vision language model (VLM) processing. We show analytically and empirically that while each camera's field of view is partially occluded by the face, the combined binocular perspective provides comprehensive forward coverage. By integrating VueBuds with VLMs, we build an end-to-end system for real-time scene understanding, translation, visual reasoning, and text reading; all from low-resolution monochrome cameras drawing under 5mW through on-demand activation. Through online and in-person user studies with 90 participants, we compare VueBuds against smart glasses across 17 visual question-answering tasks, and show that our system achieves response quality on par with Ray-Ban Meta. Our work establishes low-power camera-equipped earbuds as a compelling platform for visual intelligence, bringing rapidly advancing VLM capabilities to one of the most ubiquitous wearable form factors.
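The abstract's key geometric claim is that each earbud camera's view is partially occluded by the face, yet the two views together cover the forward field. A minimal sketch of that idea, using purely illustrative angles (none of these numbers come from the paper), is an interval-union computation over each camera's unoccluded angular range:

```python
# Hypothetical sketch: two partially occluded earbud cameras can jointly
# cover the forward field of view. Angles are in degrees from the
# wearer's forward direction (0 deg); negative = left of center.
# All numeric values below are illustrative assumptions, not measurements
# from the paper.

def union_coverage(intervals):
    """Merge angular intervals; return (merged intervals, total degrees)."""
    intervals = sorted(intervals)
    merged = []
    for lo, hi in intervals:
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged, sum(hi - lo for lo, hi in merged)

# Left earbud: the face occludes the far-right portion of its view.
left_cam = (-80, 30)
# Right earbud: mirror image, occluded on the far left.
right_cam = (-30, 80)

merged, total_deg = union_coverage([left_cam, right_cam])
print(merged, total_deg)  # [[-80, 80]] 160
```

Under these assumed angles, neither camera alone spans the forward field, but their union forms a single contiguous 160-degree arc centered on the wearer's gaze, matching the paper's "comprehensive forward coverage" argument in spirit.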
Problem

Research questions and friction points this paper is trying to address.

visual intelligence
wireless earbuds
egocentric vision
low-power cameras
vision language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

camera-integrated earbuds
egocentric vision
vision language model (VLM)
low-power wearable
binocular perspective
Maruchi Kim
PhD Student, University of Washington
Rasya Fawwaz
Electrical & Computer Engineering, University of Washington, WA, USA
Zhi Yang Lim
Unknown affiliation
Brinda Moudgalya
Electrical & Computer Engineering, University of Washington, WA, USA
Hexi Wang
Electrical & Computer Engineering, University of Washington, WA, USA
Yuanhao Zeng
Electrical & Computer Engineering, University of Washington, WA, USA
Shyamnath Gollakota
Paul G. Allen School, University of Washington, WA, USA