FOVI: A biologically-inspired foveated interface for deep vision models

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of conventional computer vision systems in processing high-resolution, wide-field-of-view imagery due to their uniform sampling strategy and lack of biologically inspired active perception mechanisms. To overcome this limitation, the authors propose the Foveal Vision Interface (FOVI), a bio-inspired framework that maps a variable-resolution sensor array—mimicking retinal structure—onto a uniform, dense V1-like manifold. FOVI integrates k-nearest neighbor (kNN)-based receptive field modeling with kNN convolution, enabling end-to-end training. It can be efficiently deployed via low-rank adaptation (LoRA) on existing Vision Transformers such as DINOv3. Experimental results demonstrate that FOVI significantly reduces computational cost while maintaining performance comparable to non-foveal baselines, offering a scalable and efficient solution for high-resolution egocentric vision tasks.

📝 Abstract
Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allowing eye movements to bring different parts of the world into focus while keeping the rest of the scene in context. In contrast, most computer vision systems encode the visual world at a uniform resolution, raising challenges for processing full-field high-resolution images efficiently. We propose a foveated vision interface (FOVI) based on the human retina and primary visual cortex, which reformats a variable-resolution, retina-like sensor array into a uniformly dense, V1-like sensor manifold. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold, enabling kNN-convolution via a novel kernel mapping technique. We demonstrate two use cases: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model, leveraging low-rank adaptation (LoRA). These models provide competitive performance at a fraction of the computational cost of non-foveated baselines, opening pathways for efficient and scalable active sensing for high-resolution egocentric vision. Code and pre-trained models are available at https://github.com/nblauch/fovi and https://huggingface.co/fovi-pytorch.
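The abstract's core idea, defining receptive fields as k-nearest-neighborhoods over an irregular sensor manifold and convolving within them, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (the kernel-mapping technique and the retina-to-V1 reformatting are specific to FOVI); all function names, shapes, and the log-polar sampling below are illustrative assumptions.

```python
import numpy as np

def knn_receptive_fields(positions, k):
    """For each sensor, return indices of its k nearest neighbors (self included)."""
    # positions: (N, 2) sensor coordinates on the manifold
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)

def knn_convolution(features, neighbors, weights):
    """Apply a shared (k, C_in, C_out) kernel over each sensor's kNN receptive field."""
    # features: (N, C_in); neighbors: (N, k); weights: (k, C_in, C_out)
    gathered = features[neighbors]                      # (N, k, C_in)
    return np.einsum('nkc,kcd->nd', gathered, weights)  # (N, C_out)

# Hypothetical retina-like sampling: density falls off with eccentricity
rng = np.random.default_rng(0)
N, k, C_in, C_out = 256, 9, 3, 8
ecc = np.exp(rng.uniform(np.log(0.1), np.log(1.0), N))  # log-spaced eccentricities
ang = rng.uniform(0.0, 2.0 * np.pi, N)
positions = np.stack([ecc * np.cos(ang), ecc * np.sin(ang)], axis=1)

feats = rng.standard_normal((N, C_in))
nbrs = knn_receptive_fields(positions, k)
w = rng.standard_normal((k, C_in, C_out)) * 0.1
out = knn_convolution(feats, nbrs, w)  # (N, C_out) output features per sensor
```

Because neighborhoods are defined by distance rather than by a pixel grid, the same operation works for any sensor layout; in a foveated layout, foveal receptive fields are spatially small and peripheral ones large, while each still covers exactly k sensors.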
Problem

Research questions and friction points this paper is trying to address.

foveated vision
variable resolution
efficient sensing
high-resolution imaging
computer vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

foveated vision
kNN-convolution
biologically-inspired
low-rank adaptation
active sensing