FOVI: A biologically-inspired foveated interface for deep vision models

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of conventional computer vision systems in processing high-resolution, wide-field-of-view imagery due to their uniform sampling strategy and lack of biologically inspired active perception mechanisms. To overcome this limitation, the authors propose the Foveal Vision Interface (FOVI), a bio-inspired framework that maps a variable-resolution sensor array—mimicking retinal structure—onto a uniform, dense V1-like manifold. FOVI integrates k-nearest neighbor (kNN)-based receptive field modeling with kNN convolution, enabling end-to-end training. It can be efficiently deployed via low-rank adaptation (LoRA) on existing Vision Transformers such as DINOv3. Experimental results demonstrate that FOVI significantly reduces computational cost while maintaining performance comparable to non-foveal baselines, offering a scalable and efficient solution for high-resolution egocentric vision tasks.

📝 Abstract
Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allowing eye movements to bring different parts of the world into focus while keeping the rest of the scene in context. In contrast, most computer vision systems encode the visual world at a uniform resolution, raising challenges for processing full-field high-resolution images efficiently. We propose a foveated vision interface (FOVI) based on the human retina and primary visual cortex, which reformats a variable-resolution, retina-like sensor array into a uniformly dense, V1-like sensor manifold. Receptive fields are defined as k-nearest-neighborhoods (kNNs) on the sensor manifold, enabling kNN-convolution via a novel kernel mapping technique. We demonstrate two use cases: (1) an end-to-end kNN-convolutional architecture, and (2) a foveated adaptation of the foundational DINOv3 ViT model, leveraging low-rank adaptation (LoRA). These models provide competitive performance at a fraction of the computational cost of non-foveated baselines, opening pathways for efficient and scalable active sensing for high-resolution egocentric vision. Code and pre-trained models are available at https://github.com/nblauch/fovi and https://huggingface.co/fovi-pytorch.
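The abstract's core idea, defining receptive fields as k-nearest-neighborhoods over an irregular sensor manifold and convolving within them, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (the kernel-mapping technique and the retina-to-V1 reformatting are specific to FOVI); all function names, shapes, and the log-polar sampling below are illustrative assumptions.

```python
import numpy as np

def knn_receptive_fields(positions, k):
    """For each sensor, return indices of its k nearest neighbors (self included)."""
    # positions: (N, 2) sensor coordinates on the manifold
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)

def knn_convolution(features, neighbors, weights):
    """Apply a shared (k, C_in, C_out) kernel over each sensor's kNN receptive field."""
    # features: (N, C_in); neighbors: (N, k); weights: (k, C_in, C_out)
    gathered = features[neighbors]                      # (N, k, C_in)
    return np.einsum('nkc,kcd->nd', gathered, weights)  # (N, C_out)

# Hypothetical retina-like sampling: density falls off with eccentricity
rng = np.random.default_rng(0)
N, k, C_in, C_out = 256, 9, 3, 8
ecc = np.exp(rng.uniform(np.log(0.1), np.log(1.0), N))  # log-spaced eccentricities
ang = rng.uniform(0.0, 2.0 * np.pi, N)
positions = np.stack([ecc * np.cos(ang), ecc * np.sin(ang)], axis=1)

feats = rng.standard_normal((N, C_in))
nbrs = knn_receptive_fields(positions, k)
w = rng.standard_normal((k, C_in, C_out)) * 0.1
out = knn_convolution(feats, nbrs, w)  # (N, C_out) output features per sensor
```

Because neighborhoods are defined by distance rather than by a pixel grid, the same operation works for any sensor layout; in a foveated layout, foveal receptive fields are spatially small and peripheral ones large, while each still covers exactly k sensors.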
Problem

Research questions and friction points this paper is trying to address.

foveated vision
variable resolution
efficient sensing
high-resolution imaging
computer vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

foveated vision
kNN-convolution
biologically-inspired
low-rank adaptation
active sensing