Receptive Field Expanded Look-Up Tables for Vision Inference: Advancing from Low-level to High-level Tasks

📅 2025-10-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LUT-based acceleration methods have narrow effective receptive fields, because table size grows combinatorially with the number of input pixels, and so they fail to balance accuracy and efficiency. To address this, we propose a novel LUT-driven CNN inference framework. First, we introduce an adaptive resolution allocation scheme based on a learned optimal lattice vector quantizer, which replaces scalar quantization and approximates CNN kernels more accurately. Second, we incorporate irregular dilated convolutions and a U-shaped cascaded LUT architecture, which expand the effective receptive field without increasing lookup-table space complexity. Third, these components together enable multi-level contextual modeling and efficient nonlinearity approximation. Experiments show that our method outperforms prior LUT-based approaches under comparable memory budgets, improving both inference speed and accuracy, including on high-level vision tasks such as ImageNet classification.

📝 Abstract

Recently, several look-up table (LUT) methods have been developed to greatly expedite the inference of CNNs, following the classical strategy of trading space for speed. However, these LUT methods share a common drawback: the receptive field of the convolution kernels is limited by the combinatorial explosion of table size. This research aims to expand the CNN receptive field with a fixed table size, thereby enhancing the performance of LUT-driven fast CNN inference while maintaining the same space complexity. To achieve this goal, various techniques are proposed. The main contribution is a novel approach of learning an optimal lattice vector quantizer that adaptively allocates the quantization resolution across data dimensions based on their significance to the inference task. In addition, the lattice vector quantizer offers an inherently more accurate approximation of CNN kernels than the scalar quantizers used in current practice. Furthermore, we introduce other receptive field expansion strategies, including irregular dilated convolutions and a U-shaped cascaded LUT structure, designed to capture multi-level contextual information without inflating table size. Together, these innovations allow our approach to effectively balance speed, accuracy, and memory efficiency, demonstrating significant improvements over existing LUT methods.
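
The combinatorial explosion mentioned in the abstract can be illustrated with simple arithmetic. The sketch below assumes a table indexed directly by every pixel in the receptive field, with each pixel quantized to the same number of levels; the function name and the example values (16 levels, 4/9/25 pixels) are illustrative assumptions and are not taken from the paper.

```python
def lut_entries(num_inputs: int, levels: int) -> int:
    """Entries in a table indexed by `num_inputs` pixels,
    each quantized to `levels` values (assumed uniform per pixel)."""
    return levels ** num_inputs


# Hypothetical receptive fields: 2x2, 3x3 and 5x5 pixels at 16 levels each.
for num_inputs in (4, 9, 25):
    print(num_inputs, lut_entries(num_inputs, 16))
```

Under these assumptions, going from a 2x2 to a 3x3 receptive field multiplies the entry count by 16^5, which is why a fixed table budget constrains the receptive field so tightly.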

Problem

Research questions and friction points this paper is trying to address.

Expanding CNN receptive field with fixed table size

Enhancing LUT-driven inference while maintaining space complexity

Balancing speed, accuracy and memory in vision tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning optimal lattice vector quantizer for adaptive resolution

Introducing irregular dilated convolutions to expand receptive field

Using U-shaped cascaded LUT structure for multi-level context
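
The idea of allocating quantization resolution unevenly across dimensions can be sketched with the simplest possible lattice, an axis-aligned grid with a different step size per dimension. This is only an assumed illustration: the paper learns an optimal lattice, which need not be axis-aligned, and the helper names, the [0, 1] input range, and the step sizes below are all hypothetical.

```python
import math


def quantize_indices(x, steps):
    """Nearest grid index per dimension; a smaller step means finer resolution."""
    return [math.floor(v / s + 0.5) for v, s in zip(x, steps)]


def levels_per_dim(steps, lo=0.0, hi=1.0):
    """Number of grid points per dimension, assuming inputs lie in [lo, hi]."""
    return [math.floor((hi - lo) / s + 0.5) + 1 for s in steps]


def table_address(indices, levels):
    """Flatten per-dimension indices into one table address (mixed radix)."""
    addr = 0
    for i, n in zip(indices, levels):
        addr = addr * n + i
    return addr


# Hypothetical allocation: fine resolution on dimension 0, coarse on dimension 1.
steps = [0.1, 0.5]
levels = levels_per_dim(steps)            # grid points per dimension
idx = quantize_indices([0.37, 0.62], steps)
print(levels, idx, table_address(idx, levels))
```

The product of the per-dimension level counts is the table size, so spending fewer levels on less significant dimensions frees budget for the dimensions that matter more to the task.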
