FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing token caching methods in vision-and-language navigation (VLN), which struggle to balance efficiency and performance due to viewpoint variations, loss of critical edge information, and rigid cache budgets. To overcome these challenges, we introduce frequency-domain analysis into VLN token caching for the first time, leveraging its viewpoint invariance and structural interpretability. We propose a training-free, adaptive caching framework that dynamically optimizes cache construction, refreshment, and budget allocation. This approach effectively preserves essential visual edge features and mitigates viewpoint shift effects, achieving a 1.59× inference speedup with negligible computational overhead, thereby significantly enhancing the inference efficiency of VLN models.

📝 Abstract

Vision-and-Language Navigation (VLN) models achieve high navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy for reducing this cost by reusing previously computed token results. However, existing token caching approaches select cacheable tokens with visual-domain methods, which raises three challenges when they are adapted to VLN models: (1) visual-domain methods break down under viewpoint migration; (2) they neglect critical edge information unless supplemented by additional algorithms; and (3) they overlook the temporal variation of scenes and offer no way to adjust the cache budget. In this paper, we analyze these challenges in detail and find that their effects are invariant and analyzable in the frequency domain. Based on these findings, we propose FreqCache, a frequency-guided token caching framework. Exploiting the inherent properties of the frequency domain, FreqCache optimizes token cache establishment, refresh, and adaptive adjustment. Experiments show that FreqCache achieves a 1.59× speedup with negligible overhead, demonstrating the benefit of integrating frequency-domain methods into VLN token caching.
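To make the idea of frequency-guided cacheable-token selection concrete, here is a minimal sketch. It is not the FreqCache algorithm, which the abstract does not specify. It assumes one token per non-overlapping image patch, compares per-patch FFT magnitude spectra between consecutive frames, and always recomputes patches with a large share of non-DC energy as a crude stand-in for "edge" tokens. The function names (`patch_spectra`, `cacheable_mask`) and the thresholds `tol` and `edge_share` are hypothetical, chosen only for illustration.

```python
import numpy as np

def patch_spectra(frame, patch):
    """Return FFT magnitude spectra of non-overlapping patches.

    frame: 2D grayscale array (H, W). Output shape: (H//patch, W//patch, patch, patch).
    """
    h, w = frame.shape
    gh, gw = h // patch, w // patch
    # Crop to a multiple of the patch size, then split into a grid of patches.
    patches = frame[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch).swapaxes(1, 2)
    return np.abs(np.fft.fft2(patches, axes=(-2, -1)))

def cacheable_mask(prev, curr, patch=8, tol=0.05, edge_share=0.1):
    """Boolean grid: True where a patch token could be reused from the cache.

    Assumed rule (illustrative only): reuse a token when its magnitude
    spectrum barely changed AND it carries little non-DC (edge-like) energy.
    """
    eps = 1e-8
    sp, sc = patch_spectra(prev, patch), patch_spectra(curr, patch)
    # Relative change of the magnitude spectrum between the two frames.
    change = np.linalg.norm(sc - sp, axis=(-2, -1)) / (np.linalg.norm(sp, axis=(-2, -1)) + eps)
    # Share of spectral energy outside the DC bin; flat patches score near 0.
    total = (sc ** 2).sum(axis=(-2, -1))
    ac_share = (total - sc[..., 0, 0] ** 2) / (total + eps)
    return (change < tol) & (ac_share < edge_share)
```

For example, with a 16×16 frame split into four 8×8 patches, a flat patch that stays unchanged is marked reusable, while a patch containing a sharp edge or a patch whose content changed is marked for recomputation. A fixed threshold like `tol` corresponds to the rigid budget the abstract criticizes; an adaptive scheme would vary it over time.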

Problem

Research questions and friction points this paper is trying to address.

token caching

Vision-Language Navigation

frequency domain

computational overhead

viewpoint migration

Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency domain

token caching

embodied navigation