🤖 AI Summary
Vision-and-language navigation (VLN) suffers from high computational overhead in large language model (LLM) inference, while existing token pruning methods neglect the adverse impact of increased path length on practical efficiency. Method: We propose Navigation-Aware Pruning (NAP), a fine-tuning-free approach that jointly prunes multimodal tokens using two criteria—navigational feasibility (based on viewpoint traversability) and instruction relevance (extracted via LLMs)—to preserve foreground (navigation-critical) regions, selectively prune background tokens, and suppress spurious backtracking nodes. NAP thus balances semantic fidelity with path optimality. Contribution/Results: Evaluated on standard VLN benchmarks, NAP improves task success rate while reducing FLOPs by over 50%, significantly outperforming prior pruning methods in both efficiency and effectiveness.
📝 Abstract
Large models achieve strong performance on Vision-and-Language Navigation (VLN) tasks, but are costly to run in resource-limited environments. Token pruning offers an appealing efficiency tradeoff with minimal performance loss by reducing model input size, but prior work overlooks VLN-specific challenges. For example, information loss from pruning can effectively increase computational cost, because a poorly informed agent takes longer paths to reach its goal. The inability to identify uninformative tokens thus undermines the supposed efficiency gains from pruning. To address this, we propose Navigation-Aware Pruning (NAP), which uses navigation-specific traits to simplify the pruning process by pre-filtering tokens into foreground and background. For example, image views are filtered based on whether the agent can navigate in that direction, and navigation-relevant instructions are extracted using a Large Language Model. After filtering, we focus pruning on background tokens, minimizing information loss. To further avoid increases in navigation length, we discourage backtracking by removing low-importance navigation nodes. Experiments on standard VLN benchmarks show that NAP significantly outperforms prior work, preserving higher success rates while saving more than 50% of FLOPs.
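The core pruning step described above—keep all foreground tokens, then let only background tokens compete for the remaining token budget—can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the function name, the token representation, and the use of a single scalar importance score are all assumptions made for the example.

```python
# Hypothetical sketch of navigation-aware token pruning (not the paper's code).
# Foreground tokens (navigable views, instruction-relevant words) are always
# kept; background tokens are ranked by an importance score (e.g., attention
# weight) and only the highest-scoring ones fill the remaining budget.

def nap_prune(tokens, keep_ratio):
    """tokens: list of dicts with keys 'id', 'foreground' (bool), 'score'.
    Returns surviving tokens in their original order."""
    budget = max(1, int(len(tokens) * keep_ratio))
    foreground = [t for t in tokens if t["foreground"]]
    background = [t for t in tokens if not t["foreground"]]
    # Foreground tokens are exempt from pruning; background tokens compete
    # for whatever budget remains after the foreground is kept.
    slots = max(0, budget - len(foreground))
    kept_bg = sorted(background, key=lambda t: t["score"], reverse=True)[:slots]
    kept_ids = {t["id"] for t in foreground} | {t["id"] for t in kept_bg}
    return [t for t in tokens if t["id"] in kept_ids]
```

Because the navigability and instruction-relevance filters decide the foreground set up front, the importance ranking only has to discriminate among background tokens, which is what limits the information loss from pruning.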