🤖 AI Summary
To address the high computational cost and lack of interpretability of Transformer-decoder-based models for visual navigation, this paper proposes DynaNav, a dynamic feature- and layer-selection framework. The framework integrates trainable hard feature selection, dynamic layer skipping, and an early-exit mechanism, with Bayesian optimization automatically learning the optimal exit thresholds to enable scene-adaptive scheduling of the computation path. Evaluated on four public benchmarks, DynaNav reduces FLOPs by 2.26×, inference latency by 42.3%, and memory footprint by 32.8% compared to ViNT, while also improving navigation success rate. Its core innovation lies in jointly modeling sparse feature selection and hierarchical dynamic early exiting, thereby simultaneously improving computational efficiency, decision interpretability, and end-to-end navigation performance.
📝 Abstract
Visual navigation is essential for robotics and embodied AI. However, existing foundation models, particularly those with transformer decoders, suffer from high computational overhead and a lack of interpretability, limiting their deployment in resource-constrained scenarios. To address this, we propose DynaNav, a Dynamic Visual Navigation framework that adapts feature and layer selection to scene complexity. It employs a trainable hard feature selector that enables sparse operations, enhancing both efficiency and interpretability. Additionally, we integrate feature selection into an early-exit mechanism, with Bayesian Optimization determining the optimal exit thresholds to reduce computational cost. Extensive experiments on real-world datasets and in simulated environments demonstrate the effectiveness of DynaNav. Compared to ViNT, DynaNav achieves a 2.26× reduction in FLOPs, 42.3% lower inference time, and 32.8% lower memory usage, while improving navigation performance across four public datasets.
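The combination of hard feature selection and threshold-based early exiting described above can be illustrated with a rough sketch. Everything here is a placeholder: the layer transform, the action head, the top-k magnitude mask, the max-probability confidence measure, and the threshold schedule are illustrative stand-ins, not the paper's actual architecture (in DynaNav, the exit thresholds would be tuned by Bayesian Optimization rather than hand-set).

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_topk_mask(x, k):
    """Hard feature selection (illustrative): keep the k largest-magnitude
    features and zero out the rest, yielding a sparse representation."""
    keep = np.argsort(np.abs(x))[-k:]
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x * mask

def dynamic_forward(x, layers, thresholds, k):
    """Run decoder layers in order; exit as soon as the prediction's
    confidence clears that layer's exit threshold."""
    for depth, (W, tau) in enumerate(zip(layers, thresholds), start=1):
        x = np.tanh(W @ x)          # stand-in for one decoder layer
        x = hard_topk_mask(x, k)    # sparse hard feature selection
        logits = x[:4]              # stand-in action head (4 actions)
        probs = np.exp(logits) / np.exp(logits).sum()
        if probs.max() >= tau:      # early exit on sufficient confidence
            return probs, depth
    return probs, depth             # fell through: used the full depth

d = 16
layers = [rng.normal(scale=0.5, size=(d, d)) for _ in range(6)]
thresholds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.0]  # placeholder schedule; last layer always exits
probs, exit_depth = dynamic_forward(rng.normal(size=d), layers, thresholds, k=8)
print("exited at layer", exit_depth)
```

Simple scenes that yield confident predictions exit after a few layers and skip the remaining computation, which is where the FLOPs and latency savings come from; the binary mask also makes it possible to inspect which features drove the decision.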