APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model

📅 2026-03-09
📈 Citations: 0
Influential Citations: 0
🤖 AI Summary
This work addresses autonomous navigation for mobile robots in highly constrained environments, where classical methods rely heavily on manual parameter tuning and end-to-end learning struggles to balance accuracy with generalization. The authors propose a paradigm that augments a pre-trained vision-language model with a regression head to adaptively predict parameters for a classical motion planner, rather than directly outputting control actions, and fine-tunes the system first with supervised learning on collected trajectories and then with reinforcement learning. The paper presents this as the first use of a vision-language-action framework to dynamically configure planner parameters, improving both control accuracy and cross-environment generalization while preserving the safety of the underlying classical planner. Experiments show consistent gains over existing methods on both the BARN simulation benchmark and real-world robotic platforms.
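To make the core idea concrete, here is a minimal PyTorch-style sketch of the architecture the summary describes: a vision-language backbone whose pooled features feed a small regression head that outputs bounded planner parameters (for instance, a maximum velocity and an inflation radius for a DWA-style local planner). The class name, head width, and parameter set are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PlannerParamHead(nn.Module):
    """Regression head mapping VLM features to classical-planner parameters.

    Hypothetical sketch: the paper attaches a regression head to a
    pre-trained vision-language model; the backbone, head width, and
    exact parameter set are assumptions here.
    """

    def __init__(self, feat_dim: int, param_bounds: torch.Tensor):
        super().__init__()
        # param_bounds: (num_params, 2) tensor of [low, high] per parameter,
        # e.g. max velocity, inflation radius, sampling resolution.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, param_bounds.shape[0]),
        )
        self.register_buffer("low", param_bounds[:, 0].clone())
        self.register_buffer("high", param_bounds[:, 1].clone())

    def forward(self, vlm_features: torch.Tensor) -> torch.Tensor:
        # Squash raw outputs into each parameter's valid range so the
        # classical planner always receives feasible values.
        raw = torch.sigmoid(self.mlp(vlm_features))
        return self.low + raw * (self.high - self.low)

# Example: predict [max_vel, inflation_radius] from pooled VLM features.
bounds = torch.tensor([[0.1, 2.0], [0.05, 0.6]])
head = PlannerParamHead(feat_dim=768, param_bounds=bounds)
params = head(torch.randn(1, 768))  # feasible planner parameters, shape (1, 2)
```

At run time, the predicted vector would presumably be written into the planner's configuration before each replanning cycle, so the classical planner, not the network, still produces the actual control commands and retains its safety behavior.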

📝 Abstract
Autonomous navigation in highly constrained environments remains challenging for mobile robots. Classical navigation approaches offer safety assurances but require environment-specific parameter tuning; end-to-end learning bypasses parameter tuning but struggles with precise control in constrained spaces. To this end, recent robot learning approaches automate parameter tuning while retaining classical systems' safety, yet still face challenges in generalizing to unseen environments. Recently, Vision-Language-Action (VLA) models have shown promise by leveraging foundation models' scene understanding capabilities, but still struggle with precise control and inference latency in navigation tasks. In this paper, we propose Adaptive Planner Parameter Learning from Vision-Language-Action Model (APPLV). Unlike traditional VLA models that directly output actions, APPLV leverages pre-trained vision-language models with a regression head to predict planner parameters that configure classical planners. We develop two training strategies: supervised learning fine-tuning from collected navigation trajectories and reinforcement learning fine-tuning to further optimize navigation performance. We evaluate APPLV across multiple motion planners on the simulated Benchmark for Autonomous Robot Navigation (BARN) dataset and in physical robot experiments. Results demonstrate that APPLV outperforms existing methods in both navigation performance and generalization to unseen environments.
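The abstract's two training strategies can be read as two update rules: a supervised regression loss against parameters recorded in collected trajectories, followed by a policy-gradient step that rewards fast, collision-free navigation. The sketch below is an assumed reconstruction, not the paper's code: `model`, `rollout_reward`, the Gaussian exploration noise, and the use of vanilla REINFORCE are all illustrative choices.

```python
import torch

def supervised_step(model, optimizer, obs, target_params):
    """Stage 1: regress planner parameters recorded along demo trajectories."""
    pred = model(obs)  # (batch, num_params)
    loss = torch.nn.functional.mse_loss(pred, target_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def rl_step(model, optimizer, obs, rollout_reward, noise_std=0.05):
    """Stage 2: REINFORCE-style fine-tuning on navigation reward.

    `rollout_reward(params)` is an assumed callback that runs the classical
    planner with the sampled parameters and returns a (batch,) reward
    tensor, e.g. +1 for reaching the goal, penalized by traversal time.
    """
    mean = model(obs)
    dist = torch.distributions.Normal(mean, noise_std)
    params = dist.sample()           # explore around the predicted parameters
    reward = rollout_reward(params)  # no gradient flows through the rollout
    loss = -(dist.log_prob(params).sum(dim=-1) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()
```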
Problem

Research questions and friction points this paper is trying to address.

autonomous navigation
constrained environments
parameter tuning
generalization
precise control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action model
adaptive planner parameter learning
classical motion planning
generalization in navigation
foundation model for robotics
👥 Authors
Yuanjie Lu
Department of Computer Science, George Mason University, Virginia, USA
Beichen Wang
PhD Candidate at Wageningen University & Research
Natural Language Processing, Information Retrieval, Complex Network
Zhengqi Wu
Department of Engineering Science, University of South Florida, Florida, USA
Yang Li
Department of Computer Science, Rutgers University, New Jersey, USA
Xiaomin Lin
Assistant Professor, University of South Florida
AI for good, Robotics for science, Robotics for good
Chengzhi Mao
Assistant Professor, Rutgers University
LLM, Computer Vision, Machine Learning
Xuesu Xiao
Department of Computer Science, George Mason University, Virginia, USA