FSL-LVLM: Friction-Aware Safety Locomotion using Large Vision Language Model in Wheeled Robots

📅 2024-09-15

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Wheeled-legged robots become unstable on slippery terrain when traction is insufficient, which makes real-time friction estimation necessary for feedforward motion control. To address this, we propose the first friction-aware framework integrating Large Vision-Language Models (LVLMs) with Reinforcement Learning (RL). Specifically, we design a Friction-From-Vision module that leverages the zero-shot and few-shot capabilities of LVLMs to estimate ground friction coefficients directly from single RGB images. Crucially, we explicitly embed the estimated friction coefficient into the policy networks of PPO and SAC, which enables terrain-adaptive control without modifying reward structures or dynamics models. Evaluated on a wheeled inverted pendulum platform, our method improves task success rate and reduces trajectory tracking error by 32% compared to baseline RL policies. Moreover, the framework is plug-and-play: it integrates with both on-policy and off-policy RL algorithms without changes to the policy architecture.

📝 Abstract

Wheeled-legged robots offer significant mobility and versatility but face substantial challenges when operating on slippery terrains. Traditional model-based controllers for these robots assume no slipping. While reinforcement learning (RL) helps quadruped robots adapt to different surfaces, recovering from slips remains challenging, especially for systems with few contact points. Estimating the ground friction coefficient is another open challenge. In this paper, we propose a novel friction-aware safety locomotion framework that integrates Large Vision Language Models (LVLMs) with an RL policy. Our approach explicitly incorporates the estimated friction coefficient into the RL policy, enabling the robot to adapt its behavior in advance, based on the surface type, before reaching it. We introduce a Friction-From-Vision (FFV) module, which leverages LVLMs to estimate ground friction coefficients, eliminating the need for large datasets and extensive training. The framework was validated on a customized wheeled inverted pendulum, and experimental results demonstrate that it increases the success rate in completing driving tasks by adjusting speed according to terrain type, while achieving better tracking performance than baseline methods. Our framework can be easily integrated with any other RL policy.
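One way to picture "explicitly incorporating the estimated friction coefficient into the RL policy" is as an extra entry in the policy's observation vector. The sketch below is only an illustration under that assumption; the function name `augment_observation`, the clipping bounds `mu_min` and `mu_max`, and the four-element observation are hypothetical and not taken from the paper.

```python
import numpy as np


def augment_observation(obs, friction_estimate, mu_min=0.05, mu_max=1.0):
    """Append a clipped friction estimate to an observation vector.

    Assumption: the policy consumes a flat float32 vector, and the friction
    estimate (e.g. produced by a vision module) is a single scalar.
    The bounds mu_min and mu_max are illustrative, not values from the paper.
    """
    obs = np.asarray(obs, dtype=np.float32)
    # Clip so that an implausible estimate cannot push the policy input
    # outside the range it would have seen during training.
    mu = float(np.clip(friction_estimate, mu_min, mu_max))
    return np.concatenate([obs, np.array([mu], dtype=np.float32)])


# Hypothetical 4-dimensional robot state plus an estimated friction of 0.3.
state = np.zeros(4, dtype=np.float32)
policy_input = augment_observation(state, 0.3)
```

Because only the input vector changes, the same wrapper could in principle sit in front of different RL algorithms, which matches the abstract's claim that the framework can be integrated with other RL policies.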

Problem

Research questions and friction points this paper is trying to address.

Control wheeled-legged robots on slippery surfaces

Predict ground friction before contact for stability

Integrate vision-language models with reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

VLMs estimate friction for proactive control

RL policy adapts speed using friction data

Retrieval-augmented generation (RAG) improves coefficient-of-friction (CoF) prediction accuracy
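For intuition on why a lower friction estimate should lead to a lower speed, consider a simple point-mass Coulomb model: the largest deceleration available without slipping is mu * g, so a vehicle that must be able to stop within a distance d cannot exceed v = sqrt(2 * mu * g * d). The sketch below encodes that bound. It is a simplification introduced here for illustration and is not the paper's learned policy; a wheeled inverted pendulum also has balance dynamics that this model ignores, and the name `max_safe_speed` is hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def max_safe_speed(mu, stopping_distance_m):
    """Upper bound on speed (m/s) that still allows stopping within a distance.

    Assumes a point mass on level ground with Coulomb friction, so the peak
    deceleration is mu * G. This is an illustrative bound, not the method
    described in the paper.
    """
    if mu < 0 or stopping_distance_m < 0:
        raise ValueError("mu and stopping_distance_m must be non-negative")
    return math.sqrt(2.0 * mu * G * stopping_distance_m)


# Compare a high-friction surface with a slippery one for the same distance.
dry_limit = max_safe_speed(0.8, 1.0)
icy_limit = max_safe_speed(0.1, 1.0)
```

Under this model the speed limit scales with the square root of the friction coefficient, so an eightfold drop in friction reduces the admissible speed by a factor of about 2.8.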
