Enhancing Vision-Based Policies with Omni-View and Cross-Modality Knowledge Distillation for Mobile Robots

📅 2026-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of lightweight mobile robots in visual navigation—namely, poor generalization across environments, insufficient onboard computational resources, and high sensor costs. To overcome these challenges, we propose a novel approach that integrates omnidirectional depth perception with cross-modal knowledge distillation, transferring both action outputs and latent embeddings from an omnidirectional expert policy to a lightweight monocular policy. This strategy not only enhances the generalization and navigation performance of the monocular agent but also aligns with practical constraints for low-cost deployment. Real-world experiments demonstrate that the proposed method significantly improves cross-scene transferability, confirming its effectiveness and practical utility in resource-constrained robotic systems.

📝 Abstract
Vision-based policies are widely applied in robotics for tasks such as manipulation and locomotion. On lightweight mobile robots, however, they face a trilemma of limited scene transferability, restricted onboard computation resources, and sensor hardware cost. To address these issues, we propose a knowledge distillation approach that transfers knowledge from an information-rich, appearance-invariant omni-view depth policy to a lightweight monocular policy. The key idea is to train the student not only to mimic the expert's actions but also to align with the latent embeddings of the omni-view depth teacher. Experiments demonstrate that omni-view and depth inputs improve scene transfer and navigation performance, and that the proposed distillation method enhances the performance of a single-view monocular policy compared with policies that solely imitate actions. Real-world experiments further validate the effectiveness and practicality of our approach. Code will be released publicly.
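The distillation objective described in the abstract combines two terms: imitating the teacher's actions and aligning the student's latent embeddings with the teacher's. A minimal sketch of such a combined loss is below; the use of mean-squared error for both terms and the weighting factor `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def distillation_loss(student_actions, teacher_actions,
                      student_embed, teacher_embed, beta=0.5):
    """Combined distillation loss (illustrative sketch).

    - action term: student mimics the omni-view teacher's action outputs
    - embedding term: student's latent features align with the teacher's
    `beta` weights the embedding-alignment term (assumed hyperparameter).
    """
    action_loss = np.mean((student_actions - teacher_actions) ** 2)
    embed_loss = np.mean((student_embed - teacher_embed) ** 2)
    return action_loss + beta * embed_loss
```

In practice the teacher's embeddings would come from its omnidirectional depth encoder and the student's from its monocular encoder, with gradients flowing only through the student.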
Problem

Research questions and friction points this paper is trying to address.

vision-based policies
scene transferability
onboard computation
sensor cost
mobile robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation
omni-view depth
vision-based policy
mobile robots
latent embedding alignment
Kai Li
College of Computer Science and Technology at Zhejiang University, Hangzhou, China; School of Engineering at Westlake University, Hangzhou, China
Shiyu Zhao
Westlake University
Aerial manipulation · Multi-robot systems · Reinforcement learning