🤖 AI Summary
Vision-based locomotion for legged robots suffers from limited robustness due to occlusions, specular reflections, and illumination variations. To address this, we propose KiVi, a framework that decouples proprioceptive and visual pathways and then integrates them: proprioception serves as a stable backbone, while visuospatial reasoning is selectively fused via a memory-augmented attention mechanism, improving resilience to out-of-distribution noise and severe occlusions. KiVi unifies deep reinforcement learning, multimodal sensor fusion, and joint visuo-proprioceptive modeling. Experiments demonstrate that a quadrupedal robot achieves dynamic, stable walking across diverse, unstructured outdoor terrains. Compared to vision-only and standard sensor-fusion baselines, KiVi significantly improves real-world reliability and generalization under challenging perceptual conditions.
📝 Abstract
Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile: it is vulnerable to occlusions, reflections, and lighting changes that often destabilize locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, where kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of the surrounding terrain. Specifically, KiVi separates these pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced yet integrative design, combined with memory-enhanced attention, allows the robot to robustly interpret visual cues while retaining fallback stability through proprioception. Extensive experiments show that our method enables quadruped robots to stably traverse diverse terrains and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusions unseen during training, thereby highlighting its effectiveness and applicability to real-world legged locomotion.
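Neither the summary nor the abstract specifies the exact network, but the core idea, a proprioceptive backbone that queries a memory of visual embeddings through attention, can be sketched concretely. The PyTorch sketch below is purely illustrative: the module name `KiViPolicySketch`, all dimensions (a 48-D proprioceptive state, a 187-D terrain observation, a 16-step visual memory), and the single-robot memory bookkeeping are assumptions for exposition, not the paper's implementation.

```python
import torch
import torch.nn as nn


class KiViPolicySketch(nn.Module):
    """Illustrative two-pathway policy: proprioception as the stable
    backbone, vision fused selectively via memory-augmented attention.
    All sizes and module choices are hypothetical, not from the paper.
    """

    def __init__(self, proprio_dim=48, vision_dim=187, embed_dim=128,
                 memory_len=16, action_dim=12):
        super().__init__()
        # Proprioceptive backbone: always active, never gated out.
        self.proprio_enc = nn.Sequential(
            nn.Linear(proprio_dim, embed_dim), nn.ELU(),
            nn.Linear(embed_dim, embed_dim), nn.ELU(),
        )
        # Visual pathway: encodes e.g. a depth or height-map observation.
        self.vision_enc = nn.Sequential(
            nn.Linear(vision_dim, embed_dim), nn.ELU(),
        )
        # Rolling memory of past visual embeddings (attention keys/values),
        # so the policy can lean on recent terrain context when the
        # current frame is occluded or noisy.
        self.register_buffer("memory", torch.zeros(memory_len, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4,
                                          batch_first=True)
        # Actor head consumes the two concatenated pathways.
        self.actor = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.ELU(),
            nn.Linear(embed_dim, action_dim),
        )

    def forward(self, proprio, vision):
        # proprio: (B, proprio_dim), vision: (B, vision_dim); B = 1 here
        # to keep the rolling-memory bookkeeping simple.
        p = self.proprio_enc(proprio)                    # (B, E)
        v = self.vision_enc(vision)                      # (B, E)
        # FIFO update of the visual memory with the newest embedding.
        self.memory = torch.cat([self.memory[1:], v.detach()], dim=0)
        # The proprioceptive state queries the visual memory: attention
        # weights decide how much (and which) vision to trust.
        kv = self.memory.unsqueeze(0)                    # (1, T, E)
        fused, _ = self.attn(query=p.unsqueeze(1), key=kv, value=kv)
        # Concatenate the untouched proprioceptive backbone with the
        # attended visual context; if vision is uninformative, the actor
        # can still act from the proprioceptive half alone.
        return self.actor(torch.cat([p, fused.squeeze(1)], dim=-1))


# Usage: one control step for a single robot.
policy = KiViPolicySketch()
action = policy(torch.randn(1, 48), torch.randn(1, 187))
print(action.shape)  # torch.Size([1, 12])
```

The design point this sketch makes explicit is the fallback path: the proprioceptive embedding reaches the actor unmodified, so even when attention assigns the visual memory near-zero weight, the policy degrades to proprioception-only control rather than failing outright.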