KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-based locomotion for legged robots suffers from limited robustness due to occlusions, specular reflections, and illumination variations. To address this, we propose KiVi, a framework that decouples and then synergistically integrates the proprioceptive and visual pathways: proprioception serves as a stable backbone, while visuospatial reasoning is selectively fused in via a memory-augmented attention mechanism, enhancing resilience to out-of-distribution noise and severe occlusions. KiVi unifies deep reinforcement learning, multimodal sensor fusion, and joint visuo-proprioceptive modeling. In experiments, a quadrupedal robot achieves dynamic, stable walking across diverse, unstructured outdoor terrains. Compared to vision-only and standard sensor-fusion baselines, KiVi significantly improves real-world reliability and generalization under challenging perceptual conditions.
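To make the mechanism described above concrete, here is a minimal sketch of a decoupled proprioception/vision fusion module with a recurrent visual memory and gated cross-attention. All module names, dimensions, and the gating scheme are illustrative assumptions, not the authors' KiVi implementation.

```python
# Minimal sketch of decoupled kinesthetic/visuospatial fusion (assumed design).
import torch
import torch.nn as nn


class KinestheticVisualFusion(nn.Module):
    def __init__(self, proprio_dim=48, vision_dim=256, hidden=128, mem_len=16):
        super().__init__()
        self.mem_len = mem_len
        # Proprioceptive backbone: always available, drives the policy.
        self.proprio_enc = nn.Sequential(
            nn.Linear(proprio_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        # Visual pathway: encoded separately so it can be attenuated or dropped.
        self.vision_enc = nn.Linear(vision_dim, hidden)
        # Recurrent memory over recent visual embeddings (bridges occlusions).
        self.vision_mem = nn.GRU(hidden, hidden, batch_first=True)
        # The proprioceptive state queries the visual memory.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # A learned gate decides how much visual context to admit.
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())

    def forward(self, proprio, vision_seq):
        # proprio: (B, proprio_dim); vision_seq: (B, T, vision_dim)
        p = self.proprio_enc(proprio)                       # (B, hidden)
        v = self.vision_enc(vision_seq[:, -self.mem_len:])  # (B, T', hidden)
        mem, _ = self.vision_mem(v)                         # (B, T', hidden)
        q = p.unsqueeze(1)                                  # (B, 1, hidden)
        ctx, _ = self.attn(q, mem, mem)                     # (B, 1, hidden)
        ctx = ctx.squeeze(1)
        g = self.gate(torch.cat([p, ctx], dim=-1))          # (B, hidden)
        # Residual fusion: as g -> 0, the output falls back to proprioception.
        return p + g * ctx


# Smoke test with random tensors standing in for real sensor streams.
fusion = KinestheticVisualFusion()
feat = fusion(torch.randn(2, 48), torch.randn(2, 20, 256))
print(feat.shape)  # torch.Size([2, 128])
```

The residual-plus-gate layout is one simple way to realize "proprioception as backbone, vision selectively fused": the fused feature degrades gracefully to the proprioceptive embedding when the gate closes.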

📝 Abstract
Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile: it is vulnerable to occlusions, reflections, and lighting changes, which often cause instability in locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, where kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of the surrounding terrain. Specifically, KiVi separates these pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. This modality-balanced yet integrative design, combined with memory-enhanced attention, allows the robot to robustly interpret visual cues while maintaining fallback stability through proprioception. Extensive experiments show that our method enables quadruped robots to stably traverse diverse terrains and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution (OOD) visual noise and occlusions unseen during training, thereby highlighting its effectiveness and applicability to real-world legged locomotion.
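As a concrete illustration of the OOD visual corruption the abstract refers to, the short sketch below injects occlusion and noise into a stream of visual features, e.g. to probe whether a policy falls back on proprioception. The corruption model and rates are assumptions for illustration, not the paper's evaluation protocol.

```python
# Hedged sketch: simulate OOD visual corruption (occlusion + noise) at test time.
import torch


def corrupt_vision(vision_seq: torch.Tensor, p_occlude: float = 0.3,
                   noise_std: float = 0.5) -> torch.Tensor:
    """vision_seq: (B, T, D) visual features; returns a corrupted copy."""
    # Zero out random frames to mimic full occlusion of the camera.
    keep = (torch.rand(vision_seq.shape[:2], device=vision_seq.device)
            > p_occlude).float().unsqueeze(-1)
    # Add Gaussian noise to surviving frames to mimic glare and reflections.
    noisy = vision_seq + noise_std * torch.randn_like(vision_seq)
    return noisy * keep


corrupted = corrupt_vision(torch.randn(2, 20, 256))
print(corrupted.shape)  # torch.Size([2, 20, 256])
```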
Problem

Research questions and friction points this paper is trying to address.

Addresses vision fragility in legged robots due to occlusions and lighting changes
Integrates kinesthetic and visuospatial sensing for stable terrain navigation
Enables robust locomotion in unstructured environments with visual disturbances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Separates kinesthetic and visuospatial sensory pathways
Uses proprioception as stable backbone for locomotion
Selectively incorporates vision with memory-enhanced attention
Peizhuo Li
ETH Zurich
Character Animation, Deep Learning
Hongyi Li
MARMoT Lab, National University of Singapore, Singapore; Center for X-mechanics, Zhejiang University, China; Robotics & Machine Intelligence Lab, Zhejiang University, China
Yuxuan Ma
MARMoT Lab, National University of Singapore, Singapore
Linnan Chang
MARMoT Lab, National University of Singapore, Singapore
Xinrong Yang
MARMoT Lab, National University of Singapore, Singapore
Ruiqi Yu
Robotics & Machine Intelligence Lab, Zhejiang University, China
Yifeng Zhang
MARMoT Lab, National University of Singapore, Singapore
Yuhong Cao
National University of Singapore
Robot Learning, Path Planning
Qiuguo Zhu
Robotics & Machine Intelligence Lab, Zhejiang University, China
Guillaume Sartoretti
Assistant Professor, Mechanical Engineering Dept., National University of Singapore (NUS)
Multi-Agent Systems, Robotics, Swarm Intelligence, Distributed Control, Distributed Learning