Learning to Evolve: Multi-modal Interactive Fields for Robust Humanoid Navigation in Dynamic Environments

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses unreliable scene memory in humanoid robot navigation within dynamic environments, where gait-induced perceptual disturbances, environmental changes, and interaction safety constraints degrade localization and mapping. The authors propose the Multi-modal Interactive Field (MIF), an architecture that integrates confidence-aware semantic 3D Gaussian Splatting, a spatial field that maintains topological memory, and an interaction-safety-aware geometric field. A discrepancy detection score distinguishes gait-induced artifacts from persistent environmental changes, so that only locally inconsistent regions are updated. Evaluated on a Unitree-G1 humanoid in a real dynamic office, the system improves relocation success in non-static environments from 12% to 94% compared with static scene-graph memory, while reducing semantic memory footprint by 91.4% through feature distillation.
📝 Abstract
Safe manipulation-oriented navigation for humanoid robots requires scene memory that remains reliable under locomotion-induced perceptual distortion, environmental changes, and interaction-level geometric safety constraints. Existing semantic mapping and scene-graph systems are difficult to deploy directly in this setting because they often assume stable camera trajectories, static environments, or coarse object geometry. We introduce the Multi-modal Interactive Field (MIF), a humanoid-oriented system that integrates confidence-aware semantic 3D Gaussian Splatting, discrepancy-triggered spatial memory updates, and task-driven geometric reconstruction within a closed-loop perception-adaptation pipeline. MIF couples three fields: an uncertainty-aware 3DGS Appearance Field that suppresses gait-induced blur, a Spatial Field that maintains topological memory, and a Geometry Field that supports Interaction Pose Safety (IPS) before manipulation. A discrepancy detection score is introduced to separate locomotion-induced false-positive changes from persistent changes and to update only locally inconsistent regions. On a Unitree-G1 humanoid in a real dynamic office, MIF improves relocation success in non-static environments from 12% to 94% compared with static scene-graph memory, while reducing semantic memory footprint by 91.4% through feature distillation for practical online operation. Project page and code: https://ziya-jiang.github.io/MIF-homepage/
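The abstract does not say how the discrepancy detection score is computed. As a rough illustration of the general idea only, the sketch below assumes that gait-induced discrepancies are transient while genuine scene changes persist over several consecutive frames. The class name `RegionChangeTracker`, the `threshold` and `min_persist` parameters, and the per-region score format are all hypothetical and are not taken from the MIF code.

```python
from dataclasses import dataclass, field


@dataclass
class RegionChangeTracker:
    """Hypothetical persistence filter; NOT the authors' method.

    A region is reported for update only after its discrepancy score
    stays above `threshold` for `min_persist` consecutive frames.
    """
    threshold: float = 0.5   # assumed per-frame discrepancy threshold
    min_persist: int = 3     # assumed number of consecutive frames required
    streaks: dict = field(default_factory=dict)

    def observe(self, scores: dict) -> list:
        """Take {region_id: discrepancy in [0, 1]} and return regions to update."""
        to_update = []
        for region, score in scores.items():
            if score > self.threshold:
                # Discrepancy continues: extend this region's streak.
                self.streaks[region] = self.streaks.get(region, 0) + 1
            else:
                # Discrepancy vanished: treat earlier frames as transient noise.
                self.streaks[region] = 0
            if self.streaks[region] >= self.min_persist:
                to_update.append(region)
                self.streaks[region] = 0  # reset once the update is triggered
        return sorted(to_update)


tracker = RegionChangeTracker(threshold=0.5, min_persist=3)
frames = [
    {"desk": 0.9, "door": 0.8},  # both regions look different
    {"desk": 0.1, "door": 0.7},  # desk discrepancy disappears (transient)
    {"desk": 0.8, "door": 0.9},  # door discrepancy persists for a third frame
]
results = [tracker.observe(f) for f in frames]
print(results)
```

With these made-up scores, only the `door` region is flagged, and only on the third frame; the `desk` region's intermittent discrepancy never accumulates a long enough streak. A real system would presumably derive the scores from rendered versus observed views and tune both parameters empirically.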
Problem

Research questions and friction points this paper is trying to address.

humanoid navigation
dynamic environments
scene memory
perceptual distortion
geometric safety constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Interactive Field
3D Gaussian Splatting
discrepancy detection
Interaction Pose Safety
semantic memory distillation