A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion

📅 2026-02-05

🤖 AI Summary

This work addresses the limited terrain perception that holds back humanoid robots in unstructured, human-centric environments. The authors propose a learning-based multimodal fusion framework that integrates LiDAR, depth camera, and IMU data to generate robot-centric elevation maps (heightmaps). A hybrid encoder-decoder network combines a CNN for spatial feature extraction with a GRU core for temporal consistency. LiDAR data from the LIVOX MID-360 is processed via spherical projection and fused with data from an Intel RealSense depth camera. Experimental results show that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations, and that a 3.2-second temporal context reduces mapping drift.
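
To make the target representation concrete, the following is a minimal sketch of what a robot-centric elevation map stores: a square grid centered on the robot, where each cell holds a terrain height. This is not the paper's learned method. The function name `rasterize_heightmap`, the window size, the cell resolution, and the "keep the highest point per cell" rule are all assumptions chosen for illustration.

```python
import numpy as np

def rasterize_heightmap(points, size_m=2.0, resolution=0.5):
    """Bin 3D points (robot frame: x forward, y left, z up) into a height grid.

    Hypothetical helper for illustration only; size_m and resolution are
    assumed values, not taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    n_cells = int(round(size_m / resolution))
    half = size_m / 2.0

    # Cells with no observation stay NaN.
    heightmap = np.full((n_cells, n_cells), np.nan)

    # Keep only points inside the square window around the robot.
    inside = (np.abs(points[:, 0]) < half) & (np.abs(points[:, 1]) < half)
    pts = points[inside]

    # Convert metric x/y coordinates to integer grid indices.
    rows = np.clip(((pts[:, 0] + half) / resolution).astype(int), 0, n_cells - 1)
    cols = np.clip(((pts[:, 1] + half) / resolution).astype(int), 0, n_cells - 1)

    # Assumed aggregation rule: keep the highest point seen in each cell.
    for r, c, z in zip(rows, cols, pts[:, 2]):
        if np.isnan(heightmap[r, c]) or z > heightmap[r, c]:
            heightmap[r, c] = z
    return heightmap

demo_points = [
    [0.1, 0.1, 0.3],    # same cell as the next point, lower
    [0.2, 0.2, 0.5],    # same cell, higher, so this height is kept
    [-0.9, -0.9, 0.1],  # corner cell
    [5.0, 0.0, 1.0],    # outside the window, ignored
]
demo_map = rasterize_heightmap(demo_points)
```

In the paper's setting the network predicts such a grid from fused sensor input; a geometric rasterization like this one only illustrates the output format.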

📝 Abstract

Reliable terrain perception is a critical prerequisite for the deployment of humanoid robots in unstructured, human-centric environments. While traditional systems often rely on manually engineered, single-sensor pipelines, this paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation. A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction fused with a Gated Recurrent Unit (GRU) core for temporal consistency. The architecture integrates multimodal data from an Intel RealSense depth camera, a LIVOX MID-360 LiDAR processed via efficient spherical projection, and an onboard IMU. Quantitative results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations. Furthermore, the integration of a 3.2 s temporal context reduces mapping drift.
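
The spherical projection step mentioned above can be sketched as follows: each LiDAR point is converted to azimuth and elevation angles, which index a pixel in a 2D range image that a CNN can consume. This is a generic range-image projection, not necessarily the authors' exact implementation. The function name, the image size, and the vertical field-of-view defaults are assumptions and should be checked against the sensor datasheet.

```python
import numpy as np

def spherical_projection(points, width=360, height=32,
                         fov_up_deg=52.0, fov_down_deg=-7.0):
    """Project (N, 3) points in the sensor frame onto a (height, width) range image.

    All default parameters are illustrative assumptions, not values from the paper.
    """
    points = np.asarray(points, dtype=float)
    ranges = np.linalg.norm(points, axis=1)
    valid = ranges > 0.0  # drop degenerate points at the origin
    x, y, z = points[valid, 0], points[valid, 1], points[valid, 2]
    ranges = ranges[valid]

    azimuth = np.arctan2(y, x)          # horizontal angle in [-pi, pi]
    elevation = np.arcsin(z / ranges)   # vertical angle in [-pi/2, pi/2]

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    in_fov = (elevation <= fov_up) & (elevation >= fov_down)

    # Map angles to fractional pixel coordinates, then to integer indices.
    u = 0.5 * (1.0 - azimuth / np.pi) * width
    v = (fov_up - elevation) / (fov_up - fov_down) * height
    cols = np.clip(np.floor(u), 0, width - 1).astype(int)
    rows = np.clip(np.floor(v), 0, height - 1).astype(int)

    # When several points land in one pixel, keep the nearest return.
    image = np.full((height, width), np.inf)
    np.minimum.at(image, (rows[in_fov], cols[in_fov]), ranges[in_fov])
    image[np.isinf(image)] = 0.0  # pixels with no return are set to 0
    return image

demo_cloud = [
    [1.0, 0.0, 0.0],  # straight ahead, range 1
    [2.0, 0.0, 0.0],  # same direction, farther, so it is discarded
    [0.0, 0.0, 1.0],  # straight up, outside the assumed vertical FOV
]
range_image = spherical_projection(demo_cloud)
```

A dense 2D image of this kind lets sparse LiDAR returns share a convolutional encoder design with the depth camera stream, which is one plausible reason to prefer it over processing raw point clouds.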

Problem

Research questions and friction points this paper is trying to address.

terrain perception

humanoid robot locomotion

multimodal sensor fusion

heightmap generation

unstructured environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid autoencoder

multimodal fusion

heightmap generation