Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes IMU-to-4D, a framework that sidesteps the privacy, safety, power-consumption, and scalability limitations of visual sensors by repurposing large language models for non-visual spatiotemporal perception. Using only inertial measurement unit (IMU) signals from a few everyday wearable devices, such as earbuds, smartwatches, or smartphones, the method jointly reconstructs detailed 4D human motion and coarse 3D scene layouts. Experiments across diverse human-scene datasets show that IMU-to-4D produces more coherent and temporally stable 4D reconstructions than state-of-the-art cascaded pipelines. These results suggest that wearable IMU signals alone can support rich human-scene understanding without visual input.


📝 Abstract
Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision, whose goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. To this end, we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D takes data from a few inertial sensors in earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than state-of-the-art (SoTA) cascaded pipelines, suggesting that wearable motion sensors alone can support rich 4D understanding.
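The abstract does not say how the raw IMU streams are presented to the language model. Purely as a hypothetical illustration of one common way to turn a multi-sensor time series into a token sequence, the sketch below concatenates every sensor's channels per timestep and groups consecutive timesteps into fixed-length windows, one flat vector per window. The sensor names, the assumed 6 channels per sensor (3-axis accelerometer plus 3-axis gyroscope), and the non-overlapping windowing scheme are all assumptions for illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sensor set; the abstract only says earbuds, watches, or smartphones.
SENSORS = ("earbud", "watch", "phone")
# Assumed channel layout: 3-axis accelerometer + 3-axis gyroscope per sensor.
CHANNELS = 6


@dataclass
class ImuFrame:
    """One timestep: a list of CHANNELS readings for each sensor, in SENSORS order."""
    readings: List[List[float]]


def frame_to_vector(frame: ImuFrame) -> List[float]:
    """Concatenate all sensors' channels into one flat feature vector."""
    if len(frame.readings) != len(SENSORS):
        raise ValueError("expected one reading list per sensor")
    vec: List[float] = []
    for r in frame.readings:
        if len(r) != CHANNELS:
            raise ValueError(f"expected {CHANNELS} channels per sensor")
        vec.extend(r)
    return vec


def windows_to_tokens(frames: List[ImuFrame], window: int) -> List[List[float]]:
    """Group consecutive frames into non-overlapping windows of `window` frames.

    Each window becomes one flat 'token' vector of length
    window * len(SENSORS) * CHANNELS. A trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    tokens: List[List[float]] = []
    for start in range(0, len(frames) - window + 1, window):
        token: List[float] = []
        for f in frames[start:start + window]:
            token.extend(frame_to_vector(f))
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Ten synthetic frames where every reading at timestep t equals t.
    frames = [ImuFrame([[float(t)] * CHANNELS for _ in SENSORS]) for t in range(10)]
    tokens = windows_to_tokens(frames, window=4)
    print(len(tokens), len(tokens[0]))
```

With ten frames and a window of four, the last two frames do not fill a window and are dropped, leaving two tokens of 4 × 3 × 6 = 72 values each. In a real system each flat vector would typically be passed through a learned projection into the language model's embedding space; that step is omitted here.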
Problem


4D perception
wearable IMUs
human-scene understanding
vision-free sensing
inertial sensors
Innovation


IMU-to-4D
wearable sensors
4D human-scene understanding
vision-free perception
large language models