3D Reconstruction from Transient Measurements with Time-Resolved Transformer

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Transient measurements suffer from low signal-to-noise ratio (SNR) and low sensor quantum efficiency, which degrades 3D reconstruction in both line-of-sight (LOS) and non-line-of-sight (NLOS) photon-efficient imaging. This paper introduces the Time-Resolved Transformer (TRT), a generic architecture tailored to spatio-temporal transient data. TRT pairs spatio-temporal self-attention encoders, which capture local and global correlations by splitting or downsampling input features into different scales, with spatio-temporal cross-attention decoders that fuse those local and global features in token space. Two task-specific variants, TRT-LOS and TRT-NLOS, outperform existing methods on synthetic and real-world data. The authors also contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and a set of real-world NLOS measurements captured with a custom-built imaging system. Code and datasets are publicly released.

📝 Abstract

Transient measurements, captured by time-resolved systems, are widely employed in photon-efficient reconstruction tasks, including line-of-sight (LOS) and non-line-of-sight (NLOS) imaging. However, challenges persist in their 3D reconstruction due to the low quantum efficiency of sensors and high noise levels, particularly for long-range or complex scenes. To boost 3D reconstruction performance in photon-efficient imaging, we propose a generic Time-Resolved Transformer (TRT) architecture. Unlike existing transformers designed for high-dimensional data, TRT has two attention designs tailored to spatio-temporal transient measurements. Specifically, the spatio-temporal self-attention encoders explore both local and global correlations within transient data by splitting or downsampling input features into different scales. Then, the spatio-temporal cross-attention decoders integrate the local and global features in the token space, resulting in deep features with high representation capabilities. Building on TRT, we develop two task-specific embodiments: TRT-LOS for LOS imaging and TRT-NLOS for NLOS imaging. Extensive experiments demonstrate that both embodiments significantly outperform existing methods on synthetic data and real-world data captured by different imaging systems. In addition, we contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and capture a set of real-world NLOS measurements using a custom-built imaging system, enhancing the data diversity in this field. Code and datasets are available at https://github.com/Depth2World/TRT.
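To make the "splitting or downsampling" idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a feature volume of shape (T, H, W, C), cubic non-overlapping windows for the local branch, and average pooling for the global branch. The function names, window size, and pooling factor are illustrative assumptions, and the attention uses no learned projections.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head self-attention without learned projections (illustrative only).
    tokens has shape (..., N, C); attention runs over the N axis."""
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores) @ tokens

def split_windows(x, w):
    """Local branch (assumed): cut (T, H, W, C) into w*w*w windows.
    Returns (num_windows, w**3, C)."""
    T, H, W, C = x.shape
    x = x.reshape(T // w, w, H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # group the three window indices first
    return x.reshape(-1, w ** 3, C)

def downsample(x, s):
    """Global branch (assumed): average-pool by s on every axis.
    Returns (num_coarse_tokens, C)."""
    T, H, W, C = x.shape
    x = x.reshape(T // s, s, H // s, s, W // s, s, C).mean(axis=(1, 3, 5))
    return x.reshape(-1, C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8, 4))            # toy (T, H, W, C) feature volume

local_out = self_attention(split_windows(feat, 4))  # attention inside each window
global_out = self_attention(downsample(feat, 2))    # attention over the coarse volume
```

In this sketch the local branch attends only within each window, while the global branch attends across the whole (coarsened) volume, so each branch keeps its attention matrix small.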

Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D reconstruction from transient measurements in photon-efficient imaging

Addressing sensor noise and low quantum efficiency in complex scenes

Developing a transformer architecture for spatio-temporal transient data processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Time-Resolved Transformer for transient spatio-temporal data

Spatio-temporal self-attention encoders explore local and global correlations

Spatio-temporal cross-attention decoders integrate local and global features in token space
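As a rough illustration of fusing two feature scales in token space, the sketch below applies standard scaled dot-product cross-attention in NumPy. It assumes local tokens act as queries and global tokens as keys and values, with a residual connection; the source does not specify these roles, and the random projection matrices stand in for learned weights.

```python
import numpy as np

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention.
    queries: (N_q, C) tokens that ask; context: (N_c, C) tokens that answer."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
dim = 16
local_feats = rng.standard_normal((64, dim))   # hypothetical fine-scale tokens
global_feats = rng.standard_normal((8, dim))   # hypothetical coarse-scale tokens

# Random matrices standing in for learned projections (assumption).
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

attended, weights = cross_attention(local_feats, global_feats, w_q, w_k, w_v)
fused = local_feats + attended                 # residual connection (assumed)
```

Each fine-scale token ends up as its own features plus a weighted mix of coarse-scale features, which is one plausible way to combine local detail with global structure.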

🔎 Similar Papers

AdaptiveFusion: Adaptive Multi-Modal Multi-View Fusion for 3D Human Body Reconstruction