🤖 AI Summary
This work addresses the high latency of joint image reconstruction and object detection in infrared computational imaging. Existing acceleration methods struggle to balance accuracy and speed because they neglect physical priors from the optical path. To overcome this, we propose PDI-Net, a framework that bypasses full reconstruction at inference by sharing features from a lightweight semi-U-Net encoder directly with a YOLO-based detector. We introduce a physics-aware large-small bridge (PALS-Bridge) that adaptively modulates multiscale convolutional branches using field-dependent point spread function (PSF) priors. A physics-informed optical degradation simulation pipeline is also developed for training and validation. On the M3FD benchmark under low-SNR conditions, PDI-Net reduces inference time by 84.06% and improves mAP@0.5:0.95 by 5.07% compared with the pruned Rec+Det strategy. Separately, deployment on a single-lens infrared camera reduces system weight by about 50% compared with traditional multi-lens designs.
📝 Abstract
Computational imaging enables compact infrared systems, but deep-learning pipelines that combine image reconstruction and object detection often introduce substantial inference latency. Most existing acceleration strategies compress the reconstruction network while overlooking physical priors from the optical path, which leaves an unresolved trade-off between accuracy and speed. We present the Physics-aware Dual-Integrated Network (PDI-Net), a low-latency framework that integrates infrared reconstruction with object detection and embeds optical priors into the learning process. During training, PDI-Net uses a supervised U-Net; during inference, a semi-U-Net encoder shares features directly with a YOLO-based detector, so full image reconstruction is avoided. To bridge the gap between fidelity-oriented reconstruction features and detection-oriented semantics, we introduce a physics-aware large-small bridge (PALS-Bridge), which uses field-dependent point spread function (PSF) priors to adaptively modulate multiscale convolutional branches. A physics-informed optical degradation simulation pipeline is also developed for training and validation. The method is deployed on a single-lens infrared camera, reducing system weight by about 50% compared with traditional multi-lens designs. On the M3FD benchmark under low-SNR conditions, PDI-Net reduces inference time by 84.06% compared with the pruned Rec+Det strategy while improving mAP@0.5:0.95 by 5.07%. These results demonstrate compact, low-latency computational infrared imaging for real-time object detection on resource-constrained platforms.
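As a rough illustration of what "field-dependent PSF priors modulating multiscale branches" could mean, the toy sketch below blends a small-kernel and a large-kernel branch per position, with the blend weight derived from an assumed PSF width. It is not the authors' PALS-Bridge: the quadratic PSF model, the logistic gate, the box filters standing in for learned convolutions, and all names and constants (`psf_sigma`, `gate_from_sigma`, `bridge_row`, `sigma_mid`, `tau`) are hypothetical choices made only for this example, and the 1D feature row is a simplification of 2D feature maps.

```python
import math

def psf_sigma(r, sigma_center=1.0, sigma_edge=3.0):
    """Hypothetical PSF width prior: grows quadratically with the
    normalized field radius r in [0, 1] (an assumed blur model)."""
    return sigma_center + (sigma_edge - sigma_center) * r * r

def gate_from_sigma(sigma, sigma_mid=2.0, tau=0.5):
    """Map a PSF width to a weight in (0, 1) for the large-kernel branch
    using a logistic function (assumed gating rule)."""
    return 1.0 / (1.0 + math.exp(-(sigma - sigma_mid) / tau))

def mean_filter(row, k):
    """Odd-size box filter with edge clamping; a stand-in for a learned
    convolution with a k-wide kernel."""
    half = k // 2
    n = len(row)
    out = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - half, i + half + 1)]
        out.append(sum(window) / k)
    return out

def bridge_row(row, small_k=3, large_k=7):
    """Blend the small- and large-kernel branches at each position,
    weighting the large branch more where the assumed PSF is wider."""
    n = len(row)
    center = (n - 1) / 2.0
    small = mean_filter(row, small_k)
    large = mean_filter(row, large_k)
    out, gates = [], []
    for i in range(n):
        # Normalized distance from the field center (0 at center, 1 at edge).
        r = abs(i - center) / center if center > 0 else 0.0
        g = gate_from_sigma(psf_sigma(r))
        gates.append(g)
        out.append((1.0 - g) * small[i] + g * large[i])
    return out, gates

# Toy feature row with a single impulse at the field center.
features = [0.0] * 8 + [1.0] + [0.0] * 8
blended, gates = bridge_row(features)
```

In this sketch the gate is small near the field center, where the assumed PSF is narrow and the small kernel dominates, and approaches 1 toward the field edge, where the wider blur favors the large kernel. A learned version would presumably replace the fixed gate and box filters with trainable components.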