🤖 AI Summary
This work addresses the challenge of jointly optimizing camera hardware parameters and adaptive neural control algorithms for runtime settings such as exposure time. The task is hindered by mixed continuous/discrete variables, non-differentiable imaging pipelines, and optimization paradigms that do not combine naturally. The authors present a unified framework that integrates gradient-based optimization with derivative-free search, so that hardware configurations and runtime neural controllers can be trained jointly against the downstream vision task. Within this framework, DF-Grad is a hybrid strategy for non-differentiable effects such as motion blur: it trains the adaptive control network using signals from a derivative-free optimizer alongside unsupervised task-driven learning. Evaluated under low-light and high-speed motion conditions, the method outperforms baselines that optimize static and dynamic parameters separately on downstream vision tasks, including object detection and semantic segmentation, which supports co-designing camera hardware, adaptive control, and perception together.
📝 Abstract
The quality of captured images strongly influences the performance of downstream perception tasks. Recent works on co-designing camera systems with perception tasks have shown improved task performance. However, most prior approaches focus on optimising fixed camera parameters set at manufacturing, while many parameters, such as exposure settings, require adaptive control at runtime. This paper introduces a method that jointly optimises camera hardware and adaptive camera control algorithms with downstream vision tasks. We present a unified optimisation framework that integrates gradient-based and derivative-free methods, enabling support for both continuous and discrete parameters, non-differentiable image formation processes, and neural network-based adaptive control algorithms. To address non-differentiable effects such as motion blur, we propose DF-Grad, a hybrid optimisation strategy that trains adaptive control networks using signals from a derivative-free optimiser alongside unsupervised task-driven learning. Experiments show that our method outperforms baselines that optimise static and dynamic parameters separately, particularly under challenging conditions such as low light and fast motion. These results demonstrate that jointly optimising hardware parameters and adaptive control algorithms improves perception performance and provides a unified approach to task-driven camera system design.
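To make the idea of training a controller from derivative-free signals concrete, here is a minimal sketch. It is not the paper's DF-Grad implementation. Everything in it is a hypothetical stand-in: the toy `task_loss`, the random-search optimiser, the log-linear `controller`, and all constants. It also covers only the derivative-free supervision path and omits the unsupervised task-driven term mentioned in the abstract. The blur term is quantised, so the loss gives no useful gradient with respect to exposure. A derivative-free search finds a good exposure per training scene, and the controller is then fitted to those exposures by ordinary gradient descent.

```python
import math
import random


def task_loss(exposure, light, speed):
    """Toy stand-in for a downstream task loss (hypothetical, not from the paper).

    Noise shrinks with exposure; blur grows with it and is quantised to 0.1,
    so the blur term is piecewise constant and provides no useful gradient.
    """
    noise = 1.0 / (light * exposure)
    blur = math.floor(speed * exposure * 10.0) / 10.0
    return noise + blur


def derivative_free_search(light, speed, rng, iters=200, sigma=0.5):
    """Simple (1+1) random search over log-exposure; uses loss values only."""
    best_e = 1.0
    best_loss = task_loss(best_e, light, speed)
    for _ in range(iters):
        cand = best_e * math.exp(sigma * rng.gauss(0.0, 1.0))
        cand = min(max(cand, 1e-3), 10.0)  # clamp to an assumed valid range
        cand_loss = task_loss(cand, light, speed)
        if cand_loss < best_loss:
            best_e, best_loss = cand, cand_loss
    return best_e


def controller(weights, light, speed):
    """Tiny adaptive controller: a linear model in log space that outputs exposure."""
    w0, w1, w2 = weights
    return math.exp(w0 + w1 * math.log(light) + w2 * math.log(speed))


def train_controller(scenes, targets, steps=2000, lr=0.1):
    """Gradient descent on squared error between predicted and searched log-exposure."""
    w = [0.0, 0.0, 0.0]
    n = len(scenes)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (light, speed), target in zip(scenes, targets):
            feats = (1.0, math.log(light), math.log(speed))
            pred = sum(wi * fi for wi, fi in zip(w, feats))
            err = pred - math.log(target)
            for i in range(3):
                grad[i] += 2.0 * err * feats[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def run_demo(seed=0):
    """Return (mean loss of learned controller, mean loss of fixed exposure 1.0)."""
    rng = random.Random(seed)

    def sample_scene():
        # Assumed scene descriptors: light level and object speed, arbitrary units.
        return rng.uniform(0.5, 4.0), rng.uniform(0.5, 4.0)

    train = [sample_scene() for _ in range(64)]
    test = [sample_scene() for _ in range(64)]
    targets = [derivative_free_search(l, s, rng) for l, s in train]
    w = train_controller(train, targets)
    adaptive = sum(task_loss(controller(w, l, s), l, s) for l, s in test) / len(test)
    fixed = sum(task_loss(1.0, l, s) for l, s in test) / len(test)
    return adaptive, fixed


if __name__ == "__main__":
    adaptive, fixed = run_demo()
    print(f"adaptive controller loss: {adaptive:.3f}, fixed exposure loss: {fixed:.3f}")
```

The split mirrors the hybrid idea at a small scale: the non-differentiable image formation is only ever queried for loss values, while the controller itself stays trainable by gradients. In a fuller system the static hardware parameters would presumably be part of the same search, which this sketch leaves out.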