🤖 AI Summary
This work proposes UniISP, an image signal processing (ISP) framework designed to serve both human visual perception and downstream machine vision tasks. Traditional ISP pipelines produce RGB images tuned to human aesthetic preferences but can discard information that machine vision needs, whereas methods that pass minimally processed raw data to downstream networks yield images that are hard to visualize or unappealing to human viewers. To address this trade-off, UniISP combines a Hybrid Attention Module (HAM) with supervised learning so that the generated images are visually appealing, and adds a Feature Adapter module that propagates informative features from the ISP stage to the downstream task network. Experiments across multiple datasets and scenarios report state-of-the-art performance, supporting the method's generalizability and effectiveness.
📝 Abstract
Raw sensor data carries more information than RGB images, and that extra information is crucial for accurate recognition, particularly under challenging conditions such as low light. The traditional Image Signal Processing (ISP) pipeline converts raw data into visually pleasing RGB images for human viewers through a series of steps, but some of these steps compress the signal and discard information. Conversely, most computer vision methods that operate directly on raw camera data pair minimal ISP processing with a downstream network, and the resulting images are often difficult to visualize or do not match human aesthetic preferences. This paper proposes UniISP, a novel ISP framework designed to meet the requirements of both human visual perception and computer vision applications. A carefully designed Hybrid Attention Module (HAM), trained with supervised learning, ensures that the generated images are visually appealing. In addition, a Feature Adapter module propagates informative features from the ISP stage to the downstream network. Extensive experiments show that our approach achieves state-of-the-art performance across various scenarios and multiple datasets, demonstrating its generalizability and effectiveness.
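To make the abstract's point about information loss in a conventional ISP concrete, the following is a minimal sketch of a single tone-mapping and quantization step. It is not UniISP's pipeline or any part of the authors' method. The 12-bit sensor depth, the gamma of 2.2, the chosen code ranges, and all function names are assumptions made for illustration.

```python
# Illustrative only: a toy gamma + quantization step, NOT the UniISP pipeline.
# Bit depths, gamma, and the code ranges below are assumptions for the sketch.

RAW_BITS = 12   # assumed sensor bit depth
OUT_BITS = 8    # typical bit depth of a display-ready RGB image
GAMMA = 2.2     # assumed display gamma


def tone_map_and_quantize(raw_value: int) -> int:
    """Map one linear raw code to a gamma-encoded 8-bit code."""
    raw_max = (1 << RAW_BITS) - 1
    out_max = (1 << OUT_BITS) - 1
    normalized = raw_value / raw_max          # linear value in [0, 1]
    encoded = normalized ** (1.0 / GAMMA)     # gamma encoding
    return round(encoded * out_max)           # quantize to 8 bits


def distinct_levels(raw_values) -> int:
    """Count how many different output codes a set of raw codes maps to."""
    return len({tone_map_and_quantize(v) for v in raw_values})


# Hypothetical low-light patch: the 64 darkest raw codes.
dark_raw_codes = range(64)
# Hypothetical highlight region: the 1024 brightest raw codes.
bright_raw_codes = range(3072, 4096)

dark_out = distinct_levels(dark_raw_codes)
bright_out = distinct_levels(bright_raw_codes)

print(f"dark:   {len(dark_raw_codes)} raw codes -> {dark_out} output codes")
print(f"bright: {len(bright_raw_codes)} raw codes -> {bright_out} output codes")
```

In both ranges, several distinct raw codes collapse onto the same 8-bit output code, so differences that a recognition network could have used are no longer present in the RGB image.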