🤖 AI Summary
This work addresses the challenges of detail distortion and color inaccuracies in RGB-to-RAW reconstruction, which arise from the ill-posed nature of inverse ISP modeling and RGB quantization. To overcome these issues, the authors propose a generative reconstruction framework based on deterministic flow matching. The approach formulates the reconstruction process as vector field transport in a latent space and introduces two key innovations: a cross-scale context-guided module and a dual-domain latent autoencoder with feature alignment constraints. These components collectively enhance the stability and fidelity of the reconstruction. Experimental results demonstrate that the proposed method outperforms current state-of-the-art approaches in both quantitative metrics and visual quality, achieving significant improvements in structural detail preservation and color fidelity.
📝 Abstract
RGB-to-RAW reconstruction, i.e., reverse modeling of the camera Image Signal Processing (ISP) pipeline, aims to recover high-fidelity RAW data from RGB images. Despite notable progress, existing learning-based methods typically treat this task as direct regression and struggle with detail inconsistency and color deviation, owing to the ill-posed nature of inverse ISP and the inherent information loss in quantized RGB images. To address these limitations, we take a generative perspective, reformulating RGB-to-RAW reconstruction as a deterministic latent transport problem, and introduce RAW-Flow, a novel framework that leverages flow matching to learn a deterministic vector field in latent space, effectively bridging the gap between RGB and RAW representations and enabling accurate reconstruction of structural details and color information. To further enhance latent transport, we introduce a cross-scale context guidance module that injects hierarchical RGB features into the flow estimation process. Moreover, we design a dual-domain latent autoencoder with a feature alignment constraint to support the proposed latent transport framework; it jointly encodes RGB and RAW inputs while promoting stable training and high-fidelity reconstruction. Extensive experiments demonstrate that RAW-Flow outperforms state-of-the-art approaches both quantitatively and visually.
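The core idea of deterministic latent transport via flow matching can be sketched with a small, dependency-free toy example. Everything here is an illustrative assumption, not the paper's architecture: the Gaussian "latents" stand in for paired outputs of the dual-domain autoencoder, and a global linear map replaces the neural velocity network. With a straight-line probability path between paired latents, the regression target for the velocity field is simply the displacement between endpoints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired latents: z_rgb from an RGB encoder, z_raw from a
# RAW encoder (stand-ins for the dual-domain latent autoencoder).
N, D = 512, 2
z_rgb = rng.normal(loc=-2.0, scale=0.3, size=(N, D))
z_raw = rng.normal(loc=+2.0, scale=0.3, size=(N, D))

# Straight-line path: z_t = (1 - t) * z_rgb + t * z_raw.
# Along this path the target velocity is constant per pair:
#   v*(z_t, t) = z_raw - z_rgb.
# A real flow-matching model regresses v* with a network; here we fit
# the simplest surrogate, a linear map over [z_t, t, 1].
def features(z, t):
    return np.concatenate([z, t, np.ones_like(t)], axis=1)

ts = rng.uniform(size=(N, 1))
z_t = (1 - ts) * z_rgb + ts * z_raw
v_star = z_raw - z_rgb

X = features(z_t, ts)
W, *_ = np.linalg.lstsq(X, v_star, rcond=None)  # least-squares "training"

# Inference: transport an RGB latent toward the RAW side by Euler
# integration of dz/dt = v(z, t) from t = 0 to t = 1.
def transport(z0, steps=50):
    z = z0.copy()
    for k in range(steps):
        t = np.full((len(z), 1), k / steps)
        z = z + (features(z, t) @ W) / steps
    return z

z_hat = transport(z_rgb)
print(np.abs(z_hat.mean(axis=0) - z_raw.mean(axis=0)).max())
```

Because the path is a straight line, the transport is deterministic: each RGB latent follows a fixed trajectory to a RAW latent, with no stochastic sampling. The paper's cross-scale context guidance would additionally condition the velocity prediction on hierarchical RGB features, which this toy surrogate omits.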