🤖 AI Summary
To address the concurrent physical distortion and semantic degradation in underwater image enhancement, this paper proposes a physics-constrained dual-stream capsule variational autoencoder. Methodologically, we first embed the Jaffe–McGlamery underwater imaging model into the network architecture to construct a physics-guided capsule clustering module, enabling joint estimation of spatially varying transmission maps and ambient light. A multi-frequency perception–physics joint loss function is designed, integrating multi-scale structural similarity (MS-SSIM) with physics-consistency regularization. Our contributions include: (i) achieving hyperparameter-free, end-to-end, semantics-preserving enhancement; and (ii) strictly adhering to underwater optical physics constraints, significantly improving detail fidelity. Quantitatively, the method achieves an average PSNR gain of 0.5 dB across six benchmark datasets, with only one-third the computational cost of state-of-the-art methods; under equal FLOPs, it yields over 1.0 dB PSNR improvement.
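The summary does not spell out the physics-consistency regularization. As a rough illustration only, one common way to build such a term is to re-synthesize the degraded image from the enhanced output and penalize the mismatch with the observed input. The sketch below assumes the widely used simplified form of the Jaffe–McGlamery model, I = J·t + B·(1 − t); the function names, the L1 penalty, and the array shapes are hypothetical choices and are not taken from the paper.

```python
import numpy as np


def synthesize_underwater(clean, transmission, background):
    """Simplified formation model (assumed form): I = J * t + B * (1 - t).

    clean:        (H, W, 3) scene radiance J in [0, 1]
    transmission: (H, W, 1) transmission map t in (0, 1]
    background:   (H, W, 3) spatially varying background light B
    """
    return clean * transmission + background * (1.0 - transmission)


def physics_consistency_loss(observed, enhanced, transmission, background):
    """Mean absolute error between the observed image and the image
    re-synthesized from the enhanced output (hypothetical L1 choice)."""
    resynthesized = synthesize_underwater(enhanced, transmission, background)
    return float(np.mean(np.abs(observed - resynthesized)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J = rng.uniform(0.0, 1.0, size=(4, 5, 3))   # toy clean image
    t = rng.uniform(0.2, 0.9, size=(4, 5, 1))   # toy transmission map
    B = rng.uniform(0.0, 1.0, size=(4, 5, 3))   # toy background light
    I = synthesize_underwater(J, t, B)
    # The loss vanishes (up to rounding) when the estimates explain the input.
    print(physics_consistency_loss(I, J, t, B))
```

In a training setup, a term like this would typically be added to the perceptual (for example MS-SSIM) objective with a weighting factor; how the paper combines the two is not stated here.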
📝 Abstract
We present a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement by explicitly integrating the Jaffe-McGlamery physical model with capsule clustering-based feature representation learning. Our method simultaneously estimates transmission maps and spatially varying background light through a dedicated physics estimator while extracting entity-level features via capsule clustering in a parallel stream. This physics-guided approach enables parameter-free enhancement that respects underwater image formation constraints while preserving semantic structures and fine-grained details. Our approach also features a novel optimization objective that enforces both physical adherence and perceptual quality across multiple spatial frequencies. To validate our approach, we conducted extensive experiments across six challenging benchmarks. Results demonstrate consistent improvements of $+0.5$ dB PSNR over the best existing methods while requiring only one-third of their computational complexity (FLOPs), or alternatively, more than $+1$ dB PSNR improvement when compared to methods with similar computational budgets. Code and data will be available at https://github.com/iN1k1/.
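To make the role of the estimated transmission map and background light concrete, the following minimal sketch inverts the simplified formation model I = J·t + B·(1 − t) to recover scene radiance. It assumes this simplified form of the Jaffe-McGlamery model; the function name `recover_scene_radiance`, the lower bound `t_min`, and the clipping to [0, 1] are illustrative assumptions rather than details from the paper, whose network learns the enhancement end to end.

```python
import numpy as np


def recover_scene_radiance(observed, transmission, background, t_min=0.1):
    """Invert I = J * t + B * (1 - t) for J (assumed simplified model).

    observed:     (H, W, 3) degraded image I in [0, 1]
    transmission: (H, W, 1) estimated transmission map t
    background:   (H, W, 3) estimated spatially varying background light B
    t_min:        hypothetical floor on t that avoids amplifying noise
                  where very little scene light reaches the camera
    """
    t_safe = np.maximum(transmission, t_min)
    # Rearranged model: J = (I - B) / t + B
    radiance = (observed - background) / t_safe + background
    return np.clip(radiance, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    J = rng.uniform(0.0, 1.0, size=(4, 5, 3))   # toy ground-truth radiance
    t = rng.uniform(0.2, 0.9, size=(4, 5, 1))   # toy transmission map
    B = rng.uniform(0.0, 1.0, size=(4, 5, 3))   # toy background light
    I = J * t + B * (1.0 - t)                   # simulate degradation
    J_hat = recover_scene_radiance(I, t, B)
    print(np.max(np.abs(J_hat - J)))            # round-trip error
```

With exact estimates and `t` above `t_min`, the round trip recovers the clean image up to floating-point error; in practice the quality of the result depends on how well `t` and `B` are estimated.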