🤖 AI Summary
Metalens imaging suffers from severe, spatially non-uniform optical degradation, heavy reliance on precise calibration or large-scale paired training data, and susceptibility to hallucination artifacts. To address these challenges, we propose an unpaired, tunable multi-path diffusion framework. Our method introduces a tripartite prompting mechanism, comprising positive, neutral, and negative prompts, to jointly suppress degradation and guide fine-grained detail synthesis. We further design a spatially varying degradation-aware (SVDA) attention module to adaptively model the non-uniform optical and sensor-induced degradation of the millimeter-scale MetaCamera. A tunable decoder is incorporated to explicitly balance reconstruction fidelity and perceptual quality. By synergistically integrating physics-based priors with natural image priors from pre-trained diffusion models, augmented with pseudo data generation, the framework achieves state-of-the-art performance on real metalens hardware: it significantly enhances image sharpness and fidelity while effectively suppressing hallucinations, outperforming both existing supervised and unsupervised approaches.
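To make the tripartite prompting concrete, below is a minimal sketch of how the three prompt paths might be fused at each diffusion step, in the spirit of classifier-free guidance. The function name `combine_prompt_paths`, the guidance weights, and the exact combination rule are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: fusing positive / neutral / negative prompt paths,
# classifier-free-guidance style. Weights and combination rule are assumptions.
import torch

def combine_prompt_paths(eps_pos: torch.Tensor,
                         eps_neu: torch.Tensor,
                         eps_neg: torch.Tensor,
                         w_pos: float = 4.0,
                         w_neg: float = 2.0) -> torch.Tensor:
    """Fuse noise predictions from the three prompt paths.

    - Neutral path anchors structural fidelity.
    - Positive path is amplified to encourage fine-detail synthesis.
    - Negative path (degradation description) is pushed away from,
      suppressing metalens-specific blur and artifacts.
    """
    detail_term = w_pos * (eps_pos - eps_neu)        # pull toward detail prompt
    degradation_term = w_neg * (eps_neg - eps_neu)   # push away from degradation prompt
    return eps_neu + detail_term - degradation_term

# Toy usage with dummy noise predictions (batch x channels x H x W)
eps_p, eps_n, eps_m = (torch.randn(1, 4, 64, 64) for _ in range(3))
eps = combine_prompt_paths(eps_p, eps_n, eps_m)
print(eps.shape)  # torch.Size([1, 4, 64, 64])
```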
📝 Abstract
Metalenses offer significant potential for ultra-compact computational imaging but face challenges from complex optical degradation and computational restoration difficulties. Existing methods typically rely on precise optical calibration or massive paired datasets, which are non-trivial for real-world imaging systems. Furthermore, a lack of control over the inference process often results in undesirable hallucinated artifacts. We introduce Degradation-Modeled Multipath Diffusion for tunable metalens photography, leveraging powerful natural image priors from pretrained models instead of large datasets. Our framework uses positive, neutral, and negative-prompt paths to balance high-frequency detail generation, structural fidelity, and suppression of metalens-specific degradation, alongside pseudo data augmentation. A tunable decoder enables controlled trade-offs between fidelity and perceptual quality. Additionally, a spatially varying degradation-aware attention (SVDA) module adaptively models complex optical and sensor-induced degradation. Finally, we design and build a millimeter-scale MetaCamera for real-world validation. Extensive results show that our approach outperforms state-of-the-art methods, achieving high-fidelity and sharp image reconstruction. More materials: https://dmdiff.github.io/.
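As a rough illustration of the tunable-decoder idea, the sketch below blends a fidelity-oriented branch with a perceptual branch using a single scalar. The class `TunableDecoder`, its layer choices, and the linear blending are hypothetical stand-ins, not the actual architecture described in the paper.

```python
# Hypothetical sketch: a decoder whose output is tuned between reconstruction
# fidelity (alpha -> 0) and perceptual quality (alpha -> 1). Structure is assumed.
import torch
import torch.nn as nn

class TunableDecoder(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.fidelity_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.perceptual_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feats: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
        # Linearly interpolate the two feature paths before decoding to RGB.
        fused = ((1.0 - alpha) * self.fidelity_branch(feats)
                 + alpha * self.perceptual_branch(feats))
        return self.to_rgb(fused)

# Toy usage: the same features decoded with two different trade-off settings.
decoder = TunableDecoder()
feats = torch.randn(1, 64, 128, 128)
faithful = decoder(feats, alpha=0.2)   # favors fidelity
generative = decoder(feats, alpha=0.8) # favors perceptual quality
print(faithful.shape, generative.shape)
```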