🤖 AI Summary
Problem: The zero-shot generalization of commercial text-to-image generation models across low-level vision tasks remains underexplored, particularly regarding the discrepancy between human perception and pixel-wise metrics. Method: We systematically evaluate Nano Banana Pro, a commercial text-to-image generation model, on 14 low-level vision tasks (e.g., denoising, super-resolution, deblurring) across 40 benchmark datasets, using only simple text prompts without fine-tuning. We introduce a zero-shot prompt engineering paradigm and a multidimensional evaluation framework incorporating LPIPS, NIQE, and user studies. Contribution/Results: We reveal, for the first time, a significant perceptual–objective divergence: while Nano Banana Pro produces outputs subjectively superior in naturalness and high-frequency detail compared to task-specific models, it consistently underperforms in PSNR/SSIM due to an inherent trade-off between generative stochasticity and pixel-level consistency. Our analysis establishes Nano Banana Pro as a “vision generalist,” delineating its capabilities and limitations, and provides a new benchmark and conceptual foundation for generative modeling in low-level vision.
📝 Abstract
The rapid evolution of text-to-image generation models has revolutionized visual content creation. While commercial products like Nano Banana Pro have garnered significant attention, their potential as generalist solvers for traditional low-level vision challenges remains largely underexplored. In this study, we investigate the critical question: Is Nano Banana Pro a Low-Level Vision All-Rounder? We conducted a comprehensive zero-shot evaluation across 14 distinct low-level tasks spanning 40 diverse datasets. By utilizing simple textual prompts without fine-tuning, we benchmarked Nano Banana Pro against state-of-the-art specialist models. Our extensive analysis reveals a distinct performance dichotomy: while Nano Banana Pro demonstrates superior subjective visual quality, often hallucinating plausible high-frequency details that surpass specialist models, it lags behind in traditional reference-based quantitative metrics. We attribute this discrepancy to the inherent stochasticity of generative models, which struggle to maintain the strict pixel-level consistency required by conventional metrics. This report identifies Nano Banana Pro as a capable zero-shot contender for low-level vision tasks, while highlighting that achieving the high fidelity of domain specialists remains a significant hurdle.
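The gap between visual quality and reference-based scores can be illustrated with a toy calculation. The sketch below is not the paper's evaluation code: the `psnr` helper and the pixel values are assumptions chosen for illustration. It compares an over-smoothed estimate with a texture-preserving one that is misaligned by a single pixel, a simple stand-in for plausible detail that does not match the reference exactly.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 1-D "texture": alternating dark and bright pixels (illustrative values).
reference = [60, 140] * 8
# Over-smoothed estimate: the texture is averaged away entirely.
blurry = [100] * 16
# Texture-preserving estimate: same pattern, misaligned by one pixel.
shifted = [140, 60] * 8

psnr_blurry = psnr(reference, blurry)
psnr_shifted = psnr(reference, shifted)
print(f"blurry:  {psnr_blurry:.2f} dB")
print(f"shifted: {psnr_shifted:.2f} dB")
```

In this toy case the flat estimate scores roughly 6 dB higher than the shifted one, even though only the shifted one retains the texture. Pixel-wise metrics reward agreement at each position, not the presence of realistic detail, which is consistent with the dichotomy described above.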