Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception

📅 2026-03-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two key challenges in image aesthetic enhancement: the difficulty of accurately interpreting and executing ambiguous aesthetic editing instructions, and the scarcity of training data comprising content-consistent image pairs with varying aesthetic quality. To tackle these issues, the authors propose DIAE, a diffusion-based model equipped with a multimodal aesthetic perception mechanism that translates textual instructions into structured aesthetic guidance. Additionally, they introduce IIAEData, a weakly paired dataset, and devise a dual-branch weakly supervised training strategy. Without relying on high-quality, perfectly aligned image pairs, the proposed approach significantly improves the model’s ability to understand and implement complex aesthetic directives, outperforming existing methods in both aesthetic quality and content consistency.

πŸ“ Abstract
Image aesthetic enhancement aims to perceive aesthetic deficiencies in images and perform corresponding editing operations, which is highly challenging and requires the model to possess creativity and aesthetic perception capabilities. Although recent advancements in image editing models have significantly enhanced their controllability and flexibility, they struggle with enhancing image aesthetics. The primary challenges are twofold: first, following editing instructions with aesthetic perception is difficult, and second, there is a scarcity of "perfectly-paired" images that have consistent content but distinct aesthetic qualities. In this paper, we propose Dual-supervised Image Aesthetic Enhancement (DIAE), a diffusion-based generative model with multimodal aesthetic perception. First, DIAE incorporates Multimodal Aesthetic Perception (MAP) to convert ambiguous aesthetic instructions into explicit guidance by (i) employing detailed, standardized aesthetic instructions across multiple aesthetic attributes, and (ii) utilizing multimodal control signals derived from text-image pairs that maintain consistency within the same aesthetic attribute. Second, to mitigate the lack of "perfectly-paired" images, we collect an "imperfectly-paired" dataset called IIAEData, consisting of images with varying aesthetic qualities while sharing identical semantics. To better leverage the weak matching characteristics of IIAEData during training, a dual-branch supervision framework is also introduced for weakly supervised image aesthetic enhancement. Experimental results demonstrate that DIAE outperforms the baselines and obtains superior image aesthetic scores and image content consistency scores.
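The abstract describes a dual-branch supervision scheme for weakly paired data: one branch pushes the output toward a higher-aesthetic reference that shares semantics but is not pixel-aligned, while the other keeps the output faithful to the source content. The paper does not give the actual loss, so the following is only a minimal toy sketch of how such an objective *could* be combined; the pooled-feature distance, the MSE terms, and all names (`dual_branch_loss`, `coarse_features`, the weights) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coarse_features(img, pool=4):
    # Crude misalignment-tolerant "feature": average-pool spatial blocks.
    # A real system would use a learned aesthetic/feature encoder instead.
    h, w = img.shape[:2]
    return img[:h - h % pool, :w - w % pool].reshape(
        h // pool, pool, w // pool, pool, -1).mean(axis=(1, 3))

def dual_branch_loss(pred, aesthetic_target, content_source,
                     w_aes=1.0, w_con=0.5):
    """Toy dual-branch objective for weakly paired training (assumed form).

    Branch 1 (aesthetic): compare against the weakly paired high-aesthetic
    reference in a coarse feature space, since the pair is not pixel-aligned.
    Branch 2 (content): pixel-level distance to the source image, so the
    edit preserves the original content.
    """
    aes = np.mean((coarse_features(pred)
                   - coarse_features(aesthetic_target)) ** 2)
    con = np.mean((pred - content_source) ** 2)
    return w_aes * aes + w_con * con

# Example with random "images" (H, W, C):
rng = np.random.default_rng(0)
source = rng.random((8, 8, 3))
reference = rng.random((8, 8, 3))   # weakly paired: same semantics, better aesthetics
edited = 0.5 * source + 0.5 * reference
loss = dual_branch_loss(edited, reference, source)
```

The content branch anchors the edit to the source, so the aesthetic branch can only move the output as far as the feature-level pull toward the reference justifies; the weights trade off the two, which is one plausible way to exploit imperfectly paired data without pixel-aligned supervision.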
Problem

Research questions and friction points this paper is trying to address.

image aesthetic enhancement
aesthetic perception
imperfectly-paired data
multimodal guidance
diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-supervised learning
Multimodal aesthetic perception
Diffusion models
Weakly-paired dataset
Image aesthetic enhancement