HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

2025-03-31
AI Summary
This paper addresses the longstanding lack of systematic research and dedicated resources for Human Image Aesthetic Assessment (HIAA) and introduces the first holistic framework for the task. Methodologically, it (1) constructs HumanBeauty, the first large-scale HIAA dataset, with 108K images: 50K are annotated across 12 aesthetic sub-dimensions, and the remaining 58K carry overall aesthetic labels only; (2) proposes HumanAesExpert, a multi-head vision-language model that combines an Expert Head (encoding human knowledge of aesthetic sub-dimensions), a Language Modeling (LM) Head, and a Regression Head, whose scores are aggregated by a MetaVoter that balances the strengths of each head; and (3) builds the dataset with a hybrid curation process, in which the 50K fine-grained images are manually collected and annotated, while the 58K overall-labeled images are systematically filtered from public datasets. Experiments show substantial improvements over existing state-of-the-art methods. All code, models, and the HumanBeauty dataset are publicly released to advance the HIAA research community.
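Because only 50K of the 108K images carry the 12 sub-dimension labels, a model trained on the full dataset has to cope with partially labeled samples. The abstract does not say how HumanAesExpert handles this, so the sketch below shows one common option, not the authors' method: a loss that always supervises the overall score and adds a sub-dimension term only when those labels exist. The record fields (`image_path`, `overall`, `sub_scores`), the `sub_weight` parameter, and the assumption that scores are normalized to [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NUM_SUB_DIMS = 12  # the abstract's 12-dimensional aesthetic standard


@dataclass
class HIAASample:
    """One training record; all field names are hypothetical."""
    image_path: str
    overall: float                                # present for all 108K images
    sub_scores: Optional[Sequence[float]] = None  # only for the 50K fine-grained images


def masked_hiaa_loss(pred_overall: float,
                     pred_sub: Sequence[float],
                     sample: HIAASample,
                     sub_weight: float = 1.0) -> float:
    """Squared error on the overall score, plus the mean squared error over the
    12 sub-dimensions only when the sample actually has sub-dimension labels."""
    loss = (pred_overall - sample.overall) ** 2
    if sample.sub_scores is not None:
        if len(sample.sub_scores) != NUM_SUB_DIMS or len(pred_sub) != NUM_SUB_DIMS:
            raise ValueError(f"expected {NUM_SUB_DIMS} sub-dimension scores")
        sub_mse = sum((p - t) ** 2
                      for p, t in zip(pred_sub, sample.sub_scores)) / NUM_SUB_DIMS
        loss += sub_weight * sub_mse
    return loss


# Usage: an overall-only sample (like the 58K subset) and a fine-grained one (like the 50K subset).
coarse = HIAASample("img_a.jpg", overall=0.7)
fine = HIAASample("img_b.jpg", overall=0.7, sub_scores=[0.5] * NUM_SUB_DIMS)
print(masked_hiaa_loss(0.5, [0.6] * NUM_SUB_DIMS, coarse))  # sub-dimension term skipped
print(masked_hiaa_loss(0.5, [0.6] * NUM_SUB_DIMS, fine))    # both terms contribute
```

Masking rather than imputing missing sub-scores keeps the 58K overall-only images useful for the overall head without injecting made-up fine-grained targets.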

๐Ÿ“ Abstract
Image Aesthetic Assessment (IAA) is a long-standing and challenging research task. However, its subfield, Human Image Aesthetic Assessment (HIAA), has been scarcely explored, even though HIAA is widely used in social media, AI workflows, and related domains. To bridge this research gap, our work pioneers a holistic implementation framework tailored for HIAA. Specifically, we introduce HumanBeauty, the first dataset purpose-built for HIAA, which comprises 108K high-quality human images with manual annotations. To achieve comprehensive and fine-grained HIAA, 50K human images are manually collected through a rigorous curation process and annotated according to our new 12-dimensional aesthetic standard, while the remaining 58K, which carry overall aesthetic labels, are systematically filtered from public datasets. Based on the HumanBeauty database, we propose HumanAesExpert, a powerful Vision Language Model for aesthetic evaluation of human images. We design an Expert head that incorporates human knowledge of aesthetic sub-dimensions and use it jointly with the Language Modeling (LM) and Regression heads. This design enables our model to perform well in both overall and fine-grained HIAA. Furthermore, we introduce a MetaVoter, which aggregates the scores from all three heads to balance the strengths of each head and thereby improve assessment precision. Extensive experiments demonstrate that our HumanAesExpert models deliver significantly better HIAA performance than other state-of-the-art models. Our datasets, models, and code are publicly released to advance the HIAA community. Project webpage: https://humanaesexpert.github.io/HumanAesExpert/
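The abstract says only that the MetaVoter aggregates the scores of the LM, Regression, and Expert heads to balance their capabilities. As a minimal sketch under that description, the code below assumes the simplest such aggregator: a softmax-weighted average of the three head scores, with the weights learned by gradient descent on mean squared error against ground-truth scores. The real MetaVoter may be a different learned module (for example, a small network); the names `meta_vote`, `fit_voter_logits`, and `HEADS`, and the toy data, are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical head order used throughout this sketch.
HEADS = ("lm", "regression", "expert")


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_vote(head_scores: Sequence[float], logits: Sequence[float]) -> float:
    """Fuse per-head scores into one score using softmax(logits) as weights."""
    weights = softmax(logits)
    return sum(w * s for w, s in zip(weights, head_scores))


def fit_voter_logits(score_rows: Sequence[Sequence[float]],
                     targets: Sequence[float],
                     lr: float = 1.0,
                     steps: int = 3000) -> List[float]:
    """Learn the voting logits by gradient descent on mean squared error."""
    logits = [0.0] * len(score_rows[0])  # start from uniform weights
    n = len(targets)
    for _ in range(steps):
        weights = softmax(logits)
        grad = [0.0] * len(logits)
        for scores, y in zip(score_rows, targets):
            fused = sum(w * s for w, s in zip(weights, scores))
            err = fused - y
            # For a softmax-weighted mean, d(fused)/d(logit_j) = w_j * (s_j - fused).
            for j, (w, s) in enumerate(zip(weights, scores)):
                grad[j] += 2.0 * err * w * (s - fused) / n
        logits = [z - lr * g for z, g in zip(logits, grad)]
    return logits


# Toy usage: the LM head is biased upward, the regression head is exact,
# and the expert head is inverted, so the voter should learn to trust regression.
targets = [0.2, 0.5, 0.8, 0.4, 0.9]
rows = [(y + 0.3, y, 1.0 - y) for y in targets]
weights = softmax(fit_voter_logits(rows, targets))
print(dict(zip(HEADS, (round(w, 3) for w in weights))))
```

Parameterizing the weights through a softmax keeps them positive and summing to one, so the fused score always stays within the range spanned by the three heads.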
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of research in Human Image Aesthetic Assessment (HIAA)
Creating a specialized dataset (HumanBeauty) for HIAA with manual annotations
Developing a Vision Language Model (HumanAesExpert) for superior HIAA performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces HumanBeauty dataset for HIAA
Develops HumanAesExpert Vision Language Model
Proposes MetaVoter for balanced score aggregation
Authors
Zhichao Liao, Tsinghua University
Xiaokun Liu, Kuaishou Technology
Wenyu Qin, Harbin Institute of Technology (research interests: Control)
Qingyu Li, Chinese University of Hong Kong, Shenzhen (research interests: artificial intelligence, remote sensing)
Qiulin Wang, Kuaishou Technology
Pengfei Wan, Head of Kling Video Generation Models, Kuaishou Technology (research interests: Generative Models, Computer Vision, Multimodal AI, Computer Graphics)
Di Zhang, Kuaishou Technology
Long Zeng, Tsinghua University
Pingfa Feng, Tsinghua University