Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs

📅 2026-01-15
🤖 AI Summary
This work addresses structural anomalies and generation artifacts prevalent in existing text-guided human pose editing methods, as well as the absence of fine-grained pose consistency evaluation metrics. To bridge this gap, we propose the first unified evaluation framework based on a layer-selective multimodal large language model (MLLM), integrating contrastive LoRA fine-tuning with Layer Sensitivity Analysis (LSA) to precisely identify optimal feature layers. This enables simultaneous authenticity detection and multidimensional quality regression. Furthermore, we introduce HPE-Bench, a new benchmark dataset designed to support systematic evaluation of pose editing outputs. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on both tasks, effectively bridging the divide between forensic analysis of generated content and comprehensive quality assessment.

📝 Abstract
Text-guided human pose editing has gained significant traction in AIGC applications. However, it remains plagued by structural anomalies and generative artifacts. Existing evaluation metrics often isolate authenticity detection from quality assessment, failing to provide fine-grained insights into pose-specific inconsistencies. To address these limitations, we introduce HPE-Bench, a specialized benchmark comprising 1,700 standardized samples from 17 state-of-the-art editing models, offering both authenticity labels and multi-dimensional quality scores. Furthermore, we propose a unified framework based on layer-selective multimodal large language models (MLLMs). By employing contrastive LoRA tuning and a novel layer sensitivity analysis (LSA) mechanism, we identify the optimal feature layer for pose evaluation. Our framework achieves superior performance in both authenticity detection and multi-dimensional quality regression, effectively bridging the gap between forensic detection and quality assessment.
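The abstract does not specify how LSA scores layers, so the following is only an illustrative sketch of what layer selection could look like, not the authors' procedure. It assumes that pooled features have already been extracted from several MLLM layers for real and edited images, and it ranks layers with a simple Fisher-style separability score. The function names `fisher_score` and `select_layer`, the scoring criterion, and the toy features are all hypothetical.

```python
from statistics import fmean


def fisher_score(features, labels):
    """Hypothetical sensitivity proxy: between-class separation divided by
    within-class spread, summed over feature dimensions."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    score = 0.0
    for d in range(len(features[0])):
        p = [f[d] for f in pos]
        n = [f[d] for f in neg]
        mean_p, mean_n = fmean(p), fmean(n)
        # Sum of the two per-class variances for this dimension.
        spread = fmean([(x - mean_p) ** 2 for x in p]) + fmean(
            [(x - mean_n) ** 2 for x in n]
        )
        # Small constant avoids division by zero for constant features.
        score += (mean_p - mean_n) ** 2 / (spread + 1e-8)
    return score


def select_layer(layer_features, labels):
    """Return the layer index with the highest score, plus all scores."""
    scores = {
        layer: fisher_score(feats, labels)
        for layer, feats in layer_features.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (assumption): 1 = authentic image, 0 = edited image.
labels = [1, 1, 0, 0]
layer_features = {
    4: [[0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [0.2, 0.2]],   # classes overlap
    12: [[1.0, 0.9], [0.9, 1.0], [0.0, 0.1], [0.1, 0.0]],  # classes separate
}
best_layer, scores = select_layer(layer_features, labels)
print(best_layer, scores)
```

In this toy setup the deeper layer separates the two classes while the shallower one does not, so it is the one selected. A real pipeline would replace the toy lists with hidden states taken from the fine-tuned model and might use a different criterion altogether.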
Problem

Research questions and friction points this paper is trying to address.

human pose editing
structural anomalies
generative artifacts
evaluation metrics
pose-specific inconsistencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer-selective MLLMs
HPE-Bench
layer sensitivity analysis
contrastive LoRA tuning
fine-grained pose evaluation
Ningyu Sun
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Zhaolin Cai
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Zitong Xu
Shanghai Jiao Tong University
Image Quality Assessment, Image Editing
Peihang Chen
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Yichao Yan
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Xiongkuo Min
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Xiaokang Yang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University