Evaluating and Predicting Distorted Human Body Parts for Generated Images

📅 2025-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
AI-generated portraits frequently exhibit anatomical distortions—including limb duplication, missing fingers, joint deformities, and fused body parts—yet no standardized evaluation framework exists for such fine-grained structural flaws. Method: We introduce Distortion-5K, the first large-scale, multi-style benchmark with pixel-level anatomical distortion annotations, and propose ViT-HD—a vision transformer enhanced with human pose priors and structural constraints for precise distortion localization. We also formally define and standardize anatomical distortion recognition criteria. Results: ViT-HD achieves F1 = 0.899 and IoU = 0.831 on distortion localization. Evaluation on the Human Distortion Benchmark reveals that nearly 50% of outputs from leading text-to-image models contain significant anatomical distortions. This work establishes foundational tools and quantitative metrics for assessing generative image fidelity and enabling controllable, anatomically plausible portrait synthesis.

📝 Abstract
Recent advancements in text-to-image (T2I) models enable high-quality image synthesis, yet generating anatomically accurate human figures remains challenging. AI-generated images frequently exhibit distortions such as duplicated limbs, missing fingers, deformed extremities, or fused body parts. Existing evaluation metrics like Inception Score (IS) and Fréchet Inception Distance (FID) lack the granularity to detect these distortions, while human preference-based metrics focus on abstract quality assessments rather than anatomical fidelity. To address this gap, we establish the first standards for identifying human body distortions in AI-generated images and introduce Distortion-5K, a comprehensive dataset comprising 4,700 annotated images of normal and malformed human figures across diverse styles and distortion types. Based on this dataset, we propose ViT-HD, a Vision Transformer-based model tailored for detecting human body distortions in AI-generated images, which outperforms state-of-the-art segmentation models and visual language models, achieving an F1 score of 0.899 and an IoU of 0.831 on distortion localization. Additionally, we construct the Human Distortion Benchmark with 500 human-centric prompts to evaluate four popular T2I models using the trained ViT-HD, revealing that nearly 50% of generated images contain distortions. This work pioneers a systematic approach to evaluating anatomical accuracy in AI-generated humans, offering tools to advance the fidelity of T2I models and their real-world applicability. The Distortion-5K dataset and the trained ViT-HD model will soon be released in our GitHub repository: https://github.com/TheRoadQaQ/Predicting-Distortion.
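The localization scores the abstract reports (F1 = 0.899, IoU = 0.831) are standard pixel-level measures comparing a predicted distortion mask against a ground-truth mask. A minimal sketch of how these two metrics are computed (the function name and the toy masks are illustrative, not from the paper):

```python
import numpy as np

def f1_and_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Pixel-level F1 and IoU between two binary distortion masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly flagged distortion pixels
    fp = np.logical_and(pred, ~gt).sum()   # false alarms
    fn = np.logical_and(~pred, gt).sum()   # missed distortion pixels
    denom_f1 = 2 * tp + fp + fn
    denom_iou = tp + fp + fn
    f1 = 2 * tp / denom_f1 if denom_f1 else 1.0
    iou = tp / denom_iou if denom_iou else 1.0
    return float(f1), float(iou)

# Toy example: 3 ground-truth distortion pixels, prediction recovers 2 of them.
gt = np.zeros((4, 4), dtype=int)
gt[1, 1:4] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1, 1:3] = 1
f1, iou = f1_and_iou(pred, gt)  # F1 = 0.8, IoU ≈ 0.667
```

Note that IoU is always the stricter of the two (IoU = F1 / (2 − F1)), which is why the paper's IoU of 0.831 sits below its F1 of 0.899.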
Problem

Research questions and friction points this paper is trying to address.

Detecting anatomical distortions in AI-generated human images.
Developing a dataset and model for evaluating image fidelity.
Assessing text-to-image models for human figure accuracy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed Distortion-5K dataset for human body distortions.
Introduced ViT-HD model for distortion detection in images.
Created Human Distortion Benchmark to evaluate T2I models.
Lu Ma
Peking University, Tencent Inc.
Kaibo Cao
Tencent Inc.
Hao Liang
Peking University
Jiaxin Lin
The University of Texas at Austin
Zhuang Li
Tencent Inc.
Yuhong Liu
Santa Clara University
Jihong Zhang
Tencent Inc.
Wentao Zhang
Institute of Physics, Chinese Academy of Sciences
Bin Cui
Peking University