MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans

📅 2025-06-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing generative models lack standardized benchmarks for evaluating identity preservation in complex multi-person action image generation. Method: We propose MultiHuman-Testbench—the first comprehensive evaluation benchmark for multi-person image generation—comprising 1,800 high-quality samples spanning diverse poses, interpersonal interactions, and textual descriptions. Our methodology introduces multi-dimensional quantitative metrics: face-recognition-based identity similarity, pose-estimation-based action accuracy, and instance-level region alignment via human segmentation and Hungarian matching. Additionally, we design a region-aware prior modeling strategy and an identity isolation mechanism to enhance ID consistency and action fidelity under joint text-pose guidance. Contribution/Results: Extensive experiments demonstrate the effectiveness of these techniques across mainstream generative models, establishing the first standardized evaluation framework for multi-person image generation and enabling systematic assessment of identity and motion fidelity.

📝 Abstract
Generation of images containing multiple humans, performing complex actions, while preserving their facial identities, is a significant challenge. A major factor contributing to this is the lack of a dedicated benchmark. To address this, we introduce MultiHuman-Testbench, a novel benchmark for rigorously evaluating generative models for multi-human generation. The benchmark comprises 1,800 samples, including carefully curated text prompts describing a range of simple to complex human actions. These prompts are matched with a total of 5,550 unique human face images, sampled uniformly to ensure diversity across age, ethnic background, and gender. Alongside captions, we provide human-selected pose conditioning images which accurately match the prompt. We propose a multi-faceted evaluation suite employing four key metrics to quantify face count, ID similarity, prompt alignment, and action detection. We conduct a thorough evaluation of a diverse set of models, including zero-shot approaches and training-based methods, with and without regional priors. We also propose novel techniques to incorporate image and region isolation using human segmentation and Hungarian matching, significantly improving ID similarity. Our proposed benchmark and key findings provide valuable insights and a standardized tool for advancing research in multi-human image generation.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmark for multi-human image generation
Challenges in preserving facial identities in complex actions
Need for standardized evaluation metrics for generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MultiHuman-Testbench benchmark for multi-human generation
Uses human segmentation and Hungarian matching for ID similarity
Proposes multi-faceted evaluation suite with four key metrics
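The Hungarian-matching step pairs each generated face with a reference identity one-to-one before scoring similarity, so a model is not penalized for rendering the right people in a different spatial order. A minimal sketch of how such an ID-similarity metric could work, assuming cosine similarity between face-recognition embeddings (`id_similarity` and its inputs are hypothetical names, not the paper's actual implementation):

```python
# Hypothetical sketch: Hungarian matching between generated-face embeddings
# and reference-identity embeddings, then mean cosine similarity over the
# matched pairs. The paper's exact metric may differ.
import numpy as np
from scipy.optimize import linear_sum_assignment

def id_similarity(gen_embeds: np.ndarray, ref_embeds: np.ndarray) -> float:
    """Assign each generated face to one reference identity (one-to-one)
    so that total cosine similarity is maximized; return the mean."""
    gen = gen_embeds / np.linalg.norm(gen_embeds, axis=1, keepdims=True)
    ref = ref_embeds / np.linalg.norm(ref_embeds, axis=1, keepdims=True)
    sim = gen @ ref.T                         # pairwise cosine similarities
    rows, cols = linear_sum_assignment(-sim)  # negate: maximize similarity
    return float(sim[rows, cols].mean())

# Toy example: two generated faces whose identities appear in swapped order
# relative to the reference list; matching recovers the correspondence.
gen = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[0.0, 1.0], [1.0, 0.0]])
print(id_similarity(gen, ref))  # → 1.0
```

Without the assignment step, a naive index-aligned comparison would score this swapped example as 0; the matching makes the metric permutation-invariant over identities.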