FED-Bench: A Cross-Granular Benchmark for Disentangled Evaluation of Facial Expression Editing

📅 2026-03-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing facial expression editing benchmarks are hindered by low-quality images, ambiguous instructions, and biased evaluation metrics, making it difficult to simultaneously preserve identity and achieve precise expression control. To address these limitations, this work introduces a high-quality benchmark dataset comprising 747 triplets and proposes FED-Score, a cross-granularity evaluation protocol that decouples performance into three dimensions: instruction alignment, fidelity, and expression gain. The benchmark is built with a cascaded, scalable data construction pipeline, and a systematic evaluation of 18 models reveals fine-grained instruction following as their primary bottleneck. Fine-tuning a baseline model on an accompanying large-scale in-the-wild training set substantially improves editing performance, demonstrating the effectiveness and extensibility of the proposed benchmark.
📝 Abstract
Facial expression image editing requires fine-grained control to strictly preserve human identity and background while precisely manipulating expression. However, existing editing benchmarks primarily focus on general scenarios, lacking high-quality facial images and corresponding editing instructions. Furthermore, current evaluation metrics exhibit systemic biases in this task, often favoring lazy editing or overfit editing. To bridge these gaps, we propose FED-Bench, a comprehensive benchmark featuring rigorous testing and an accurate evaluation suite. First, we carefully construct a benchmark of 747 triplets through a cascaded and scalable pipeline, each comprising an original image, an editing instruction, and a ground-truth image for precise evaluation. Second, we introduce FED-Score, a cross-granularity evaluation protocol that disentangles assessment into three dimensions: Alignment for verifying instruction following, Fidelity for testing image quality and identity preservation, and Relative Expression Gain for quantifying the magnitude of expression changes, effectively mitigating the aforementioned evaluation biases. Third, we benchmark 18 image editing models, revealing that current approaches struggle to simultaneously achieve high fidelity and accurate expression manipulation, with fine-grained instruction following identified as the primary bottleneck. Finally, leveraging the scalability of the introduced benchmark engine, we provide a 20k+ in-the-wild facial training set and demonstrate its effectiveness by fine-tuning a baseline model that achieves significant performance gains. Our benchmark and related code will be made publicly available soon.
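The disentangled protocol described above can be sketched in code. Note that the sub-metric definitions and the equal-weight aggregation below are illustrative assumptions for intuition only, not the paper's actual FED-Score formulas; the intensity values stand in for hypothetical expression-classifier outputs in [0, 1].

```python
# Illustrative sketch of a disentangled evaluation in the spirit of FED-Score.
# All formulas here are assumptions, not the paper's actual definitions.

def relative_expression_gain(orig: float, edited: float,
                             target: float, eps: float = 1e-6) -> float:
    """Fraction of the requested expression change actually achieved.

    A 'lazy' edit (edited ~= orig) scores near 0 no matter how well it
    preserves identity, which is the evaluation bias the protocol targets.
    """
    achieved = edited - orig
    requested = target - orig
    gain = achieved / (abs(requested) + eps)
    return max(0.0, min(1.0, gain))  # clamp to [0, 1]

def fed_score_sketch(alignment: float, fidelity: float, gain: float) -> float:
    """Aggregate the three disentangled dimensions (equal-weight mean here)."""
    return (alignment + fidelity + gain) / 3.0

# A 'lazy' edit: near-perfect fidelity, but almost no expression change.
lazy = fed_score_sketch(alignment=0.2, fidelity=0.99,
                        gain=relative_expression_gain(0.1, 0.12, 0.9))
# A balanced edit: good fidelity and most of the requested change.
balanced = fed_score_sketch(alignment=0.9, fidelity=0.9,
                            gain=relative_expression_gain(0.1, 0.8, 0.9))
print(round(lazy, 3), round(balanced, 3))
```

Because the expression gain is reported separately rather than folded into a single fidelity-dominated score, the lazy edit cannot hide behind its high identity preservation.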
Problem

Research questions and friction points this paper is trying to address.

Facial Expression Editing
Benchmark
Evaluation Bias
Disentangled Evaluation
Fine-grained Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Facial Expression Editing
Disentangled Evaluation
Cross-Granular Benchmark
FED-Score
Instruction-Following
Fengjian Xue
Xi’an Jiaotong University
Xuecheng Wu
Xi’an Jiaotong University
Heli Sun
Xi’an Jiaotong University
Yunyun Shi
Xi’an Jiaotong University
Shi Chen
Xi’an Jiaotong University
Liangyu Fu
Northwestern Polytechnical University
Jinheng Xie
National University of Singapore
Deep Learning · Computer Vision · Generative AI
Dingkang Yang
ByteDance
Multimodal Learning · Generative AI · Embodied AI
Hao Wang
Xi’an Jiaotong University
Junxiao Xue
Zhejiang Lab
Computer Graphics · Crowd Simulation · Multi-agent Modeling · Multi-modal Learning
Liang He
Xi’an Jiaotong University