PICABench: How Far Are We from Physically Realistic Image Editing?

📅 2025-10-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current image editing methods prioritize executing the semantic instruction while neglecting the physically grounded effects, such as shadows, reflections, and mechanical interactions, that should accompany edits like object removal, leading to physically implausible outputs. This work presents a systematic evaluation of physical plausibility in image editing. It introduces PICABench, a benchmark spanning eight dimensions across optics, mechanics, and state transitions, together with PICAEval, an evaluation protocol that uses a VLM-as-a-judge grounded in per-case, region-level human annotations and questions for fine-grained physical-consistency scoring. Beyond benchmarking, the authors construct PICA-100K, a training dataset built from videos so that editing models can learn physical priors from temporal observation. Experiments reveal pervasive physical inconsistencies in state-of-the-art models, while training on PICA-100K measurably improves physical plausibility, providing a benchmark, evaluation protocol, and data foundation for moving image editing from content accuracy toward physical realism.

📝 Abstract
Image editing has achieved remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are key to the realism of the generation. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses a VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with substantial room for improvement. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.
Problem

Research questions and friction points this paper is trying to address.

Evaluating physical realism in image editing across multiple dimensions
Assessing shadow, reflection and interaction effects during object manipulation
Benchmarking models' ability to maintain physical consistency during edits
Innovation

Methods, ideas, or system contributions that make the work stand out.

PICABench evaluates physical realism in image editing
PICA-100K dataset learns physics from video training
PICAEval uses VLM-as-a-judge for reliable evaluation
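The PICAEval protocol described above can be sketched as a simple scoring loop. This is a minimal illustration, not the paper's implementation: the `RegionQuestion` structure, the `judge` callback, and the accuracy metric are all assumptions standing in for the real VLM-as-a-judge pipeline with region-level human annotations.

```python
# Hedged sketch of a PICAEval-style scoring loop (protocol details assumed,
# not taken from the paper): each test case carries human-annotated,
# region-grounded yes/no questions; a VLM judge answers them on the edited
# image, and the score is the fraction answered as physically consistent.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h) crop to focus the judge

@dataclass
class RegionQuestion:
    region: Region       # image region the question is grounded in
    question: str        # e.g. "Is the object's shadow also removed?"
    expected: str        # "yes"/"no" answer for a physically consistent edit

def picaeval_score(image_path: str,
                   questions: List[RegionQuestion],
                   judge: Callable[[str, Region, str], str]) -> float:
    """Fraction of region-level questions the judge answers as expected."""
    if not questions:
        return 0.0
    correct = sum(
        judge(image_path, q.region, q.question).strip().lower() == q.expected
        for q in questions
    )
    return correct / len(questions)

# Usage with a stub judge (a real system would query a VLM here):
stub = lambda img, region, q: "yes"
qs = [RegionQuestion((10, 10, 64, 64), "Is the shadow removed?", "yes"),
      RegionQuestion((80, 20, 50, 50), "Is the reflection removed?", "no")]
print(picaeval_score("edited.png", qs, stub))  # 0.5
```

Grounding each question in a specific region keeps the judge from rewarding globally plausible but locally wrong edits, which is the motivation the abstract gives for region-level annotations.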
👥 Authors
Yuandong Pu
SJTU, Shanghai AI Laboratory
Computer Vision
Le Zhuo
Krea AI
Generative Models, Multi-Modal Learning
Songhao Han
Beihang University
Jinbo Xing
The Chinese University of Hong Kong
Computer Graphics and Vision
Kaiwen Zhu
Shanghai Jiao Tong University
Multi-Modal Generation, Computer Vision
Shuo Cao
USTC
Bin Fu
Shanghai AI Laboratory
Si Liu
Fred Hutchinson Cancer Center
Genomics, Biostatistics, Anomaly Detection, Open Category Detection
Hongsheng Li
CUHK MMLab
Yu Qiao
Shanghai AI Laboratory
Wenlong Zhang
Shanghai AI Laboratory
Xi Chen
The University of Hong Kong
Yihao Liu
Shanghai AI Laboratory