InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images

📅 2025-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper introduces zero-shot human-object interaction (HOI) image editing: given an input image, the task is to replace the original HOI relation with a specified new one while strictly preserving the identities of both the human and the object. To address challenges in modeling spatial layout, contextual coherence, and relational dependencies, we propose an HOI-decoupled editing paradigm, decomposing the scene into human, object, and background components and incorporating pretrained interaction priors. We further design a LoRA-driven selective fine-tuning strategy to jointly optimize interaction reconstruction fidelity and identity consistency. We introduce IEBench, a comprehensive benchmark for HOI image editing. Extensive experiments demonstrate that our method significantly outperforms existing approaches in both interaction accuracy and identity preservation, establishing a new state-of-the-art baseline for HOI image editing.

📝 Abstract
This paper presents InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing, addressing the challenging task of transforming an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Unlike simpler image editing scenarios such as attribute manipulation, object replacement, or style transfer, HOI editing involves complex spatial, contextual, and relational dependencies inherent in human-object interactions. Existing methods often overfit to the source image structure, limiting their ability to adapt to the substantial structural modifications demanded by new interactions. To address this, InteractEdit decomposes each scene into subject, object, and background components, then employs Low-Rank Adaptation (LoRA) and selective fine-tuning to preserve pretrained interaction priors while learning the visual identity of the source image. This regularization strategy effectively balances interaction edits with identity consistency. We further introduce IEBench, the most comprehensive benchmark for HOI editing, which evaluates both interaction editing and identity preservation. Our extensive experiments show that InteractEdit significantly outperforms existing methods, establishing a strong baseline for future HOI editing research and unlocking new possibilities for creative and practical applications. Code will be released upon publication.
Problem

Research questions and friction points this paper is trying to address.

Editing human-object interactions in images without prior examples.
Preserving identities while transforming interactions in images.
Overcoming structural limitations in existing image editing methods.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes scenes into subject, object, background.
Uses LoRA and selective fine-tuning for identity preservation.
Introduces IEBench for comprehensive HOI editing evaluation.
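The LoRA-based selective fine-tuning named above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' released code: the `LoRALinear` class, the rank `r`, the scaling factor `alpha`, and the `selective_lora_finetune` helper are all hypothetical names; in the paper, adapters of this kind would be attached to selected layers of a pretrained diffusion model, with the base weights frozen so that pretrained interaction priors are preserved while only the low-rank update learns the source image's identity.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight plus a trainable low-rank update.

    Output: y = base(x) + (alpha / r) * x @ A^T @ B^T, where only A and B train.
    B starts at zero, so the wrapped layer initially behaves exactly like base.
    """

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        # Freeze pretrained weights: priors are preserved, not overwritten.
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


def selective_lora_finetune(model: nn.Module, target_names: set) -> nn.Module:
    """Freeze everything, then wrap only the named Linear submodules with LoRA."""
    for p in model.parameters():
        p.requires_grad = False
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and name in target_names:
            setattr(model, name, LoRALinear(module))
    return model
```

The "selective" part is the `target_names` filter: only a chosen subset of layers receives a trainable low-rank adapter, which is what lets the method trade off identity reconstruction against the flexibility needed to synthesize a structurally different interaction.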
🔎 Similar Papers
2024-08-20 · 2024 2nd International Conference on Computer, Vision and Intelligent Technology (ICCVIT) · Citations: 2
👥 Authors
Jiun Tian Hoe · Nanyang Technological University · Computer Vision, Deep Learning, Image Generation, Image Retrieval
Weipeng Hu · Nanyang Technological University
Wei Zhou · Sun Yat-sen University
Chao Xie · Nanjing Forestry University
Ziwei Wang · Nanyang Technological University
Chee Seng Chan · Universiti Malaya, Malaysia · Computer Vision, Machine Learning, Image Processing
Xudong Jiang · Nanyang Technological University
Yap-Peng Tan · Nanyang Technological University