Bayesian Optimization for Controlled Image Editing via LLMs

📅 2025-02-25
🤖 AI Summary
In image generation, a fundamental challenge remains: achieving high-precision, natural-language-driven editing that preserves semantic consistency, without relying on large-scale pretraining, fine-tuning, or manual annotations. This paper introduces BayesGenie, a framework that combines large language models (LLMs) with Bayesian optimization in a single pipeline. The LLM parses user instructions and generates initial prompts, while Bayesian optimization automatically searches for the inference parameters (e.g., classifier-free guidance scale, sampling steps) that give the best edit. The approach is zero-shot, model-agnostic, and plug-and-play: BayesGenie requires neither modification of the underlying generative model nor annotated edit regions. Across diverse editing scenarios, experiments show consistent improvements over state-of-the-art methods in both editing accuracy and semantic fidelity. Experiments with both Claude 3 and GPT-4 further support the framework's robustness and generality.
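
The summary describes Bayesian optimization searching over inference parameters such as the classifier-free guidance scale and the number of sampling steps. This page does not specify BayesGenie's adapted strategy. What follows is therefore only a minimal sketch of generic Bayesian optimization over those two parameters, using a Gaussian-process surrogate with an expected-improvement acquisition. Everything in it is an assumption for illustration:

- the search ranges and grid;
- the RBF kernel and its length scale;
- `edit_score`, a hypothetical synthetic stand-in for an edit-quality score. The page does not say how BayesGenie scores candidate edits, so a closed-form function lets the example run without any model.

```python
import math
import random

# HYPOTHETICAL objective: a synthetic stand-in for an edit-quality score.
# It peaks (value 1.0) at guidance scale 7.5 with 30 sampling steps.
def edit_score(guidance: float, steps: int) -> float:
    return math.exp(-((guidance - 7.5) / 3.0) ** 2 - ((steps - 30) / 20.0) ** 2)

# ASSUMED search space: guidance in [1, 15] (step 0.5), steps in [10, 50] (step 5).
G_RANGE, S_RANGE = (1.0, 15.0), (10, 50)
CANDIDATES = [(1.0 + 0.5 * i, 10 + 5 * j) for i in range(29) for j in range(9)]

def normalize(p):
    """Map a (guidance, steps) pair into the unit square."""
    g, s = p
    return ((g - G_RANGE[0]) / (G_RANGE[1] - G_RANGE[0]),
            (s - S_RANGE[0]) / (S_RANGE[1] - S_RANGE[0]))

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance (assumed length scale)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(K):
    """Lower-triangular L with L @ L.T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):  # forward substitution: L x = b
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):  # back substitution: L^T x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(points, scores, noise=1e-5):
    """Fit a GP to observed scores; return a function giving (mean, std) at a point."""
    X = [normalize(p) for p in points]
    mean = sum(scores) / len(scores)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [s - mean for s in scores]))

    def posterior(p):
        k_star = [rbf(normalize(p), x) for x in X]
        mu = mean + sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        return mu, math.sqrt(max(1.0 - sum(t * t for t in v), 0.0))

    return posterior

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best score seen so far (maximization)."""
    if sigma < 1e-9:
        return 0.0
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Random initial samples, then repeatedly evaluate the max-EI grid point."""
    rng = random.Random(seed)
    observed = rng.sample(CANDIDATES, n_init)
    scores = [objective(*p) for p in observed]
    for _ in range(n_iter):
        posterior = fit_gp(observed, scores)
        best = max(scores)
        remaining = [p for p in CANDIDATES if p not in observed]
        nxt = max(remaining, key=lambda p: expected_improvement(*posterior(p), best))
        observed.append(nxt)
        scores.append(objective(*nxt))
    i = max(range(len(scores)), key=scores.__getitem__)
    return observed[i], scores[i]

if __name__ == "__main__":
    point, score = bayes_opt(edit_score)
    print(f"best (guidance, steps) = {point}, score = {score:.3f}")
```

Each iteration refits the surrogate to every score observed so far, then evaluates the unvisited grid point with the highest expected improvement. In a real editing pipeline, each call to the objective would mean running the generative model once, which is why a sample-efficient search of this kind is attractive.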

📝 Abstract
In the rapidly evolving field of image generation, achieving precise control over generated content while maintaining semantic consistency remains a significant challenge, particularly for methods that depend on grounding techniques or require model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to enable precise and user-friendly image editing. Our method lets users modify images through natural language descriptions without manually marking regions, while preserving the original image's semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach adapts readily to different LLMs thanks to its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework significantly outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs, including Claude 3 and GPT-4.

Problem

Achieving precise image editing control
Maintaining semantic consistency in edits
Eliminating the need for model fine-tuning

Innovation

Integration of LLMs with Bayesian optimization
Model-agnostic design enables adaptation across different LLMs
Automated refinement of inference parameters for high-precision editing

Authors

Chengkun Cai
University of Edinburgh

Haoliang Liu
University of Manchester

Xu Zhao
University of Edinburgh

Zhongyu Jiang
Apple Inc.
Human Intelligence

Tianfang Zhang
Tsinghua University

Zongkai Wu
FancyTech

Jenq-Neng Hwang
University of Washington

Serge Belongie
University of Copenhagen
Computer Vision, Machine Learning

Lei Li
University of Washington, University of Copenhagen