🤖 AI Summary
Existing instruction-based image editing methods lack fine-grained, numerically precise control over attribute-modification strength. To address this, we propose the Numeric Adapter, a plug-and-play numerical encoding module that embeds continuous scalar strength parameters into diffusion Transformer architectures, enabling joint conditioning on text instructions and numeric strengths. Leveraging a task-decoupled design and CAT, a high-fidelity synthetic dataset with real-scale supervision whose training data combine photorealistic renderings with DSLR-captured images, our method achieves, for the first time, multi-attribute disentanglement, zero-shot generalization, arbitrary-order input handling, and stable, continuous strength modulation. Experiments demonstrate accurate, consistent, and controllable scaling across diverse attributes, including color, shape, and size, significantly improving both the precision of strength specification and output stability.
📝 Abstract
Instruction-based image editing enables intuitive manipulation through natural language commands. However, text instructions alone often lack the precision required for fine-grained control over edit intensity. We introduce NumeriKontrol, a framework that allows users to precisely adjust image attributes using continuous scalar values expressed in common units. NumeriKontrol encodes numeric editing scales via a dedicated Numeric Adapter and injects them into diffusion models in a plug-and-play manner. Thanks to a task-decoupled design, our approach supports zero-shot multi-condition editing, allowing users to specify multiple instructions in any order. To provide high-quality supervision, we synthesize precise training data from reliable sources, including high-fidelity rendering engines and DSLR cameras. The resulting Common Attribute Transform (CAT) dataset covers diverse attribute manipulations with accurate ground-truth scales, enabling NumeriKontrol to function as a simple yet powerful interactive editing studio. Extensive experiments show that NumeriKontrol delivers accurate, continuous, and stable scale control across a wide range of attribute editing scenarios. These contributions advance instruction-based image editing by enabling precise, scalable, and user-controllable image manipulation.
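To make the core idea concrete, the sketch below shows one plausible way a "Numeric Adapter" could encode a continuous edit-strength scalar and inject it alongside text conditioning. This is a minimal illustration, not the paper's actual implementation: the sinusoidal featurization, MLP sizes, and the choice to append the strength as an extra conditioning token are all assumptions.

```python
# Hypothetical Numeric Adapter sketch (assumed design, not the paper's code):
# a continuous strength scalar is featurized with sinusoidal embeddings,
# projected by an MLP, and appended to the text-conditioning sequence.
import math
import torch
import torch.nn as nn


def sinusoidal_embedding(x: torch.Tensor, dim: int = 64,
                         max_period: float = 100.0) -> torch.Tensor:
    """Map a batch of continuous scalars (B,) to (B, dim) sinusoidal features."""
    half = dim // 2
    freqs = torch.exp(-math.log(max_period)
                      * torch.arange(half, dtype=torch.float32) / half)
    args = x[:, None].float() * freqs[None, :]          # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class NumericAdapter(nn.Module):
    """Plug-and-play module: encodes an edit-strength scalar as one extra
    conditioning token for a diffusion Transformer (dimensions are assumed)."""

    def __init__(self, embed_dim: int = 64, cond_dim: int = 768):
        super().__init__()
        self.embed_dim = embed_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, scale: torch.Tensor,
                text_cond: torch.Tensor) -> torch.Tensor:
        # scale: (B,) continuous strengths; text_cond: (B, T, cond_dim) tokens.
        tok = self.mlp(sinusoidal_embedding(scale, self.embed_dim))  # (B, cond_dim)
        return torch.cat([text_cond, tok[:, None, :]], dim=1)        # (B, T+1, cond_dim)


adapter = NumericAdapter()
cond = adapter(torch.tensor([0.5, 2.0]), torch.randn(2, 77, 768))
print(tuple(cond.shape))  # (2, 78, 768)
```

Because the adapter only appends a token to the conditioning sequence, it leaves the frozen backbone untouched, which is one way the plug-and-play and task-decoupled properties described above could be realized.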