FlowFixer: Towards Detail-Preserving Subject-Driven Generation

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the loss of fine details in subject-driven generation caused by scale and viewpoint variations. To this end, the authors propose FlowFixer, a framework that recovers high-fidelity details through direct image-to-image translation from visual references, thereby circumventing the semantic ambiguity inherent in textual prompts. The method employs a one-step denoising strategy to self-supervise the generation of training data that realistically simulates common generative artifacts. Furthermore, a novel detail fidelity metric based on keypoint matching is introduced, overcoming the limitations of conventional evaluation approaches that rely solely on semantic similarity. Experimental results demonstrate that FlowFixer consistently outperforms existing methods in both qualitative and quantitative assessments, establishing a new benchmark for high-fidelity detail generation in subject-driven synthesis.
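The paper's degradation scheme relies on one-step denoising with a diffusion model, which cannot be reproduced here. Its intent, however — stripping high-frequency detail while keeping global structure, to simulate generative artifacts for self-supervised training — can be illustrated with a toy NumPy proxy. The function `simulate_detail_loss`, the noise level, and the box-blur "denoiser" below are illustrative assumptions, not the authors' method:

```python
import numpy as np

def simulate_detail_loss(img, noise_level=0.5, rng=None):
    """Toy stand-in for the paper's one-step denoising degradation:
    add Gaussian noise, then 'denoise' with a single box-blur pass.
    High-frequency detail is destroyed; global structure survives."""
    rng = np.random.default_rng(rng)
    noisy = img + noise_level * rng.standard_normal(img.shape)
    # crude one-step "denoiser": 3x3 box blur (a low-pass reconstruction)
    pad = np.pad(noisy, 1, mode="edge")
    out = sum(pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(3) for dx in range(3)) / 9.0
    return out
```

Pairs of (degraded, original) images produced this way could then serve as self-supervised input/target pairs for an image-to-image refinement model.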

📝 Abstract
We present FlowFixer, a refinement framework for subject-driven generation (SDG) that restores fine details lost when a subject's scale or perspective changes during generation. FlowFixer performs direct image-to-image translation from visual references, avoiding the ambiguities of language prompts. To enable image-to-image training, we introduce a one-step denoising scheme that generates self-supervised training data: it automatically removes high-frequency details while preserving global structure, effectively simulating real-world SDG errors. We further propose a keypoint-matching-based metric to properly assess detail fidelity beyond the semantic similarities usually measured by CLIP or DINO. Experimental results demonstrate that FlowFixer outperforms state-of-the-art SDG methods in both qualitative and quantitative evaluations, setting a new benchmark for high-fidelity subject-driven generation.
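The keypoint-matching metric is only described at a high level here. As a rough illustration of the underlying idea — count how many of the reference image's detail keypoints survive in the generated image — the following is a minimal pure-NumPy sketch. The function name `keypoint_fidelity`, the crude Laplacian detector, the patch-correlation matcher, and all thresholds are assumptions for illustration, not the paper's metric:

```python
import numpy as np

def _keypoints(img, k=50):
    # crude detector: the k strongest responses of a Laplacian filter
    lap = np.abs(-4 * img[1:-1, 1:-1]
                 + img[:-2, 1:-1] + img[2:, 1:-1]
                 + img[1:-1, :-2] + img[1:-1, 2:])
    idx = np.argsort(lap.ravel())[-k:]
    ys, xs = np.unravel_index(idx, lap.shape)
    return list(zip(ys + 1, xs + 1))  # offset for the border crop

def _patch(img, y, x, r=3):
    # zero-mean, unit-norm local patch for correlation matching
    p = img[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    return (p - p.mean()) / (p.std() + 1e-8)

def keypoint_fidelity(ref, gen, k=50, r=3, thresh=0.8):
    """Fraction of reference keypoints whose local patch has a highly
    correlated counterpart near the same location in the generated image
    (higher = more fine detail preserved)."""
    h, w = ref.shape
    kps = [(y, x) for y, x in _keypoints(ref, k)
           if r <= y < h - r and r <= x < w - r]
    matched = 0
    for y, x in kps:
        pr = _patch(ref, y, x, r)
        best = -1.0
        # search a small window around the same position
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                yy, xx = y + dy, x + dx
                if r <= yy < h - r and r <= xx < w - r:
                    best = max(best, float((pr * _patch(gen, yy, xx, r)).mean()))
        if best > thresh:
            matched += 1
    return matched / max(len(kps), 1)
```

Unlike CLIP or DINO similarity, which can stay high when fine textures are lost, a match-rate metric of this kind drops as soon as local high-frequency detail around the keypoints is destroyed.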
Problem

Research questions and friction points this paper is trying to address.

subject-driven generation
detail preservation
image-to-image translation
fidelity
high-frequency details
Innovation

Methods, ideas, or system contributions that make the work stand out.

subject-driven generation
image-to-image translation
self-supervised denoising
detail preservation
keypoint matching
Jinyoung Jun
Amazon
Won-Dong Jang
Amazon
Wenbin Ouyang
Amazon
Raghudeep Gadde
Amazon
Jungbeom Lee
Amazon
Deep Learning · Computer Vision · Multi-Modal Learning