TReFT: Taming Rectified Flow Models For One-Step Image Translation

📅 2025-11-25
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Rectified Flow (RF) models suffer from low inference efficiency in image-to-image translation due to their reliance on multi-step denoising, hindering real-time deployment. Existing adversarial training approaches (e.g., CycleGAN-Turbo) cannot be directly adapted to the RF framework without causing training instability or divergence. To address this, we propose TReFT, the first method to leverage the geometric stability of pre-trained RF terminal velocity fields to guide adversarial learning. TReFT introduces a lightweight velocity prediction architecture and jointly optimizes with latent-space cycle-consistency and identity losses. Our approach enables high-fidelity, single-step image translation, achieving state-of-the-art performance across multiple benchmark datasets. Notably, TReFT accelerates inference by two orders of magnitude over conventional multi-step RF models, enabling practical real-time applications.

πŸ“ Abstract
Rectified Flow (RF) models have advanced high-quality image and video synthesis via optimal transport theory. However, when applied to image-to-image translation, they still depend on costly multi-step denoising, hindering real-time applications. Although the recent adversarial training paradigm, CycleGAN-Turbo, works with pretrained diffusion models for one-step image translation, we find that directly applying it to RF models leads to severe convergence issues. In this paper, we analyze these challenges and propose TReFT, a novel method to Tame Rectified Flow models for one-step image Translation. Unlike previous works, TReFT directly uses the velocity predicted by the pretrained DiT or UNet as output, a simple yet effective design that tackles the convergence issues of adversarial training with one-step inference. This design is motivated by a novel observation that, near the end of the denoising process, the velocity predicted by pretrained RF models converges to the vector from the origin to the final clean image, a property we further justify through theoretical analysis. When applying TReFT to large pretrained RF models such as SD3.5 and FLUX, we introduce memory-efficient latent cycle-consistency and identity losses during training, as well as lightweight architectural simplifications for faster inference. Pretrained RF models finetuned with TReFT achieve performance comparable to state-of-the-art methods across multiple image translation datasets while enabling real-time inference.
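The terminal-velocity observation in the abstract can be checked in a toy setting. The sketch below is an illustrative assumption, not the paper's method: it uses a 1-D Gaussian model (noise x0 ~ N(0, 1), data x1 ~ N(mu, sig2), interpolation x_t = t*x1 + (1-t)*x0), where the ideal RF model's prediction E[x1 - x0 | x_t] has a closed form via joint-Gaussian conditioning. The names `optimal_velocity`, `mu`, `sig2` are hypothetical choices for this sketch.

```python
# Toy 1-D Gaussian rectified-flow model (an assumption for illustration only):
# noise x0 ~ N(0, 1), data x1 ~ N(mu, sig2), interpolation x_t = t*x1 + (1-t)*x0.
# The ideal velocity model predicts E[x1 - x0 | x_t], which is available in
# closed form because (x_t, x1 - x0) are jointly Gaussian.

def optimal_velocity(xt: float, t: float, mu: float, sig2: float) -> float:
    # Gaussian conditioning: E[v | x_t] = E[v] + Cov(v, x_t) / Var(x_t) * (x_t - E[x_t])
    cov_v_xt = t * sig2 - (1 - t)        # Cov(x1 - x0, t*x1 + (1-t)*x0)
    var_xt = t**2 * sig2 + (1 - t)**2    # Var(t*x1 + (1-t)*x0)
    return mu + cov_v_xt / var_xt * (xt - t * mu)

mu, sig2 = 3.0, 0.25
xt = 2.8  # a sample late in denoising, already close to the data manifold

for t in (0.5, 0.9, 0.999, 1.0):
    print(f"t={t}: v={optimal_velocity(xt, t, mu, sig2):.4f}")
# As t -> 1, the predicted velocity tends to x_t itself, i.e. the vector from
# the origin to the clean sample. This is the terminal-velocity property that
# motivates reading the translated image directly off a one-step velocity
# prediction, as TReFT does.
```

At t = 1 the conditional covariance and variance both equal sig2, so the optimal velocity is exactly x_t, matching the paper's claim that the terminal velocity points from the origin to the final clean image.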
Problem

Research questions and friction points this paper is trying to address.

Rectified Flow models require costly multi-step denoising for image translation
Direct adversarial training causes severe convergence issues in RF models
Multi-step inference is too slow for real-time image translation applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

TReFT enables one-step image translation
Uses pretrained velocity output directly
Memory-efficient training with lightweight architecture
Shengqian Li
University of Chinese Academy of Sciences, Beijing, China
Ming Gao
CreateAI, Beijing, China
Yi Liu
Beihang University, Beijing, China
Zuzeng Lin
Tianjin University, Tianjin, China
Feng Wang
CreateAI, Beijing, China
Feng Dai
Institute of Computing Technology, Chinese Academy of Sciences
video coding and processing, computational imaging