TeleStyle: Content-Preserving Style Transfer in Images and Videos

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of content-preserving style transfer in diffusion Transformers, whose internal representations entangle content and style features, making disentanglement difficult. To this end, we propose TeleStyle, a lightweight model built upon the Qwen-Image-Edit architecture. By combining high-quality human-curated style triplets with large-scale synthetically generated ones, we design a curriculum continual learning framework that jointly trains on both clean and noisy samples, substantially improving generalization to unseen styles. Additionally, we introduce a video-to-video stylization module to enhance temporal consistency. Extensive evaluations demonstrate that our method achieves state-of-the-art performance across three key metrics—style similarity, content fidelity, and aesthetic quality—enabling high-fidelity multi-style transfer for both images and videos.
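The curriculum idea of jointly training on clean (curated) and noisy (synthetic) triplets can be sketched as a sampling schedule that gradually shifts probability mass from the large noisy pool toward the clean pool. This is only an illustrative sketch: the function names, the linear annealing schedule, and the start/end ratios are assumptions, not details taken from the paper.

```python
import random

def curriculum_mix_ratio(step, total_steps, start_clean=0.2, end_clean=0.8):
    """Fraction of clean (curated) triplets to sample at a given step.

    Hypothetical linear schedule: training starts dominated by noisy
    synthetic triplets (broad style coverage) and anneals toward clean
    curated triplets (precise content fidelity).
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return start_clean + (end_clean - start_clean) * t

def sample_triplet(clean_pool, noisy_pool, step, total_steps, rng=random):
    """Draw one (content, style, target) triplet per the curriculum schedule."""
    if rng.random() < curriculum_mix_ratio(step, total_steps):
        return rng.choice(clean_pool)
    return rng.choice(noisy_pool)
```

In a training loop one would call `sample_triplet(clean, noisy, step, total_steps)` each iteration; the actual schedule shape used by TeleStyle (linear, staged, or otherwise) is not specified here.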

📝 Abstract
Content-preserving style transfer—generating stylized outputs from content and style references—remains a significant challenge for Diffusion Transformers (DiTs) due to the inherent entanglement of content and style features in their internal representations. In this technical report, we present TeleStyle, a lightweight yet effective model for both image and video stylization. Built upon Qwen-Image-Edit, TeleStyle leverages the base model's robust capabilities in content preservation and style customization. To facilitate effective training, we curated a high-quality dataset of distinct, well-defined styles and further synthesized triplets spanning thousands of diverse, in-the-wild style categories. We introduce a Curriculum Continual Learning framework to train TeleStyle on this hybrid dataset of clean (curated) and noisy (synthetic) triplets. This approach enables the model to generalize to unseen styles without compromising precise content fidelity. Additionally, we introduce a video-to-video stylization module to enhance temporal consistency and visual quality. TeleStyle achieves state-of-the-art performance across three core evaluation metrics: style similarity, content consistency, and aesthetic quality. Code and pre-trained models are available at https://github.com/Tele-AI/TeleStyle.
Problem

Research questions and friction points this paper is trying to address.

content-preserving style transfer
Diffusion Transformers
style transfer
image stylization
video stylization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content-Preserving Style Transfer
Diffusion Transformers
Curriculum Continual Learning
Video Stylization
Style Generalization