TripleFDS: Triple Feature Disentanglement and Synthesis for Scene Text Editing

📅 2025-11-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Scene Text Editing (STE) often produces unnatural edits and offers limited control because text style, text content, and background features are entangled. To address this, we propose TripleFDS, a framework that disentangles these three feature dimensions and synthesizes images from them in a controllable way. Training is organized around the SCB Group, a construct that combines style, content, and background attributes across images to model the three subspaces explicitly. Inter-group contrastive regularization improves the semantic accuracy of each feature, and intra-sample orthogonality constraints reduce redundancy between features. During synthesis, a feature remapping strategy prevents "shortcut" reconstruction and mitigates feature leakage. Trained on 125,000 SCB Groups from the SCB Synthesis dataset, TripleFDS achieves state-of-the-art performance on mainstream STE benchmarks, with an SSIM of 44.54 and a text recognition accuracy of 93.58%. It also supports fine-grained editing operations, including style replacement and background transfer, which demonstrates its controllability and versatility.
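The intra-sample orthogonality constraint can be illustrated with a minimal sketch. It assumes each image yields one style vector, one content vector, and one background vector, and it penalizes the mean squared cosine similarity over every pair. The paper's exact loss may differ. The names `cosine`, `orthogonality_penalty`, and the toy feature values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def orthogonality_penalty(features: Dict[str, List[float]]) -> float:
    """Mean squared cosine similarity over all pairs of one sample's features.

    Returns 0.0 when every pair is orthogonal and approaches 1.0 when every
    pair is parallel or anti-parallel.
    """
    pairs = list(combinations(features.values(), 2))
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


# Hypothetical per-image features for a single sample.
orthogonal = {"style": [1.0, 0.0, 0.0],
              "content": [0.0, 1.0, 0.0],
              "background": [0.0, 0.0, 1.0]}
leaky = {"style": [1.0, 1.0, 0.0],  # shares a direction with "content"
         "content": [1.0, 0.0, 0.0],
         "background": [0.0, 0.0, 1.0]}
print(orthogonality_penalty(orthogonal), orthogonality_penalty(leaky))
```

Squaring the cosine treats parallel and anti-parallel vectors the same way, so the penalty only reaches zero when the three features are mutually orthogonal.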

Technology Category

Application Category

📝 Abstract
Scene Text Editing (STE) aims to modify text in images naturally while preserving visual consistency. The decisive factors fall into three parts: text style, text content, and background. Previous methods have struggled with incomplete disentanglement of editable attributes and typically address only one aspect (such as editing text content), which limits controllability and visual consistency. To overcome these limitations, we propose TripleFDS, a novel framework for STE with disentangled modular attributes, and an accompanying dataset called SCB Synthesis. SCB Synthesis provides robust training data for triple feature disentanglement through the "SCB Group", a novel construct that combines three attributes per image to generate diverse, disentangled training groups. Using this construct as its basic training unit, TripleFDS first disentangles the three features. It ensures semantic accuracy through inter-group contrastive regularization and reduces redundancy through intra-sample multi-feature orthogonality. In the synthesis phase, TripleFDS performs feature remapping to prevent "shortcut" phenomena during reconstruction and to mitigate potential feature leakage. Trained on 125,000 SCB Groups, TripleFDS achieves state-of-the-art image fidelity (SSIM of 44.54) and text accuracy (ACC of 93.58%) on the mainstream STE benchmarks. Beyond this performance, the more flexible editing of TripleFDS enables new operations such as style replacement and background transfer. Code: https://github.com/yusenbao01/TripleFDS
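One standard way to implement a contrastive regularizer is the InfoNCE loss. The sketch below is written under that assumption. It treats features of the same attribute value as a positive pair (for example, the content features of two images that render the same word) and treats features from other groups as negatives. This pairing rule, the `temperature` default, and the names `cosine` and `info_nce` are assumptions and not the paper's published objective.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    Similarities are cosine scores divided by `temperature`. The loss is small
    when the anchor is much closer to the positive than to every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Hypothetical content features: two images rendering the same word should
# agree, and an image from another group with a different word is a negative.
same_word_a = [0.9, 0.1, 0.0]
same_word_b = [0.8, 0.2, 0.1]
other_word = [0.0, 0.2, 0.9]
good = info_nce(same_word_a, same_word_b, [other_word])
bad = info_nce(same_word_a, other_word, [same_word_b])
print(good, bad)
```

In a batched trainer, the same loss would be averaged over every anchor and every attribute (style, content, background). Each attribute has its own positive pairs.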

Problem

Research questions and friction points this paper is trying to address.

Disentangling text style, content, and background features for scene text editing
Overcoming incomplete attribute separation, which limits controllability and visual consistency
Enabling flexible text operations like style replacement and background transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles text style, content, and background features
Uses SCB Groups for triple feature disentanglement training
Performs feature remapping to prevent shortcut reconstruction
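The SCB Group and feature remapping can be sketched together under one assumption: a group holds a target image plus three "donor" images, and each donor shares exactly one attribute with the target. Under that assumption, remapping reads each attribute's feature from its donor instead of from the target. The decoder then cannot rebuild the target by copying the target's own encoding. The group layout, the `Sample` record, and `make_group`/`remap_sources` are hypothetical illustrations and do not come from the authors' released code.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    """One (hypothetical) rendered image, described only by its attribute labels."""
    style: str
    content: str
    background: str


def make_group(styles: List[str], contents: List[str],
               backgrounds: List[str], rng: random.Random) -> Dict[str, Sample]:
    """Build a hypothetical SCB Group: a target plus one donor per attribute.

    Each donor shares exactly one attribute with the target and differs in the
    other two, so it isolates that attribute.
    """
    s0, s1 = rng.sample(styles, 2)       # two distinct styles
    c0, c1 = rng.sample(contents, 2)     # two distinct text strings
    b0, b1 = rng.sample(backgrounds, 2)  # two distinct backgrounds
    return {
        "target": Sample(s0, c0, b0),
        "style_donor": Sample(s0, c1, b1),
        "content_donor": Sample(s1, c0, b1),
        "background_donor": Sample(s1, c1, b0),
    }


def remap_sources(group: Dict[str, Sample]) -> Dict[str, Sample]:
    """Pick the image each attribute feature is read from when rebuilding the target.

    No feature is taken from the target itself. This blocks the "shortcut" of
    passing the target's own encoding straight through the decoder.
    """
    return {
        "style": group["style_donor"],
        "content": group["content_donor"],
        "background": group["background_donor"],
    }


rng = random.Random(0)
group = make_group(["serif", "script", "bold"], ["OPEN", "EXIT"],
                   ["brick", "wood"], rng)
sources = remap_sources(group)
for attr, donor in sources.items():
    # Each donor matches the target on exactly the attribute it supplies.
    print(attr, getattr(donor, attr) == getattr(group["target"], attr))
```

Each donor differs from the target in the other two attributes. As a result, any leaked information about those attributes would harm the reconstruction instead of helping it.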

Authors
Yuchen Bao
Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
Yiting Wang
Yiting Wang
Graduate Student, University of Maryland
AI for EDA, Hardware Security
Wenjian Huang
Peking University
Biomedical Image & Signal Processing, Machine Learning, Artificial Intelligence, Statistical Learning, Computer Vision
H
Haowei Wang
Tencent Youtu Lab
Shen Chen
Tencent Youtu Lab
Taiping Yao
Tencent
face anti-spoofing, deepfake, adversarial attack
S
Shouhong Ding
Tencent Youtu Lab
Jianguo Zhang
Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China