🤖 AI Summary
To address multi-style image translation on resource-constrained edge devices, this paper proposes a lightweight framework that decouples style representation learning from translation. First, we design a compact, plug-and-play style encoder enabling parameter-free, open-ended style expansion. Second, we introduce the Style-Aware Multi-Style Transfer (SaMST) network, which achieves multi-style synthesis at inference time with zero additional computational overhead via feature disentanglement and lightweight adaptation. The method preserves real-time performance on edge hardware while significantly improving style encoding accuracy and translation quality, achieving state-of-the-art results across multiple quantitative metrics. Key contributions include: (i) a plug-and-play style representation mechanism enabling on-the-fly integration of novel styles without retraining; and (ii) an inference-time zero-overhead multi-style translation approach. This work establishes a new paradigm for efficient, scalable style transfer on edge devices.
📝 Abstract
Due to the high diversity of image styles, scalability to various styles plays a critical role in real-world applications. To accommodate a large number of styles, previous multi-style transfer approaches rely on enlarging the model size, while arbitrary-style transfer methods rely on heavy backbones. However, the additional computational cost introduced by the extra model parameters hinders these methods from being deployed on resource-limited devices. To address this challenge, in this paper we develop a style transfer framework that decouples style modeling from style transfer. Specifically, for style modeling, we propose a style representation learning scheme to encode the style information into a compact representation. Then, for style transfer, we develop a style-aware multi-style transfer network (SaMST) that adapts to diverse styles using pluggable style representations. In this way, our framework accommodates diverse image styles in the learned style representations without introducing additional overhead during inference, thereby maintaining efficiency. Experiments show that our style representation extracts accurate style information. Moreover, qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance in terms of both accuracy and efficiency. The code is available at https://github.com/The-Learning-And-Vision-Atelier-LAVA/SaMST.
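The decoupling idea can be sketched in miniature: one function encodes a style into a compact, pluggable representation, and a single fixed transfer function consumes any such representation, so adding a new style adds no inference-time parameters. This is only a toy statistics-matching analogy, not the paper's learned architecture; all names (`CompactStyle`, `encode_style`, `transfer`) are hypothetical.

```python
# Toy sketch of decoupled style modeling and transfer, assuming a 1-D
# "image" is just a list of pixel intensities. The real SaMST uses learned
# networks; this only mirrors the plug-and-play interface.
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class CompactStyle:
    """A compact style representation: here, just a mean and a std."""
    mu: float
    sigma: float


def encode_style(style_pixels: list) -> CompactStyle:
    """Style modeling: summarize a 'style image' into a compact code."""
    return CompactStyle(mu=mean(style_pixels), sigma=pstdev(style_pixels))


def transfer(content_pixels: list, style: CompactStyle) -> list:
    """Style transfer: re-normalize content statistics to match the style.

    The same function serves every style; supporting a new style only
    requires plugging in a new CompactStyle, never retraining `transfer`.
    """
    c_mu = mean(content_pixels)
    c_sigma = pstdev(content_pixels) or 1.0  # guard against flat content
    return [(p - c_mu) / c_sigma * style.sigma + style.mu
            for p in content_pixels]


# Registering a new style is just encoding it once.
styles = {
    "sketch": encode_style([0.0, 0.1, 0.2]),
    "vivid": encode_style([0.2, 0.5, 0.8]),
}
out = transfer([1.0, 2.0, 3.0], styles["vivid"])  # stats now match "vivid"
```

The interface is the point: the style dictionary can grow without the transfer function changing, which loosely mirrors how pluggable style representations keep inference cost constant as styles are added.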