A Creative Agent is Worth a 64-Token Template

📅 2026-03-18
🤖 AI Summary
Current text-to-image models struggle to infer creative intent from ambiguous prompts, so users fall back on manual prompt engineering or computationally expensive iterative inference. This work proposes the CAT framework, which combines disentangled creative semantics with reusable token templates. A lightweight Creative Tokenizer learns latent creative representations and generates a fixed-length 64-token template that is concatenated directly with the embeddings of the original prompt to guide image generation. Because the template is reusable, no iterative inference or prompt optimization is needed for subsequent generations, which improves efficiency and scalability. Evaluated on architectural, furniture, and nature-integrated design tasks, CAT achieves a 3.7× speedup and a 4.8× reduction in computational cost, while generating images that outperform state-of-the-art baselines in human preference and text-image alignment.

📝 Abstract
Text-to-image (T2I) models have substantially improved image fidelity and prompt adherence, yet their creativity remains constrained by reliance on discrete natural language prompts. When presented with fuzzy prompts such as "a creative vinyl record-inspired skyscraper", these models often fail to infer the underlying creative intent, leaving creative ideation and prompt design largely to human users. Recent reasoning- or agent-driven approaches iteratively augment prompts but incur high computational and monetary costs, as their instance-specific generation makes "creativity" costly and non-reusable, requiring repeated queries or reasoning for subsequent generations. To address this, we introduce CAT, a framework for Creative Agent Tokenization that encapsulates agents' intrinsic understanding of "creativity" through a Creative Tokenizer. Given the embeddings of fuzzy prompts, the tokenizer generates a reusable token template that can be directly concatenated with them to inject creative semantics into T2I models without repeated reasoning or prompt augmentation. To enable this, the tokenizer is trained via creative semantic disentanglement, leveraging relations among partially overlapping concept pairs to capture the agent's latent creative representations. Extensive experiments on Architecture Design, Furniture Design, and Nature Mixture tasks demonstrate that CAT provides a scalable and effective paradigm for enhancing creativity in T2I generation, achieving a 3.7× speedup and a 4.8× reduction in computational cost, while producing images with superior human preference and text-image alignment compared to state-of-the-art T2I models and creative generation methods.
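The concatenation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_creative_tokenizer`, `inject_template`, the embedding width `EMBED_DIM`, and the choice to append the template after the prompt tokens are all assumptions made for illustration. Only the fixed template length of 64 comes from the title, and the real tokenizer is a learned module rather than the placeholder arithmetic used here.

```python
import random

TEMPLATE_LEN = 64  # fixed template length, taken from the paper title
EMBED_DIM = 8      # toy embedding width (assumption, not from the paper)


def toy_creative_tokenizer(prompt_embeds, template_len=TEMPLATE_LEN, seed=0):
    """Hypothetical stand-in for the learned Creative Tokenizer.

    Returns `template_len` vectors with the same width as the prompt
    embeddings. The values are the mean prompt vector plus small seeded
    offsets, which is placeholder arithmetic only.
    """
    dim = len(prompt_embeds[0])
    mean = [sum(tok[d] for tok in prompt_embeds) / len(prompt_embeds)
            for d in range(dim)]
    rng = random.Random(seed)  # seeded so the template is reproducible
    return [[m + rng.uniform(-0.1, 0.1) for m in mean]
            for _ in range(template_len)]


def inject_template(prompt_embeds, template):
    """Concatenate the template to the prompt along the sequence axis.

    Appending after the prompt is an assumption; the abstract only says
    the template is concatenated with the prompt embeddings.
    """
    dim = len(prompt_embeds[0])
    if any(len(tok) != dim for tok in template):
        raise ValueError("template and prompt embedding widths differ")
    return prompt_embeds + template


# Toy embeddings for a 5-token fuzzy prompt.
prompt = [[0.1 * (i + d) for d in range(EMBED_DIM)] for i in range(5)]

# The template is computed once and can be reused across generations.
template = toy_creative_tokenizer(prompt)
conditioned = inject_template(prompt, template)
print(len(conditioned))  # 5 prompt tokens + 64 template tokens
```

In this sketch the reuse claim corresponds to calling `toy_creative_tokenizer` once and passing the same `template` to every later generation, instead of querying an agent per image.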
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
creative intent
prompt ambiguity
computational cost
creativity reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creative Tokenization
Text-to-Image Generation
Prompt Engineering
Agent-based Creativity
Semantic Disentanglement
Ruixiao Shi
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Fu Feng
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Yucheng Xie
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Xu Yang
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Jing Wang
Nanjing University
Bandit
Xin Geng
School of Computer Science and Engineering, Southeast University
Artificial Intelligence, Pattern Recognition, Machine Learning