🤖 AI Summary
This work addresses a limitation of existing text-to-image generation models: they struggle to precisely control cognitive attributes (such as valence, arousal, dominance, and memorability) so that generated images align with a specific psychological intent. To overcome this limitation, the authors propose a cognition-driven generative paradigm that establishes a mapping between a cognitive space and the semantic manifold of image features. By introducing cognitive anchors and interpolating the anchors' velocity fields within a flow-matching framework, with the interpolation guided by multidimensional cognitive scores, the method enables continuous and fine-grained control over the cognitive properties of generated images. Experiments across four cognitive dimensions (valence, arousal, dominance, and memorability) support the efficacy of the approach for interpretable intervention on multidimensional cognitive attributes in text-to-image synthesis.
📝 Abstract
Beyond conveying semantic information, an image can also manifest cognitive attributes that elicit specific cognitive processes in the viewer, such as memory encoding or emotional response. While modern text-to-image models excel at generating semantically coherent content, they remain limited in their ability to control such cognitive properties of images (e.g., valence, memorability), and often fail to align with a specific psychological intent. To bridge this gap, we introduce CogBlender, a framework that enables continuous and multi-dimensional intervention on cognitive properties during text-to-image generation. Our approach is built upon a mapping between the Cognitive Space, representing the space of cognitive properties, and the Semantic Manifold, representing the manifold of visual semantics. We define a set of Cognitive Anchors, which serve as the boundary points of the Cognitive Space. We then reformulate the velocity field within the flow-matching process by interpolating among the velocity fields of the different anchors. Consequently, the generative process is driven by the interpolated velocity field and dynamically steered by multi-dimensional cognitive scores, enabling precise, fine-grained, and continuous intervention. We validate the effectiveness of CogBlender across four representative cognitive dimensions: valence, arousal, dominance, and image memorability. Extensive experiments demonstrate that our method achieves effective cognitive intervention. Our work provides an effective paradigm for cognition-driven creative design.
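The anchor-interpolation idea in the abstract can be illustrated with a toy sketch. This is not CogBlender's implementation: the abstract does not give the interpolation rule, so the sketch assumes cognitive scores normalized to [0, 1], anchors placed at the corners of the unit hypercube, multilinear weights, and a straight-line (rectified-flow style) velocity toward a hypothetical per-anchor target. All function names and the `TARGETS` table are made up for illustration.

```python
from itertools import product


def anchor_weights(scores):
    """Multilinear weights over hypercube corners (assumed interpolation rule)."""
    weights = {}
    for corner in product((0, 1), repeat=len(scores)):
        w = 1.0
        for s, bit in zip(scores, corner):
            # A score near 1 favors corners whose bit is 1 in that dimension.
            w *= s if bit else 1.0 - s
        weights[corner] = w
    return weights


def anchor_velocity(x, t, target):
    """Toy straight-line velocity toward one anchor's target (stand-in for a model)."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def blended_velocity(x, t, scores, targets):
    """Weighted sum of the per-anchor velocity fields at state x and time t."""
    v = [0.0] * len(x)
    for corner, w in anchor_weights(scores).items():
        va = anchor_velocity(x, t, targets[corner])
        v = [vi + w * vai for vi, vai in zip(v, va)]
    return v


def sample(x0, scores, targets, steps=50):
    """Euler integration of the blended velocity field from t=0 toward t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays strictly below 1, so 1 - t is never zero
        v = blended_velocity(x, t, scores, targets)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 2-D "feature" targets for anchors indexed by (valence, arousal).
TARGETS = {
    (0, 0): [-1.0, -1.0],
    (0, 1): [-1.0, 1.0],
    (1, 0): [1.0, -1.0],
    (1, 1): [1.0, 1.0],
}

result = sample([0.0, 0.0], scores=(0.75, 0.25), targets=TARGETS)
print(result)
```

With high valence (0.75) and low arousal (0.25), the trajectory in this toy setup ends near (0.5, -0.5), the weighted average of the four anchor targets. Because the weights vary smoothly with the scores, so does the endpoint, which is the sense in which this kind of blending gives continuous control. In a real model each anchor velocity would come from a learned network rather than a closed-form expression.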