MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing concept customization methods bind target concepts to rare tokens, which seldom appear in pretraining data and carry none of the concept's underlying knowledge, so generation quality is unstable. This work introduces a new task, knowledge-aware concept customization, and proposes the MoKus framework, which achieves high-fidelity personalized generation through two-stage cross-modal knowledge transfer: it first learns an anchor representation that stores the visual information of the target concept, then updates the answers to textual knowledge queries so that they map to this anchor. The authors also establish KnowCusBench, the first benchmark for this task. Experiments show that MoKus outperforms state-of-the-art methods on KnowCusBench, extends to applications such as virtual concept creation and concept erasure, and improves performance on world knowledge benchmarks.

📝 Abstract

Concept customization typically binds rare tokens to a target concept. Unfortunately, these approaches often suffer from unstable performance because the pretraining data seldom contains these rare tokens. Moreover, rare tokens fail to convey the inherent knowledge of the target concept. We therefore introduce Knowledge-aware Concept Customization, a novel task that aims to bind diverse textual knowledge to target visual concepts. The task requires the model to identify the knowledge within the text prompt in order to perform high-fidelity customized generation, and to bind all of the textual knowledge to the target concept efficiently. To this end, we propose MoKus, a novel framework for knowledge-aware concept customization. Our framework relies on a key observation, cross-modal knowledge transfer: modifying knowledge within the text modality naturally transfers to the visual modality during generation. Inspired by this observation, MoKus contains two stages: (1) in visual concept learning, we first learn an anchor representation that stores the visual information of the target concept; (2) in textual knowledge updating, we update the answers to the knowledge queries so that they map to the anchor representation, enabling high-fidelity customized generation. To comprehensively evaluate MoKus on the new task, we introduce KnowCusBench, the first benchmark for knowledge-aware concept customization. Extensive evaluations demonstrate that MoKus outperforms state-of-the-art methods. Moreover, cross-modal knowledge transfer allows MoKus to be easily extended to other knowledge-aware applications such as virtual concept creation and concept erasure. We also demonstrate that our method achieves improvements on world knowledge benchmarks.
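
The two stages described in the abstract can be pictured with a deliberately simplified toy. The sketch below is not the authors' implementation: the class `ToyCustomizer`, its method names, the anchor token `<anchor-0>`, the use of a feature mean as the anchor, and the substring lookup are all hypothetical stand-ins chosen only to show the data flow (store visual information under an anchor, then make knowledge text resolve to that anchor).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCustomizer:
    """Toy illustration of the two-stage idea; NOT the MoKus implementation."""

    # Hypothetical storage: anchor token -> visual representation.
    anchors: Dict[str, List[float]] = field(default_factory=dict)
    # Hypothetical storage: knowledge text -> anchor token.
    knowledge: Dict[str, str] = field(default_factory=dict)

    def learn_anchor(self, token: str, image_features: List[List[float]]) -> None:
        # Stage 1 (visual concept learning): keep the concept's visual
        # information under a single anchor. Assumption: we simply average
        # the feature vectors instead of training anything.
        count = len(image_features)
        dim = len(image_features[0])
        self.anchors[token] = [
            sum(vec[i] for vec in image_features) / count for i in range(dim)
        ]

    def update_knowledge(self, token: str, statements: List[str]) -> None:
        # Stage 2 (textual knowledge updating): make each knowledge
        # statement resolve to the anchor learned in stage 1.
        if token not in self.anchors:
            raise KeyError(f"anchor {token!r} has not been learned")
        for statement in statements:
            self.knowledge[statement.lower()] = token

    def condition(self, prompt: str) -> Dict[str, List[float]]:
        # Return the anchor representations whose knowledge text appears
        # in the prompt (plain substring matching, for illustration only).
        lowered = prompt.lower()
        hits = {tok for text, tok in self.knowledge.items() if text in lowered}
        return {tok: self.anchors[tok] for tok in hits}


model = ToyCustomizer()
model.learn_anchor("<anchor-0>", [[1.0, 3.0], [3.0, 5.0]])
model.update_knowledge("<anchor-0>", ["the dog my sister adopted"])
print(model.condition("A photo of the dog my sister adopted on a beach"))
```

In this toy, a prompt never mentions the anchor token itself; it only states the knowledge, and the text-side update is what routes it to the stored visual information. That mirrors, in spirit only, the cross-modal transfer the abstract describes.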

Problem

Research questions and friction points this paper is trying to address.

concept customization

knowledge transfer

cross-modal

text-to-image generation

visual concepts

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal knowledge transfer

knowledge-aware concept customization

MoKus