Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of existing multi-modal knowledge graph reasoning methods, in particular their poor transfer to unseen graphs. The authors propose TOFU, a foundation model for multi-modal knowledge graphs that discretizes structural, visual, and textual information into fine-grained, modality-specific, transferable tokens. TOFU processes these tokens with a hierarchical fusion architecture and a mixture-of-message mechanism to obtain features that transfer across graphs. Evaluated on 17 benchmarks covering transductive, inductive, and fully-inductive settings, TOFU consistently outperforms strong KGFM and MMKGR baselines, including on previously unseen knowledge graphs.
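The three evaluation settings differ in what the test graph shares with the training graph. The sketch below follows the usage common in the knowledge graph foundation model literature (transductive: nothing new; inductive: unseen entities; fully-inductive: unseen entities and relations). The paper's exact definitions may differ, and the function name `classify_setting` is hypothetical.

```python
def classify_setting(train_triples, test_triples):
    """Label a test graph relative to a training graph.

    Triples are (head, relation, tail) tuples. The labels follow common
    usage and are an assumption, not the paper's definition.
    """
    train_entities = {e for h, _, t in train_triples for e in (h, t)}
    train_relations = {r for _, r, _ in train_triples}
    test_entities = {e for h, _, t in test_triples for e in (h, t)}
    test_relations = {r for _, r, _ in test_triples}

    unseen_entities = test_entities - train_entities
    unseen_relations = test_relations - train_relations

    # Simplification: any unseen relation is treated as fully-inductive,
    # even if every test entity was seen during training.
    if unseen_relations:
        return "fully-inductive"
    if unseen_entities:
        return "inductive"
    return "transductive"
```

Methods that learn one embedding per entity or relation have no representation for the unseen items in the last two settings, which is why those settings are harder.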

📝 Abstract

Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph structure and multi-modal entity content. Most existing methods are designed for the transductive setting: they learn dataset-specific embeddings and struggle to generalize to new KGs. Recent knowledge graph foundation models (KGFMs) improve cross-KG transfer, but they mainly exploit structural patterns and ignore rich multi-modal signals. We address these gaps by proposing TOFU, a token-based foundation model for MMKGR that generalizes well across different MMKGs. TOFU discretizes structural, visual, and textual information into modality-specific tokens. It then processes these tokens with a hierarchical fusion architecture with mixture-of-message mechanisms to obtain transferable features for MMKGR. Experimental results on 17 transductive, inductive, and fully-inductive MMKGs show that TOFU consistently outperforms strong KGFM and MMKGR baselines, including on unseen MMKGs.
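The abstract does not say how features are discretized. As a rough illustration only, the sketch below assumes nearest-neighbor lookup in a per-modality codebook and one token per modality. The codebooks and the names `quantize` and `tokenize_entity` are hypothetical, and TOFU's actual tokenizer may work differently and produce finer-grained tokens.

```python
import math


def quantize(feature, codebook):
    """Return the index of the codebook vector closest to `feature`.

    Assumption: Euclidean nearest-neighbor quantization, chosen here
    only for illustration.
    """
    return min(range(len(codebook)),
               key=lambda i: math.dist(feature, codebook[i]))


def tokenize_entity(features, codebooks):
    """Map each modality's feature vector to one discrete token id."""
    return {modality: quantize(vec, codebooks[modality])
            for modality, vec in features.items()}


# Toy codebooks shared across graphs (hypothetical values).
codebooks = {
    "visual": [[0.0, 0.0], [1.0, 1.0]],
    "textual": [[0.0, 1.0], [1.0, 0.0]],
}
entity_features = {"visual": [0.9, 0.8], "textual": [0.1, 0.9]}
tokens = tokenize_entity(entity_features, codebooks)
```

Under this assumption, an entity from a graph never seen in training still maps to token ids from the same shared vocabulary, which is the property that lets token-based features transfer where dataset-specific embeddings cannot.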

Problem

Research questions and friction points this paper is trying to address.

multi-modal knowledge graph reasoning

knowledge graph foundation model

cross-KG transfer

generalization

multi-modal signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

token-based foundation model

multi-modal knowledge graph reasoning

transferable representation