Spatial Knowledge Graph-Guided Multimodal Synthesis

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited spatial reasoning capability of multimodal large language models (MLLMs), this paper proposes a spatial knowledge graph (SKG)-guided synthetic paradigm, introducing the first “knowledge-to-data” generation framework. First, an SKG encoding commonsense spatial relations—such as direction and distance—is automatically constructed. Then, an SKG-conditioned diffusion model is designed, augmented with a multi-granularity spatial constraint loss to ensure controllable image-text pair generation aligned with spatial semantics. This approach explicitly integrates structured spatial knowledge into the multimodal synthesis pipeline, enhancing both interpretability and controllability. Evaluated on spatial reasoning benchmarks—including SpatioQA and GeoVQA—the synthesized data improves MLLM spatial accuracy by an average of 12.6%, while demonstrating robust generalization to unseen spatial configurations.

Technology Category

Application Category

📝 Abstract
Recent advances in multimodal large language models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To address this challenge, multimodal data synthesis offers a promising solution. Yet, ensuring that synthesized data adhere to spatial common sense is a non-trivial task. In this work, we introduce SKG2Data, a novel multimodal synthesis approach guided by spatial knowledge graphs, grounded in the concept of knowledge-to-data generation. SKG2Data automatically constructs a Spatial Knowledge Graph (SKG) to emulate human-like perception of spatial directions and distances, which is subsequently utilized to guide multimodal data synthesis. Extensive experiments demonstrate that data synthesized from diverse types of spatial knowledge, including direction and distance, not only enhance the spatial perception and reasoning abilities of MLLMs but also exhibit strong generalization capabilities. We hope that the idea of knowledge-based data synthesis can advance the development of spatial intelligence.
Problem

Research questions and friction points this paper is trying to address.

Enhancing spatial perception in MLLMs using knowledge-guided synthesis
Ensuring synthesized data adheres to spatial common sense constraints
Improving MLLMs' generalization via diverse spatial knowledge integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Knowledge Graph guides multimodal synthesis
Automated SKG emulates human spatial perception
SKG2Data enhances MLLMs spatial reasoning
🔎 Similar Papers
No similar papers found.
Y
Yida Xue
Zhejiang University
Zhen Bi
Zhen Bi
Zhejiang University, Huzhou University
Knowledge GraphLanguage ModelOn-device LLM
J
Jinnan Yang
Huzhou University
J
Jungang Lou
Huzhou University
H
Huajun Chen
Zhejiang University
Ningyu Zhang
Ningyu Zhang
Ph.D. Student, Vanderbilt University
artificial intelligencelearning analyticslearning environments