Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning

📅 2025-05-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Vision-language models (VLMs) suffer from a scarcity of high-quality, low-cost chain-of-thought (CoT) data for multimodal reasoning. Method: This paper introduces the first game-code-driven paradigm for automatic visual-language reasoning data synthesis. Leveraging large language models (LLMs), the authors parse and rewrite game source code; executing the code then automatically extracts state-transition logic and multi-step reasoning trajectories, enabling human-annotation-free construction of GameQA, a dataset spanning 30 games and 158 tasks. Contribution/Results: The approach enables scalable, cross-domain generalizable, and high-difficulty multimodal CoT data generation. Empirical evaluation demonstrates that fine-tuning Qwen2.5-VL-7B solely on GameQA yields an average +2.33% improvement across seven mainstream vision-language benchmarks, validating the efficacy of game-derived data for enhancing general visual-language understanding. The GameQA dataset is publicly released.

📝 Abstract
Visual-language Chain-of-Thought (CoT) data resources are relatively scarce compared to text-only counterparts, limiting the improvement of reasoning capabilities in Vision Language Models (VLMs). Moreover, high-quality vision-language reasoning data is expensive and labor-intensive to annotate. To address this issue, we leverage a promising resource: game code, which naturally contains logical structures and state transition processes. Therefore, we propose Code2Logic, a novel game-code-driven approach for multimodal reasoning data synthesis. Our approach leverages Large Language Models (LLMs) to adapt game code, enabling automatic acquisition of reasoning processes and results through code execution. Using the Code2Logic approach, we developed the GameQA dataset to train and evaluate VLMs. GameQA is cost-effective and scalable to produce, challenging for state-of-the-art models, and diverse with 30 games and 158 tasks. Surprisingly, despite training solely on game data, VLMs demonstrated out-of-domain generalization, with Qwen2.5-VL-7B improving performance by 2.33% across 7 diverse vision-language benchmarks. Our code and dataset are available at https://github.com/tongjingqi/Code2Logic.
Problem

Research questions and friction points this paper is trying to address.

Scarce visual-language CoT data limits VLM reasoning
Game code provides logical structures for data synthesis
Code2Logic generates scalable multimodal reasoning data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages game code for multimodal reasoning data synthesis
Uses LLMs to adapt game code for automatic reasoning
Develops scalable GameQA dataset for VLM training
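The synthesis idea described above can be illustrated with a minimal sketch (hypothetical, not the authors' implementation): a tiny game's transition function is executed to derive a question, a step-by-step reasoning trace, and a verified answer, with no human annotation. The game, function names, and QA format here are illustrative assumptions.

```python
# Toy illustration of game-code-driven QA synthesis: executing transition
# logic yields both the reasoning steps and the ground-truth answer.

def move(pos, action, size=3):
    """Transition function for a toy grid game on a size x size board."""
    deltas = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    dx, dy = deltas[action]
    x = min(max(pos[0] + dx, 0), size - 1)  # clamp to board edges
    y = min(max(pos[1] + dy, 0), size - 1)
    return (x, y)

def synthesize_qa(start, actions):
    """Run the game code to produce a QA pair with a chain-of-thought trace."""
    steps, pos = [], start
    for i, a in enumerate(actions, 1):
        nxt = move(pos, a)
        steps.append(f"Step {i}: from {pos}, action '{a}' leads to {nxt}.")
        pos = nxt
    question = (f"A piece starts at {start} on a 3x3 grid. "
                f"After the moves {actions}, where is it?")
    return {"question": question, "reasoning": steps, "answer": pos}

sample = synthesize_qa((0, 0), ["right", "right", "down"])
# answer: (2, 1)
```

Because the answer and reasoning trace come from code execution rather than model generation, correctness is guaranteed by construction, which is what makes this paradigm cheap to scale across many games and task types.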
Jingqi Tong
Computation and Artificial Intelligence Innovative College, Fudan University
Jixin Tang
Computation and Artificial Intelligence Innovative College, Fudan University
Hangcheng Li
Computation and Artificial Intelligence Innovative College, Fudan University
Yurong Mou
Computation and Artificial Intelligence Innovative College, Fudan University
Ming Zhang
Computation and Artificial Intelligence Innovative College, Fudan University
Jun Zhao
Computation and Artificial Intelligence Innovative College, Fudan University
Yanbo Wen
Computation and Artificial Intelligence Innovative College, Fudan University
Fan Song
Computation and Artificial Intelligence Innovative College, Fudan University
Jiahao Zhan
Computation and Artificial Intelligence Innovative College, Fudan University
Yuyang Lu
Computation and Artificial Intelligence Innovative College, Fudan University
Chaoran Tao
Computation and Artificial Intelligence Innovative College, Fudan University
Zhiyuan Guo
University of California San Diego
Jizhou Yu
Computation and Artificial Intelligence Innovative College, Fudan University
Tianhao Cheng
Fudan University
Changhao Jiang
Computation and Artificial Intelligence Innovative College, Fudan University
Zhen Wang
Douyin Co., Ltd.
Tao Liang
Douyin Co., Ltd.
Zhihui Fei
Douyin Co., Ltd.
Ming-Xi Wan
Douyin Co., Ltd.
Guojun Ma
Douyin Co., Ltd.
Weifeng Ge
Fudan University
Guanhua Chen
Southern University of Science and Technology
Tao Gui
Institute of Modern Languages, Fudan University
Xipeng Qiu
Computation and Artificial Intelligence Innovative College, Fudan University; Shanghai Innovation Institute
Qi Zhang
Computation and Artificial Intelligence Innovative College, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing
Xuanjing Huang
Computation and Artificial Intelligence Innovative College, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing