LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work has not systematically investigated large language models' (LLMs) capabilities in temporal motif analysis on dynamic graphs, nor established dedicated benchmarks or optimization methods for this task. Method: We introduce LLMTM, the first LLM-specific benchmark for dynamic graph temporal motif analysis, comprising nine motif types and six analytical tasks. We further propose a tool-augmented LLM agent to enhance analytical accuracy and a structure-aware dispatcher to reduce inference overhead while preserving high accuracy. Contribution/Results: Extensive evaluation of nine state-of-the-art LLMs (e.g., Qwen, DeepSeek, GPT-4o-mini) on LLMTM demonstrates that our agent achieves SOTA accuracy; the dispatcher reduces average inference cost by 42.3% with negligible accuracy degradation (<1.2%). This work establishes the first LLM evaluation paradigm for dynamic graph temporal motif analysis and presents a performance-efficiency co-optimization framework.
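To make the task concrete, here is a minimal brute-force counter for one common temporal motif: a delta-temporal triangle, i.e., three directed edges with non-decreasing timestamps that close a cycle within a time window. This is a generic toy sketch for illustration; it is not one of LLMTM's nine motif types or the paper's implementation.

```python
def count_temporal_triangles(edges, delta):
    """Count motifs a->b, b->c, c->a whose timestamps occur in that
    order and all fall within a window of length `delta`.

    edges: list of (src, dst, timestamp) tuples.
    Brute force over edge triples; suitable only for small toy graphs.
    """
    edges = sorted(edges, key=lambda e: e[2])  # order by timestamp
    n = len(edges)
    count = 0
    for i in range(n):
        a, b, t1 = edges[i]
        for j in range(i + 1, n):
            b2, c, t2 = edges[j]
            if t2 - t1 > delta:
                break  # edges are time-sorted, so later ones are out of window too
            if b2 != b or c == a:
                continue  # second edge must start at b and not close the cycle early
            for k in range(j + 1, n):
                c2, a2, t3 = edges[k]
                if t3 - t1 > delta:
                    break
                if c2 == c and a2 == a:  # third edge closes the cycle
                    count += 1
    return count
```

For example, the edge stream [(1, 2, 0), (2, 3, 1), (3, 1, 2), (1, 3, 5)] contains exactly one such triangle for a window of 10, and none for a window of 1.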

📝 Abstract
The widespread application of Large Language Models (LLMs) has motivated growing interest in their capacity for processing dynamic graphs. Temporal motifs, elementary units and important local properties of dynamic graphs that directly reflect anomalies and unique phenomena, are essential for understanding their evolutionary dynamics and structural features. However, leveraging LLMs for temporal motif analysis on dynamic graphs remains largely unexplored. In this paper, we systematically study LLM performance on temporal motif-related tasks. Specifically, we propose a comprehensive benchmark, LLMTM (Large Language Models in Temporal Motifs), which includes six tailored tasks across nine temporal motif types. We then conduct extensive experiments to analyze how different prompting techniques and LLMs (nine models, including openPangu-7B, the DeepSeek-R1-Distill-Qwen series, Qwen2.5-32B-Instruct, GPT-4o-mini, DeepSeek-R1, and o3) affect performance. Informed by our benchmark findings, we develop a tool-augmented LLM agent that leverages precisely engineered prompts to solve these tasks with high accuracy. This accuracy, however, comes at a substantial cost. To address the trade-off, we propose a simple yet effective structure-aware dispatcher that considers both the dynamic graph's structural properties and the LLM's cognitive load to route each query to either standard LLM prompting or the more powerful agent. Our experiments demonstrate that the structure-aware dispatcher maintains high accuracy while reducing cost.
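The dispatcher idea described in the abstract, routing each query either to cheap standard prompting or to the costlier tool-augmented agent based on the query graph's structure, can be sketched with a toy rule. The thresholds and the "prompt"/"agent" labels below are illustrative assumptions, not the paper's actual routing policy.

```python
def dispatch(query_graph, max_edges=50, max_density=0.5):
    """Toy structure-aware routing rule (illustrative thresholds).

    Small, sparse graphs (low presumed cognitive load) go to plain LLM
    prompting; larger or denser graphs go to the tool-augmented agent.

    query_graph: list of (src, dst, timestamp) edge tuples.
    Returns "prompt" for the cheap path or "agent" for the costly one.
    """
    nodes = {v for u, w, _ in query_graph for v in (u, w)}
    n, m = len(nodes), len(query_graph)
    # Directed-graph density: edges relative to the n*(n-1) possible pairs.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    if m <= max_edges and density <= max_density:
        return "prompt"  # cheap: standard LLM prompting
    return "agent"       # accurate but costly: tool-augmented agent
```

Under these assumed thresholds, a two-edge path on three nodes routes to "prompt", while a complete directed graph on four nodes (density 1.0) routes to "agent".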
Problem

Research questions and friction points this paper is trying to address.

Benchmarking LLMs for temporal motif analysis in dynamic graphs
Developing a tool-augmented LLM agent for accurate motif tasks
Proposing a cost-effective dispatcher to balance accuracy and expense
Innovation

Methods, ideas, or system contributions that make the work stand out.

First LLM-specific benchmark (LLMTM) covering nine temporal motif types and six analytical tasks
Tool-augmented LLM agent with precisely engineered prompts that achieves state-of-the-art accuracy
Structure-aware dispatcher that cuts average inference cost by 42.3% with less than 1.2% accuracy loss
Bing Hao
School of New Media and Communication, Tianjin University, China
Minglai Shao
Tianjin University
Graph Mining, Deep Learning, Machine Learning
Zengyi Wo
College of Intelligence and Computing, Tianjin University
Data Mining, Anomaly Detection, LLM Reasoning
Yunlong Chu
School of New Media and Communication, Tianjin University, China
Yuhang Liu
The University of Adelaide
Representation Learning, LLMs, Latent Variable Models, Responsible AI
Ruijie Wang
School of Computer Science and Technology, Beihang University, China