🤖 AI Summary
This work systematically evaluates the capabilities and limitations of large language models (LLMs) in graph pattern understanding, particularly graph pattern mining, a task that remains largely unexplored for LLMs.
Method: We introduce the first dedicated benchmark comprising three task categories—terminology comprehension, topological description, and autonomous discovery—covering synthetic and real-world graph datasets, seven mainstream LLMs, and eleven subtasks. Our zero-shot and few-shot evaluation framework integrates structured graph representations with natural language prompts and adopts a multidimensional assessment paradigm that jointly measures descriptive understanding and generative capability.
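To make the evaluation setup concrete, the following is a minimal, illustrative sketch of how a graph might be serialized into a natural-language prompt for zero-shot or few-shot querying. The paper's actual prompt templates are not reproduced here; the function name `edges_to_prompt` and the wording below are assumptions for illustration only.

```python
def edges_to_prompt(edges, question, examples=None):
    """Build an LLM prompt from an edge list; optionally prepend few-shot examples.

    `edges` is a list of (u, v) pairs; `examples` is an optional list of
    (demonstration_text, answer) pairs used for few-shot prompting.
    """
    lines = ["You are given an undirected graph as a list of edges:"]
    lines += [f"({u}, {v})" for u, v in edges]
    prompt = "\n".join(lines) + f"\nQuestion: {question}\n"
    if examples:  # few-shot: worked examples precede the query graph
        demos = "\n\n".join(f"{demo}\nAnswer: {ans}" for demo, ans in examples)
        prompt = demos + "\n\n" + prompt
    return prompt

# Zero-shot example: ask about a triangle pattern in a 3-cycle.
triangle = [(0, 1), (1, 2), (2, 0)]
print(edges_to_prompt(triangle, "Does this graph contain a triangle?"))
```

Under this kind of scheme, the "input format" finding in the results would correspond to choosing a serialization (edge list, adjacency list, natural-language description) that matches formats the model saw during pretraining.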
Contribution/Results: Experiments reveal, for the first time, that LLMs possess a preliminary ability to understand graph patterns (with O1-mini achieving the best overall performance), while exhibiting reasoning pathways fundamentally distinct from those of traditional algorithms. Performance improves substantially when input formats align with knowledge acquired during pretraining. This work establishes foundational benchmarks, methodological frameworks, and empirical evidence for interdisciplinary research at the intersection of graph AI and foundation models.
📝 Abstract
Benchmarking the capabilities and limitations of large language models (LLMs) in graph-related tasks is becoming an increasingly popular and crucial area of research. Recent studies have shown that LLMs exhibit a preliminary ability to understand graph structures and node features. However, the potential of LLMs in graph pattern mining, a key component of fields such as computational chemistry, biology, and social network analysis, remains largely unexplored. To bridge this gap, this work introduces a comprehensive benchmark for assessing LLMs' capabilities on graph pattern tasks. The benchmark evaluates whether LLMs can understand graph patterns from either terminological or topological descriptions, and it further tests their capacity to autonomously discover graph patterns from data. It encompasses both synthetic and real datasets, spanning 11 tasks and 7 models, and our experimental framework is designed for easy extension to new models and datasets. Our findings reveal that: (1) LLMs have a preliminary ability to understand graph patterns, with O1-mini outperforming the other models in the majority of tasks; (2) formatting input data to align with the knowledge acquired during pretraining can enhance performance; and (3) the strategies employed by LLMs may differ from those used by conventional algorithms.