🤖 AI Summary
This work addresses the lack of effective methods for evaluating large language models (LLMs) on reasoning over massive, partially observable graphs. To this end, we introduce EstGraph, a benchmark dataset built from real-world-scale graphs, along with four tasks that each require estimating a global graph property. We combine random walk sampling with task-specific prompt construction to build informative inputs under strict context-length constraints, so that LLMs can infer global graph properties from limited local observations. Experiments on graphs with up to millions of nodes indicate that this approach improves LLMs' graph reasoning while staying within finite context windows, and it offers a way to evaluate LLMs on large-scale graph-structured data.
📝 Abstract
As the reasoning abilities of Large Language Models (LLMs) rapidly improve, demand is rising to use them in a wide variety of domains. This creates a need to carefully evaluate the limits of these models' capabilities with various tests and benchmarks. Graph structures are ubiquitous in real-world data and are often used to represent and analyze relationship patterns within it. Many benchmarks have already been proposed in the graph literature to test how well LLMs can follow and execute graph algorithms. However, because of the limited context length of LLMs, these benchmarks consist of very small graphs. In real-world data, graphs can be significantly larger and, in many cases, not fully accessible. In this paper, we examine a class of problems that arises with very large graphs that have limited accessibility. We propose a large graph benchmark dataset, EstGraph, and introduce four distinct tasks designed to estimate properties of large graphs. We evaluate the reasoning abilities of LLMs on these tasks using a wide variety of graph datasets. In addition, we provide task-specific prompt constructions based on random walk sampling of large graphs (up to millions of nodes) that convey sufficient information to LLMs within the limits of context length.
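To make the idea of a random-walk-based prompt concrete, here is a minimal Python sketch. It is not the paper's actual prompt construction: the function names `random_walk` and `build_prompt`, the line format, the character budget used as a stand-in for a token budget, the toy graph, and the example question are all assumptions chosen for illustration. It also assumes that the degree of each visited node can be observed.

```python
import random


def random_walk(adj, start, steps, rng):
    """Return the nodes visited by a simple random walk of at most `steps` moves."""
    walk = [start]
    node = start
    for _ in range(steps):
        neighbors = adj[node]
        if not neighbors:  # dead end: stop early
            break
        node = rng.choice(neighbors)  # move to a uniformly random neighbor
        walk.append(node)
    return walk


def build_prompt(walk, adj, question, max_chars):
    """Serialize the walk line by line until the character budget is reached.

    `max_chars` is a crude, assumed proxy for a model's context limit.
    """
    header = "You observe a random walk over a large graph.\n"
    footer = "\n" + question
    used = len(header) + len(footer)
    lines = []
    for i, node in enumerate(walk):
        line = f"step {i}: node {node}, degree {len(adj[node])}\n"
        if used + len(line) > max_chars:
            break  # adding this observation would exceed the budget
        lines.append(line)
        used += len(line)
    return header + "".join(lines) + footer


# Toy graph (hypothetical): a 10-node ring plus one chord between nodes 0 and 5.
adj = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
adj[0].append(5)
adj[5].append(0)

rng = random.Random(0)  # fixed seed so the walk is reproducible
walk = random_walk(adj, start=0, steps=50, rng=rng)

# Hypothetical question; the four tasks in the paper are not specified here.
question = "Estimate the average degree of the whole graph."
prompt = build_prompt(walk, adj, question, max_chars=600)
print(prompt)
```

The walk only ever touches neighbors of the current node, which matches the limited-accessibility setting described above, and the budget check truncates the observations rather than the question, so the task statement always reaches the model.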