🤖 AI Summary
This paper addresses the lack of systematic evaluation of large language models' (LLMs) graph reasoning capabilities by introducing GraphOmni, the first comprehensive benchmark for this task. Methodologically, it establishes a multidimensional evaluation framework covering diverse graph types, serialization schemes (e.g., DFS- and BFS-ordered representations, adjacency lists), and prompting strategies; it further proposes a PPO-based reinforcement learning approach that adaptively pairs serialization formats with prompts. Key contributions include: (1) the first fine-grained, reproducible assessment of LLMs' graph reasoning; (2) empirical identification of critical limitations in structural awareness, long-range dependency modeling, and format robustness; and (3) an open-source, modular, and extensible evaluation infrastructure, together with the adaptive pairing strategy that substantially improves accuracy on graph tasks—laying the groundwork for general-purpose graph reasoning models.
📝 Abstract
In this paper, we present GraphOmni, a comprehensive benchmark framework for systematically evaluating the graph reasoning capabilities of LLMs. By analyzing critical dimensions, including graph types, serialization formats, and prompt schemes, we provide extensive insights into the strengths and limitations of current LLMs. Our empirical findings show that no single serialization or prompting strategy consistently outperforms the others. Motivated by these insights, we propose a reinforcement learning-based approach that dynamically selects the best serialization-prompt pairing, resulting in significant accuracy improvements. GraphOmni's modular and extensible design establishes a robust foundation for future research, facilitating advancements toward general-purpose graph reasoning models.
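To make the "serialization format" dimension concrete, the sketch below renders the same small directed graph in three textual formats of the kind the benchmark compares (an edge list, an adjacency list, and a BFS visitation order). This is an illustrative example of ours, not code or naming from GraphOmni itself:

```python
from collections import deque

# A tiny directed graph given as (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

def as_edge_list(edges):
    # One edge per line, e.g. "0 -> 1".
    return "\n".join(f"{u} -> {v}" for u, v in edges)

def build_adjacency(edges):
    # Map each node to the sorted list of its out-neighbors.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return {u: sorted(vs) for u, vs in adj.items()}

def as_adjacency_list(edges):
    # One node per line, e.g. "0: [1, 2]".
    adj = build_adjacency(edges)
    return "\n".join(f"{u}: {vs}" for u, vs in sorted(adj.items()))

def as_bfs_order(edges, start=0):
    # Nodes listed in breadth-first visitation order from `start`.
    adj = build_adjacency(edges)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return " ".join(map(str, order))
```

The point of varying these formats is that an identical graph question can yield different model accuracy depending purely on which of these strings appears in the prompt, which is what motivates the adaptive pairing approach.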