AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?

📅 2025-08-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Ambiguity in natural language-to-graph query translation is exacerbated by the connectivity of graph structures. It manifests as attribute ambiguity, relationship ambiguity, and attribute-relationship ambiguity, each arising in same-entity and cross-entity scenarios. Method: We propose the first taxonomy for graph query ambiguity, construct AmbiGraph-Eval (a benchmark dataset comprising real-world queries and expert annotations), and design multi-granularity test cases integrating NLP and graph query techniques to quantitatively evaluate nine state-of-the-art LLMs. Contribution/Results: Experiments reveal severe deficiencies in semantic parsing and contextual disambiguation across current models, with average accuracy below 40%. This work establishes the first systematic evaluation paradigm for graph query ambiguity, providing both theoretical foundations and empirical evidence to guide robustness optimization of LLMs for graph data.

📝 Abstract

Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain inherent ambiguities, and the interconnected nature of graph structures can amplify these challenges, leading to unintended or incorrect query results. To systematically evaluate LLMs on this front, we propose a taxonomy of graph-query ambiguities, comprising three primary types: Attribute Ambiguity, Relationship Ambiguity, and Attribute-Relationship Ambiguity, each subdivided into Same-Entity and Cross-Entity scenarios. We introduce AmbiGraph-Eval, a novel benchmark of real-world ambiguous queries paired with expert-verified graph query answers. Evaluating 9 representative LLMs shows that even top models struggle with ambiguous graph queries. Our findings reveal a critical gap in ambiguity handling and motivate future work on specialized resolution techniques.
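The taxonomy described in the abstract (three ambiguity types, each split into two scenarios) can be written down as a small data structure. The following is a minimal sketch only: the class and constant names (`AmbiguityType`, `Scenario`, `AmbiguityCategory`, `ALL_CATEGORIES`) are hypothetical and do not come from the paper or its benchmark code.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AmbiguityType(Enum):
    """The three primary ambiguity types named in the abstract."""
    ATTRIBUTE = "Attribute Ambiguity"
    RELATIONSHIP = "Relationship Ambiguity"
    ATTRIBUTE_RELATIONSHIP = "Attribute-Relationship Ambiguity"


class Scenario(Enum):
    """The two scenarios each type is subdivided into."""
    SAME_ENTITY = "Same-Entity"
    CROSS_ENTITY = "Cross-Entity"


@dataclass(frozen=True)
class AmbiguityCategory:
    """One cell of the taxonomy (hypothetical representation)."""
    kind: AmbiguityType
    scenario: Scenario

    def label(self) -> str:
        return f"{self.kind.value} ({self.scenario.value})"


# Cartesian product of types and scenarios: 3 x 2 = 6 categories.
ALL_CATEGORIES = [AmbiguityCategory(k, s) for k, s in product(AmbiguityType, Scenario)]

if __name__ == "__main__":
    for category in ALL_CATEGORIES:
        print(category.label())
```

Treating each (type, scenario) pair as its own category makes it straightforward to tag benchmark queries and to report results per cell rather than only in aggregate.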

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to handle ambiguous natural language queries over graph-structured data

Characterizing three types of graph-query ambiguity: attribute, relationship, and attribute-relationship

Identifying performance gaps in LLMs when processing real-world ambiguous graph queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a taxonomy of graph-query ambiguities

Introduces AmbiGraph-Eval benchmark dataset

Evaluates nine LLMs on ambiguous graph queries
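To make the evaluation step concrete, here is a rough sketch of how per-category accuracy could be computed when each benchmark item pairs a model's query result with an expert-verified answer. The scoring rule (order-insensitive equality of result rows) and the function name `accuracy_by_category` are assumptions for illustration; the summary above does not specify the paper's actual metric.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Compute accuracy per ambiguity category.

    records: iterable of (category_label, predicted_rows, expected_rows),
    where rows are hashable values such as tuples.
    Assumed metric: a prediction counts as correct when its set of rows
    equals the expected set, ignoring order and duplicates.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, expected in records:
        total[category] += 1
        if set(predicted) == set(expected):
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


if __name__ == "__main__":
    # Toy records, invented purely to exercise the function.
    toy_records = [
        ("attribute", [("Alice",)], [("Alice",)]),
        ("attribute", [("Bob",)], [("Alice",)]),
        ("relationship", [("a", "b"), ("c", "d")], [("c", "d"), ("a", "b")]),
    ]
    print(accuracy_by_category(toy_records))
```

Reporting accuracy per category, rather than a single average, shows which kinds of ambiguity a given model handles worst.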

🔎 Similar Papers

LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations