🤖 AI Summary
To enable secure querying of knowledge graphs that contain sensitive data with third-party large language models (LLMs) under resource constraints, this paper proposes a privacy-preserving method for generating Cypher queries from natural language. The approach identifies sensitive nodes through graph-structure analysis and filters them out, performing lightweight input-side anonymization that requires no local LLM deployment and thus allows safe use of cloud-based LLM services. Its core contribution lies in jointly modeling sensitive-information identification, structure-aware anonymization, and LLM-driven semantic translation, so that privacy preservation and query accuracy are ensured simultaneously. Experiments on multiple benchmark datasets show that the method maintains over 90% Cypher-generation accuracy while strictly preventing leakage of sensitive entities and relationships, substantially outperforming baseline anonymization strategies.
📝 Abstract
Large Language Models (LLMs) are increasingly used to query knowledge graphs (KGs) due to their strong semantic understanding and extrapolation capabilities compared to traditional approaches. However, these methods cannot be applied when the KG contains sensitive data and the user lacks the resources to deploy a local generative LLM. To address this issue, we propose a privacy-aware query generation approach for KGs. Our method identifies sensitive information in the graph based on its structure and omits such values before requesting the LLM to translate natural language questions into Cypher queries. Experimental results show that our approach preserves the quality of the generated queries while preventing sensitive data from being transmitted to third-party services.
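The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the property names, the sensitivity rule (a fixed set of property keys standing in for the paper's graph-structure analysis), and the prompt format are all assumptions made for the example. The key idea shown is that sensitive *values* are replaced with placeholders on the input side, while the graph structure the LLM needs for Cypher generation is preserved and the placeholder-to-value mapping stays local.

```python
# Hypothetical sketch of input-side anonymization before calling a cloud LLM.
# The sensitivity heuristic and schema format are illustrative assumptions.

SENSITIVE_PROPS = {"ssn", "salary", "diagnosis"}  # assumed sensitive keys


def anonymize_nodes(nodes):
    """Replace values of sensitive properties with placeholder tokens.

    Property keys and labels are kept, so the LLM still sees the graph
    structure it needs to generate a valid Cypher query. The token-to-value
    mapping is retained locally and never sent to the third-party service.
    """
    mapping = {}
    redacted = []
    for node in nodes:
        clean = {"label": node["label"], "props": {}}
        for key, value in node["props"].items():
            if key.lower() in SENSITIVE_PROPS:
                token = f"<VAL_{len(mapping)}>"
                mapping[token] = value  # kept locally for de-anonymization
                clean["props"][key] = token
            else:
                clean["props"][key] = value
        redacted.append(clean)
    return redacted, mapping


def build_prompt(question, redacted_nodes):
    """Compose the prompt sent to the LLM: question plus redacted schema."""
    schema = "; ".join(f"(:{n['label']} {n['props']})" for n in redacted_nodes)
    return f"Schema: {schema}\nTranslate to Cypher: {question}"


nodes = [
    {"label": "Patient", "props": {"name": "Alice", "diagnosis": "flu"}},
    {"label": "Doctor", "props": {"name": "Bob"}},
]
redacted, mapping = anonymize_nodes(nodes)
prompt = build_prompt("List each patient's diagnosis.", redacted)
# The sensitive value "flu" never appears in `prompt`; only "<VAL_0>" does.
```

A real system would derive `SENSITIVE_PROPS` automatically from the graph's structure, as the paper proposes, and would substitute the original values back into the query results locally after the LLM returns the Cypher.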