🤖 AI Summary
Existing SPARQL generation datasets rely heavily on syntactic templates, so models learn only shallow, surface-level mappings between natural language questions and SPARQL queries and generalize poorly to paraphrased inputs or unseen templates. To address this, we propose FRASE, a semantic enhancement framework that introduces Frame Semantic Role Labeling (FSRL) to SPARQL generation for the first time. We construct LC-QuAD 3.0, the first frame-augmented dataset, which enables deep semantic alignment between questions and queries via frame detection, argument mapping, and LLM fine-tuning. Our approach significantly improves robustness: SPARQL exact-match accuracy increases consistently by 12.7–18.3% on unseen templates and natural paraphrases. This demonstrates that structured semantic representations grounded in linguistic frames are critical for enhancing generalization in semantic parsing tasks.
📝 Abstract
Translating natural language questions into SPARQL queries enables Knowledge Base querying for factual and up-to-date responses. However, existing datasets for this task are predominantly template-based, leading models to learn superficial mappings between question and query templates rather than developing true generalization capabilities. As a result, models struggle when encountering naturally phrased, template-free questions. This paper introduces FRASE (FRAme-based Semantic Enhancement), a novel approach that leverages Frame Semantic Role Labeling (FSRL) to address this limitation. We also present LC-QuAD 3.0, a new dataset derived from LC-QuAD 2.0, in which each question is enriched using FRASE through frame detection and the mapping of frame elements to their arguments. We evaluate the impact of this approach through extensive experiments on recent large language models (LLMs) under different fine-tuning configurations. Our results demonstrate that integrating frame-based structured representations consistently improves SPARQL generation performance, particularly in two challenging generalization scenarios: when test questions use unseen templates (unknown template splits) and when they are all naturally phrased (reformulated questions).
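
To make the enrichment step described in the abstract more concrete, the following is a minimal sketch of how a question might be paired with detected frames and frame-element-to-argument mappings before being passed to a fine-tuned LLM. Everything here is an assumption for illustration: the `Frame` class, the `enrich_question` helper, the serialization layout, and the hand-written annotation of the example question are hypothetical and are not taken from FRASE or LC-QuAD 3.0.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One evoked frame with its frame elements (role name -> argument text).

    Hypothetical container for illustration; not the dataset's actual schema.
    """
    name: str
    trigger: str
    elements: dict[str, str] = field(default_factory=dict)


def enrich_question(question: str, frames: list[Frame]) -> str:
    """Serialize a question and its frames into one model input string.

    The line-based layout below is an assumed format, chosen only to show
    the idea of appending structured frame information to the question.
    """
    parts = [f"question: {question}"]
    for frame in frames:
        roles = "; ".join(f"{role} = {arg}" for role, arg in frame.elements.items())
        parts.append(f"frame: {frame.name} (trigger: {frame.trigger}) [{roles}]")
    return "\n".join(parts)


# Hand-annotated example; a real pipeline would obtain frames from an FSRL system.
example = enrich_question(
    "Who wrote Hamlet?",
    [Frame("Text_creation", "wrote", {"Author": "Who", "Text": "Hamlet"})],
)
print(example)
```

In a setup like this, the enriched string would replace the raw question as the fine-tuning input, with the target SPARQL query left unchanged, so the model can condition on the frame structure rather than on surface wording alone.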