Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

📅 2026-05-19

🤖 AI Summary

This work asks whether a small language model can learn zero-shot natural-language-to-SPARQL generation over a scholarly knowledge graph when gold query annotations are not available for token-level supervision. Built on the Qwen3-1.7B language model, the approach applies Group-Relative Policy Optimization (GRPO) on DBLP-QuAD, using prompts that pair each question with symbolic hints about entities and relations. The model is trained via reinforcement learning that combines execution feedback, structural constraints, and answer-level rewards. GRPO substantially improves over the zero-shot baseline and generalizes competitively to held-out templates, although a supervised DoRA-finetuned baseline at the same model scale achieves higher overall accuracy. Ablation analyses indicate that execution-based rewards account for most of the gains, while gold-query-based shaping adds little.
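The "group-relative" part of GRPO can be illustrated with a minimal sketch: several candidate queries are sampled for the same question, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; the paper's exact training configuration is not given here.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Assumption: population standard deviation and a small eps for stability;
    real implementations may differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Samples better than the group average get a positive advantage,
    # worse ones a negative advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled queries for one question: two earned reward 1.0, two earned 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every sample in a group earns the same reward, all advantages are zero, so that group contributes no learning signal under this normalization.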

📝 Abstract

Knowledge graph question answering seeks to translate natural language questions into executable queries over knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcome-based rewards can train a small instruction-tuned language model to perform zero-shot Text-to-SPARQL generation in the scholarly domain. Group-Relative Policy Optimization (GRPO) is applied to the Qwen3-1.7B model on DBLP-QuAD, using prompts that combine natural language questions with symbolic hints about entities and relations. Training relies on execution feedback, structural constraints, and answer-level rewards, with an additional variant that incorporates gold-query-based shaping. The resulting models are compared to the unmodified zero-shot baseline and to a supervised DoRA-finetuned baseline across answer-level accuracy, execution accuracy, category-wise scores, and generalization to held-out templates. GRPO substantially improves over the zero-shot baseline and exhibits competitive generalization, while supervised DoRA finetuning achieves higher overall accuracy on the same model scale. Ablation analyses indicate that execution-based rewards account for most gains, with additional shaping yielding limited additional benefit, suggesting that outcome-based reinforcement learning is a viable training strategy when gold queries are unavailable for token-level supervision.
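One way to picture a reward that combines structural constraints, execution feedback, and answer-level correctness is the sketch below. Everything in it is a hypothetical illustration rather than the paper's reward: the weights, the toy structural check, and the `execute` callable (a stand-in for a SPARQL endpoint client that returns a set of answers or raises on failure) are all assumptions.

```python
from typing import Callable, Set


def is_structurally_valid(query: str) -> bool:
    """Toy structural check (assumption): a query form keyword and balanced braces."""
    q = query.upper()
    has_form = "SELECT" in q or "ASK" in q
    balanced = query.count("{") == query.count("}") and query.count("{") > 0
    return has_form and balanced


def outcome_reward(
    query: str,
    gold_answers: Set[str],
    execute: Callable[[str], Set[str]],
    w_struct: float = 0.25,  # hypothetical weight
    w_exec: float = 0.25,    # hypothetical weight
    w_answer: float = 0.5,   # hypothetical weight
) -> float:
    """Staged reward: structure, then successful execution, then answer match."""
    if not is_structurally_valid(query):
        return 0.0
    reward = w_struct
    try:
        answers = execute(query)
    except Exception:
        # The query is well formed but the endpoint rejected it.
        return reward
    reward += w_exec
    if answers == gold_answers:
        # Only gold answers are needed here, not a gold query.
        reward += w_answer
    return reward


def fake_execute(query: str) -> Set[str]:
    """Stand-in executor for the example; a real setup would query the endpoint."""
    if "?paper" not in query:
        raise ValueError("endpoint error")
    return {"paper1", "paper2"}


full = outcome_reward("SELECT ?paper WHERE { ?paper ?p ?o }", {"paper1", "paper2"}, fake_execute)
```

The staging reflects the point made in the abstract: the reward depends on outcomes of executing the generated query, so it can be computed without token-level supervision from gold queries.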

Problem

Research questions and friction points this paper is trying to address.

Text-to-SPARQL

knowledge graph question answering

reinforcement learning

zero-shot generation

scholarly domain

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-SPARQL

Reinforcement Learning

GRPO