Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Triple extraction for knowledge graph (KG) construction relies heavily on manually crafted prompts, which are labor-intensive and highly sensitive to minor variations across large language models (LLMs). Method: This work proposes and systematically evaluates automated prompt optimization methods—specifically DSPy, APE, and TextGrad—on the SynthIE and REBEL benchmarks, analyzing the impact of model architecture, schema complexity, and input text length. Contribution/Results: Automated optimization generates prompts competitive with human-designed ones; notably, it improves extraction accuracy and robustness—especially under high-schema-complexity and long-text conditions—yielding an average F1-score gain of 8.2%. This study constitutes the first systematic validation of automated prompt optimization's generalizability and practical utility for KG construction, establishing a new paradigm for low-human-intervention, high-stability knowledge acquisition.
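The optimizers named above (DSPy, APE, TextGrad) differ in mechanics, but all drive a search over candidate instructions with a task metric such as triple-level F1. The following is a minimal sketch of that metric-driven loop, in the spirit of APE-style instruction search; `run_extractor` is a hypothetical stand-in for the LLM call that maps a prompt and input text to (subject, relation, object) triples, so the scoring logic can be shown end to end without an API.

```python
# Sketch of a metric-driven prompt-search loop for triple extraction.
# `run_extractor` is a hypothetical placeholder for an LLM-backed extractor;
# a real system (DSPy, APE, TextGrad) would generate and refine candidate
# instructions rather than take a fixed list.

def triple_f1(predicted, gold):
    """Exact-match F1 over (subject, relation, object) triples."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def select_prompt(candidates, dev_set, run_extractor):
    """Score each candidate instruction on a dev set of (text, gold_triples)
    pairs and return the best prompt with its mean F1."""
    best_prompt, best_score = None, -1.0
    for prompt in candidates:
        score = sum(
            triple_f1(run_extractor(prompt, text), gold)
            for text, gold in dev_set
        ) / len(dev_set)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

The study's observation that gains grow with schema complexity fits this framing: the larger the relation inventory, the more the metric can discriminate between candidate instructions during the search.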

📝 Abstract
A KG represents a network of entities and the relationships between them. KGs are used for various applications, including semantic search and discovery, reasoning, decision-making, natural language processing, machine learning, and recommendation systems. Triple (subject-relation-object) extraction from text is the fundamental building block of KG construction and has been widely studied, from early benchmarks such as ACE 2002 to more recent ones such as WebNLG 2020, REBEL, and SynthIE. While the use of LLMs has been explored for KG construction, handcrafting reasonable task-specific prompts for LLMs is a labour-intensive exercise and can be brittle to subtle changes in the LLMs employed. Recent work in NLP tasks (e.g. autonomy generation) uses automatic prompt optimization/engineering to address this challenge by generating optimal or near-optimal task-specific prompts given input-output examples. This empirical study explores the application of automatic prompt optimization to the triple extraction task through experimental benchmarking. We evaluate different settings by changing (a) the prompting strategy, (b) the LLM used for prompt optimization and task execution, (c) the number of canonical relations in the schema (schema complexity), (d) the length and diversity of input text, (e) the metric used to drive the prompt optimization, and (f) the dataset used for training and testing. We evaluate three different automatic prompt optimizers, namely DSPy, APE, and TextGrad, and use two different triple extraction datasets, SynthIE and REBEL. Through rigorous empirical evaluation, our main contribution highlights that automatic prompt optimization techniques can generate reasonable prompts comparable to human-written ones for triple extraction. In turn, these optimized prompts achieve improved results, particularly with increasing schema complexity and text size.
Problem

Research questions and friction points this paper is trying to address.

Optimizing prompts for knowledge graph triple extraction
Reducing manual effort in LLM prompt engineering
Improving performance with complex schemas and large texts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic prompt optimization for KG construction
Evaluates multiple LLMs and prompt strategies
Improves triple extraction with optimized prompts