On The Role of Prompt Construction In Enhancing Efficacy and Efficiency of LLM-Based Tabular Data Generation

📅 2024-09-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation of large language model (LLM) performance in real-world tabular data generation caused by semantically impoverished column names, this paper proposes domain-knowledge-enhanced prompt construction for the GReaT framework. It compares three prompt protocols, Expert-guided, LLM-guided, and Novel-Mapping, each of which explicitly injects domain knowledge into the generation process. Empirical comparisons show that semantically enriched prompts improve synthetic data quality, including column-wise distribution fidelity and row-level logical coherence, and also speed up training: convergence is significantly faster, with the required number of iterations reduced by over 30%. The work provides a reproducible methodology and an empirical basis for prompt optimization in structured data generation, supporting the integration of domain semantics into LLM-based tabular synthesis.

📝 Abstract
LLM-based data generation for real-world tabular data can be challenged by the lack of sufficient semantic context in feature names used to describe columns. We hypothesize that enriching prompts with domain-specific insights can improve both the quality and efficiency of data generation. To test this hypothesis, we explore three prompt construction protocols: Expert-guided, LLM-guided, and Novel-Mapping. Through empirical studies with the recently proposed GReaT framework, we find that context-enriched prompts lead to significantly improved data generation quality and training efficiency.
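GReaT fine-tunes an LLM on table rows rendered as text, so the column names appear verbatim in every training sequence; this is why uninformative feature names limit the context available to the model. The minimal Python sketch below assumes GReaT's "<feature> is <value>" sentence encoding with random permutation of feature order, and shows how a context-enriched prompt differs from a raw one once a column-name mapping is applied. The function `serialize_row`, the columns `f1`, `f2`, `y`, and the `expert_names` mapping are hypothetical illustrations, not the paper's code or data.

```python
import random
from typing import Dict, Optional


def serialize_row(
    row: Dict[str, object],
    rename: Optional[Dict[str, str]] = None,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> str:
    """Encode one row as a GReaT-style sentence: '<feature> is <value>, ...'."""
    rename = rename or {}
    # Swap in enriched names where a mapping exists; otherwise keep the raw column name.
    items = [(rename.get(col, col), value) for col, value in row.items()]
    if shuffle:
        # Permute feature order so generation is not tied to one fixed column order.
        random.Random(seed).shuffle(items)
    return ", ".join(f"{name} is {value}" for name, value in items)


# Hypothetical row with uninformative column names and a hypothetical expert mapping.
row = {"f1": 148, "f2": 33.6, "y": 1}
expert_names = {
    "f1": "plasma glucose concentration (mg/dL)",
    "f2": "body mass index",
    "y": "diabetes diagnosis",
}

print(serialize_row(row, shuffle=False))
# f1 is 148, f2 is 33.6, y is 1
print(serialize_row(row, rename=expert_names, shuffle=False))
# plasma glucose concentration (mg/dL) is 148, body mass index is 33.6, diabetes diagnosis is 1
```

Only the feature names change between the two prompts; the values and the encoding are identical, which isolates the effect of the added semantic context.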
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM-based tabular data generation quality
Improving efficiency through context-enriched prompts
Exploring prompt construction protocols for better results
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enriching prompts with domain-specific insights
Exploring Expert-guided, LLM-guided, and Novel-Mapping protocols
Using the GReaT framework for improved data generation
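The Expert-guided and LLM-guided protocols can both be framed as producing a mapping from raw column names to more descriptive ones, which is then applied to the column names in GReaT's text encoding of each row. The Python sketch below is a hypothetical illustration under that assumption: `expert_guided`, `llm_guided`, the `ask_llm` callable, and the prompt wording are not from the paper, and `ask_llm` stands in for whatever LLM client is available. Novel-Mapping is not sketched, since the summary does not describe how its mapping is constructed.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that sends a prompt to an LLM and returns its text reply.
LLMCall = Callable[[str], str]


def expert_guided(columns: List[str], expert_notes: Dict[str, str]) -> Dict[str, str]:
    """Use names supplied by a domain expert; keep the raw name when none is given."""
    return {col: expert_notes.get(col, col) for col in columns}


def llm_guided(columns: List[str], dataset_context: str, ask_llm: LLMCall) -> Dict[str, str]:
    """Ask an LLM for a short descriptive name per column (prompt wording is an assumption)."""
    mapping = {}
    for col in columns:
        prompt = (
            f"Dataset: {dataset_context}\n"
            f"Give a short, descriptive name for the column '{col}'. "
            "Reply with the name only."
        )
        reply = ask_llm(prompt).strip()
        mapping[col] = reply or col  # fall back to the raw name on an empty reply
    return mapping


# Usage with a stub in place of a real LLM client (hypothetical behavior).
def fake_llm(prompt: str) -> str:
    return "body mass index" if "'f2'" in prompt else ""


print(llm_guided(["f1", "f2"], "diabetes screening records", fake_llm))
# {'f1': 'f1', 'f2': 'body mass index'}
```

Injecting the LLM as a plain callable keeps the protocol logic independent of any particular provider and makes it easy to test with a stub.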
Banooqa H. Banday
Texas State University
Kowshik Thopalli
Ph.D. Student, Arizona State University
computer vision, machine learning, deep learning, artificial intelligence, differential geometry
T. Islam
Texas State University
J. Thiagarajan
Lawrence Livermore National Laboratory