On The Role of Prompt Construction In Enhancing Efficacy and Efficiency of LLM-Based Tabular Data Generation

📅 2024-09-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Real-world tables often have column names that carry little semantic meaning. This degrades the performance of large language models (LLMs) used to generate synthetic tabular data. To address this, the paper proposes a prompt engineering framework for the GReaT model that enriches prompts with domain knowledge. It designs three prompt construction protocols (Expert-guided, LLM-guided, and Novel-Mapping), each of which injects domain knowledge into the generative process. Empirical comparisons across these prompt designs show two effects of semantically enriched prompts. First, they improve synthetic data quality, including column-wise distribution fidelity and row-level logical coherence. Second, they speed up training: convergence is significantly faster, and the number of required iterations drops by over 30%. The work provides a reproducible methodology and an empirical foundation for prompt optimization in structured data generation, and it advances the integration of domain semantics into LLM-based tabular synthesis.

📝 Abstract

LLM-based data generation for real-world tabular data can be challenged by the lack of sufficient semantic context in feature names used to describe columns. We hypothesize that enriching prompts with domain-specific insights can improve both the quality and efficiency of data generation. To test this hypothesis, we explore three prompt construction protocols: Expert-guided, LLM-guided, and Novel-Mapping. Through empirical studies with the recently proposed GReaT framework, we find that context-enriched prompts lead to significantly improved data generation quality and training efficiency.
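The abstract's central idea is that descriptive feature names give the model more semantic context. A minimal sketch can illustrate this. It assumes the textual encoding described for GReaT, in which each row becomes a sentence of "feature is value" clauses joined by commas, with the feature order randomly permuted. The function `serialize_row`, the column names `f1` to `f3`, and the `expert_names` mapping are all hypothetical and are not taken from the paper. The sketch does not reproduce the paper's actual prompt templates.

```python
import random
from typing import Mapping, Optional


def serialize_row(
    row: Mapping[str, object],
    name_map: Optional[Mapping[str, str]] = None,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> str:
    """Encode one table row as 'feature is value' clauses joined by commas."""
    names = name_map or {}
    # Replace each raw column name with its enriched description, if one is given.
    clauses = [f"{names.get(col, col)} is {val}" for col, val in row.items()]
    if shuffle:
        # Randomly permute feature order (assumed to mirror GReaT's fine-tuning setup).
        random.Random(seed).shuffle(clauses)
    return ", ".join(clauses)


# Hypothetical row whose column names carry no semantic context.
row = {"f1": 39, "f2": 13, "f3": "Y"}
# Hypothetical expert-guided mapping from raw names to descriptive ones.
expert_names = {"f1": "Age in years", "f2": "Years of schooling", "f3": "Owns a home"}

print(serialize_row(row, shuffle=False))
# f1 is 39, f2 is 13, f3 is Y
print(serialize_row(row, expert_names, shuffle=False))
# Age in years is 39, Years of schooling is 13, Owns a home is Y
```

The three protocols differ in where the descriptive names come from: a domain expert, an LLM, or the Novel-Mapping procedure. This sketch only shows how such a mapping would change the text the model is fine-tuned on.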

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM-based tabular data generation quality

Improving efficiency through context-enriched prompts

Exploring prompt construction protocols for better results

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enriching prompts with domain-specific insights

Exploring Expert-guided, LLM-guided, Novel-Mapping protocols

Using the GReaT framework for improved data generation

🔎 Similar Papers

Why LLMs Are Bad at Synthetic Table Generation (and what to do about it)