🤖 AI Summary
Problem: Data-to-text generation tends to produce highly redundant output with low lexical and semantic diversity, a weakness that is especially visible in music marketing applications. Method: We propose a generation framework that balances quality and diversity, using language models (T5, GPT-3.5, GPT-4, and LLaMA2) combined with fine-tuning, few-shot, and zero-shot prompting strategies. Contribution/Results: We introduce JaccDiv, the first diversity metric that quantifies lexical overlap across a collection of texts via Jaccard similarity, and establish the first diversity evaluation benchmark specific to marketing texts. Experiments show that our method improves JaccDiv by 37% over baselines while keeping human-evaluated generation quality at ≥4.2/5.0. The framework generalizes across domains and provides a reusable technical pipeline and a standardized evaluation protocol for redundancy-prone content generation.
📝 Abstract
Online platforms are increasingly interested in using data-to-text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We use language models such as T5, GPT-3.5, GPT-4, and LLaMA2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric, JaccDiv, to evaluate the diversity of a set of texts. This research is relevant beyond the music industry and can benefit other fields where repetitive automated content generation is prevalent.
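The abstract does not define JaccDiv, so the following is only a minimal sketch of what a Jaccard-based diversity score for a set of texts could look like, not the paper's formula. It assumes lowercase whitespace tokenization, word-level token sets, and a score defined as one minus the mean pairwise Jaccard similarity. The names `tokenize`, `jaccard_similarity`, and `jaccard_diversity` are hypothetical.

```python
from itertools import combinations


def tokenize(text: str) -> set:
    # Assumption: lowercase whitespace tokens; the paper may use n-grams
    # or a different tokenizer.
    return set(text.lower().split())


def jaccard_similarity(a: set, b: set) -> float:
    # Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_diversity(texts: list) -> float:
    # Assumed definition (not taken from the paper):
    # 1 - mean Jaccard similarity over all unordered pairs of texts.
    token_sets = [tokenize(t) for t in texts]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        # Fewer than two texts: no pair to compare, report zero diversity.
        return 0.0
    mean_similarity = sum(jaccard_similarity(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_similarity


if __name__ == "__main__":
    gallery = [
        "a bold new album from the band",
        "a bold new single from the band",
        "fresh sounds for late summer nights",
    ]
    print(f"diversity: {jaccard_diversity(gallery):.3f}")
```

Under this assumed definition, a gallery of identical texts scores 0 and a gallery with no shared tokens scores 1, so higher values indicate less lexical overlap.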