JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry

📅 2025-04-29

📈 Citations: 0

✨ Influential: 0

career value

193K/year

🤖 AI Summary

Data-to-text generation suffers from high redundancy and low lexical/semantic diversity, particularly in music marketing applications. Method: We propose a quality-diversity-balanced generation framework leveraging large language models—including T5, GPT-3.5/4, and LLaMA2—integrated with fine-tuning, few-shot, and zero-shot prompting strategies. Contribution/Results: We introduce JaccDiv, the first diversity metric quantifying lexical overlap across text collections via Jaccard similarity, and establish the first marketing-text-specific diversity evaluation benchmark. Experiments show our method improves JaccDiv by 37% over baselines while maintaining human-evaluated generation quality at ≥4.2/5.0. The framework demonstrates cross-domain generalizability and provides a reusable technical pipeline and standardized evaluation protocol for redundancy-prone content generation across diverse application domains.

Technology Category

Application Category

📝 Abstract

Online platforms are increasingly interested in using Data-to-Text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage Language Models such as T5, GPT-3.5, GPT-4, and LLaMa2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric JaccDiv to evaluate the diversity of a set of texts. This research extends its relevance beyond the music industry, proving beneficial in various fields where repetitive automated content generation is prevalent.

Problem

Research questions and friction points this paper is trying to address.

Measure diversity in generated marketing texts for music industry

Prevent repetitive patterns in automated content generation

Evaluate text diversity using new metric JaccDiv

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs like T5, GPT-3.5, GPT-4, and LLaMa2

Combining fine-tuning, few-shot, and zero-shot approaches

Introducing JaccDiv metric to evaluate text diversity

🔎 Similar Papers

Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores