JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry

📅 2025-04-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Data-to-text generation suffers from high redundancy and low lexical/semantic diversity, particularly in music marketing applications. Method: We propose a quality-diversity-balanced generation framework leveraging large language models—including T5, GPT-3.5/4, and LLaMA2—integrated with fine-tuning, few-shot, and zero-shot prompting strategies. Contribution/Results: We introduce JaccDiv, the first diversity metric quantifying lexical overlap across text collections via Jaccard similarity, and establish the first marketing-text-specific diversity evaluation benchmark. Experiments show our method improves JaccDiv by 37% over baselines while maintaining human-evaluated generation quality at ≥4.2/5.0. The framework demonstrates cross-domain generalizability and provides a reusable technical pipeline and standardized evaluation protocol for redundancy-prone content generation across diverse application domains.

Technology Category

Application Category

📝 Abstract
Online platforms are increasingly interested in using Data-to-Text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage Language Models such as T5, GPT-3.5, GPT-4, and LLaMa2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric JaccDiv to evaluate the diversity of a set of texts. This research extends its relevance beyond the music industry, proving beneficial in various fields where repetitive automated content generation is prevalent.
Problem

Research questions and friction points this paper is trying to address.

Measure diversity in generated marketing texts for music industry
Prevent repetitive patterns in automated content generation
Evaluate text diversity using new metric JaccDiv
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs like T5, GPT-3.5, GPT-4, and LLaMa2
Combining fine-tuning, few-shot, and zero-shot approaches
Introducing JaccDiv metric to evaluate text diversity
🔎 Similar Papers
No similar papers found.