Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems

📅 2025-05-29
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
To address the inefficiency and heavy reliance on manual feature engineering in generating previews for long-form audio such as podcasts, this paper proposes an LLM-based preview generation system and deploys it at scale, serving hundreds of thousands of podcast previews in a real-world application. In place of a pipeline built from multiple ML expert models, the method combines prompt engineering, semantic summarization, and key-segment extraction to transform raw speech transcripts into concise, information-dense previews, substantially reducing feature engineering overhead. Offline evaluations show gains over the expert-model baseline in understandability, contextual clarity, and interest level; online A/B testing shows a 4.6% increase in user engagement with preview content and a 5x boost in processing efficiency.

📝 Abstract
Discovering and evaluating long-form talk content such as videos and podcasts poses a significant challenge for users, as it requires a considerable time investment. Previews offer a practical solution by providing concise snippets that showcase key moments of the content, enabling users to make more informed and confident choices. We propose an LLM-based approach for generating podcast episode previews and deploy the solution at scale, serving hundreds of thousands of podcast previews in a real-world application. Comprehensive offline evaluations and online A/B testing demonstrate that LLM-generated previews consistently outperform a strong baseline built on top of various ML expert models, showcasing a significant reduction in the need for meticulous feature engineering. The offline results indicate notable enhancements in understandability, contextual clarity, and interest level, and the online A/B test shows a 4.6% increase in user engagement with preview content, along with a 5x boost in processing efficiency, offering a more streamlined and performant solution compared to the strong baseline of feature-engineered expert models.

Problem

Research questions and friction points this paper is trying to address.

Challenges in discovering long-form talk content efficiently
Need for concise previews to aid user decision-making
Improving preview quality and efficiency with LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based podcast preview generation system
Reduces need for feature engineering
Boosts user engagement and efficiency

Authors

Winstead Zhu, Spotify
Ann Clifton, Senior Research Scientist, Spotify (Natural Language Processing, Machine Learning, Computational Linguistics, Sequence-to-sequence Models)
Azin Ghazimatin, Spotify
Edgar Tanaka, Spotify
Ward Ronan, Spotify