Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems

📅 2025-05-29

🤖 AI Summary

To address the inefficiency and heavy reliance on manual feature engineering in generating previews for long-form audio such as podcasts, this paper proposes an LLM-driven preview generation system deployed at scale, serving hundreds of thousands of podcast previews in a real-world application. Departing from a conventional pipeline of multiple ML expert models, the method combines prompt engineering, semantic summarization, and key-segment extraction to turn raw speech transcripts into information-dense previews. Its core contribution is applying LLMs to long-audio preview generation at scale, which reduces feature-engineering overhead while improving understandability, contextual clarity, and interest level. Offline evaluations show consistent gains over the expert-model baseline; online A/B testing shows a 4.6% increase in user engagement with preview content and a 5x improvement in processing efficiency.

📝 Abstract

Discovering and evaluating long-form talk content such as videos and podcasts poses a significant challenge for users, as it requires a considerable time investment. Previews offer a practical solution by providing concise snippets that showcase key moments of the content, enabling users to make more informed and confident choices. We propose an LLM-based approach for generating podcast episode previews and deploy the solution at scale, serving hundreds of thousands of podcast previews in a real-world application. Comprehensive offline evaluations and online A/B testing demonstrate that LLM-generated previews consistently outperform a strong baseline built on top of various ML expert models, while significantly reducing the need for meticulous feature engineering. The offline results indicate notable improvements in understandability, contextual clarity, and interest level. The online A/B test shows a 4.6% increase in user engagement with preview content, along with a 5x boost in processing efficiency, making the approach more streamlined and performant than the baseline of feature-engineered expert models.
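
The abstract does not give implementation details, so the following is only a minimal sketch of what an LLM-based preview selector over a transcript might look like, not the authors' system. Everything in it is an assumption made for illustration: the `Segment` type, the helper names `build_prompt`, `parse_span`, and `generate_preview`, the prompt wording, the 60-second cap, and the `llm` callable, which stands in for whatever model client is actually used.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical structure)."""
    start: float  # seconds from episode start
    end: float    # seconds from episode start
    text: str


def build_prompt(segments: List[Segment], max_seconds: float) -> str:
    """Render the transcript as numbered lines and ask for a contiguous span."""
    lines = [
        f"[{i}] ({s.start:.0f}s-{s.end:.0f}s) {s.text}"
        for i, s in enumerate(segments)
    ]
    return (
        "You are selecting a preview for a podcast episode.\n"
        f"Pick a contiguous run of segments, at most {max_seconds:.0f} seconds "
        "long, that is understandable on its own and likely to interest a "
        "listener.\n"
        "Answer with two integers: FIRST LAST (inclusive segment indices).\n\n"
        + "\n".join(lines)
    )


def parse_span(reply: str, n_segments: int) -> Optional[Tuple[int, int]]:
    """Extract the first two integers from the reply and validate them."""
    nums = re.findall(r"\d+", reply)
    if len(nums) < 2:
        return None
    first, last = int(nums[0]), int(nums[1])
    if 0 <= first <= last < n_segments:
        return first, last
    return None


def generate_preview(
    segments: List[Segment],
    llm: Callable[[str], str],  # assumed interface: prompt in, text reply out
    max_seconds: float = 60.0,  # assumed cap, not taken from the paper
) -> Optional[List[Segment]]:
    """Ask the model for a span, then enforce the duration cap in code."""
    if not segments:
        return None
    span = parse_span(llm(build_prompt(segments, max_seconds)), len(segments))
    if span is None:
        return None
    first, last = span
    # Drop trailing segments until the span fits within the cap.
    while last > first and segments[last].end - segments[first].start > max_seconds:
        last -= 1
    if segments[last].end - segments[first].start > max_seconds:
        return None
    return segments[first:last + 1]
```

The design choice worth noting is that the model only proposes segment indices; the reply is validated and the duration limit is enforced in ordinary code, so a malformed or over-long answer yields `None` rather than a broken preview.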

Problem

Research questions and friction points this paper is trying to address.

Challenges in discovering long-form talk content efficiently

Need for concise previews to aid user decision-making

Improving preview quality and efficiency with LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based podcast preview generation system

Reduces need for feature engineering

Boosts user engagement and efficiency

🔎 Similar Papers

A Large Language Model Guided Topic Refinement Mechanism for Short Text Modeling