TalkLess: Blending Extractive and Abstractive Speech Summarization for Editing Speech to Preserve Content and Style

📅 2025-07-20
🤖 AI Summary
Speech summarization faces dual challenges: preserving content fidelity while retaining speaker-specific acoustic characteristics. Extractive methods suffer from limited expressiveness, whereas abstractive approaches often degrade prosodic and phonetic features. This paper introduces a hybrid speech editing framework integrating extractive and abstractive summarization, enabled by a tunable dual-panel interface that decouples word-level speech editing from content-level text summarization, thereby enabling dynamic trade-offs among compression ratio, content coverage, and audio quality. The system integrates ASR, controllable text summarization, TTS, and speech editing models, augmented by an optimal editing path selection strategy. Experiments show a 23.6% improvement in content coverage and a 31.4% reduction in word error rate over extractive baselines. A comparison study (N=12) shows significant reductions in cognitive load and editing effort; an exploratory study (N=3) in which creators edited their own speech further suggests practical utility.

📝 Abstract
Millions of people listen to podcasts, audio stories, and lectures, but editing speech remains tedious and time-consuming. Creators remove unnecessary words, cut tangential discussions, and even re-record speech to make recordings concise and engaging. Prior work automatically summarized speech by removing full sentences (extraction), but rigid extraction limits expressivity. AI tools can summarize then re-synthesize speech (abstraction), but abstraction strips the speaker's style. We present TalkLess, a system that flexibly combines extraction and abstraction to condense speech while preserving its content and style. To edit speech, TalkLess first generates possible transcript edits, selects edits to maximize compression, coverage, and audio quality, then uses a speech editing model to translate transcript edits into audio edits. TalkLess's interface provides creators control over automated edits by separating low-level wording edits (via the compression pane) from major content edits (via the outline pane). TalkLess achieves higher coverage and removes more speech errors than a state-of-the-art extractive approach. A comparison study (N=12) showed that TalkLess significantly decreased cognitive load and editing effort in speech editing. We further demonstrate TalkLess's potential in an exploratory study (N=3) where creators edited their own speech.
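
The edit-selection step described in the abstract can be illustrated with a small sketch. The abstract does not say how TalkLess scores or searches over candidate edits, so everything here is an assumption: the `EditOption` fields, the `select_edits` function, the `quality_weight` parameter, and the use of a multiple-choice knapsack (dynamic programming over a word budget) as a stand-in for the actual selection strategy. Each transcript segment offers a few hypothetical options (keep it, rewrite it more briefly, or cut it), and the sketch picks one option per segment to maximize coverage plus weighted audio quality within the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOption:
    label: str       # hypothetical labels: "keep", "rewrite", or "cut"
    words: int       # words left in the segment after applying the edit
    coverage: float  # share of the segment's information retained, 0..1
    quality: float   # estimated audio quality after the edit, 0..1


def select_edits(segments, word_budget, quality_weight=0.5):
    """Choose one option per segment (multiple-choice knapsack).

    Maximizes the sum of coverage + quality_weight * quality while the
    total word count stays within word_budget. This is an illustrative
    objective, not the one used by TalkLess.
    """
    # best maps "words used so far" to the (score, labels) of the best plan.
    best = {0: (0.0, [])}
    for options in segments:
        nxt = {}
        for used, (score, labels) in best.items():
            for opt in options:
                total = used + opt.words
                if total > word_budget:
                    continue  # this option would exceed the budget
                new_score = score + opt.coverage + quality_weight * opt.quality
                if total not in nxt or new_score > nxt[total][0]:
                    nxt[total] = (new_score, labels + [opt.label])
        best = nxt
    if not best:
        raise ValueError("no combination of edits fits the word budget")
    return max(best.values(), key=lambda plan: plan[0])


# Two made-up segments with made-up scores.
segments = [
    [EditOption("keep", 20, 1.0, 1.0),
     EditOption("rewrite", 10, 0.9, 0.7),
     EditOption("cut", 0, 0.0, 0.9)],
    [EditOption("keep", 30, 1.0, 1.0),
     EditOption("rewrite", 12, 0.8, 0.6),
     EditOption("cut", 0, 0.0, 0.9)],
]

score, plan = select_edits(segments, word_budget=35)
print(plan)
```

With a 35-word budget, both segments cannot be kept verbatim (50 words), so the sketch keeps the first and rewrites the second. Raising `quality_weight` favors verbatim (extractive) options, while tightening `word_budget` pushes the plan toward rewrites and cuts, which mirrors the compression, coverage, and quality trade-off the abstract describes.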

Problem

Research questions and friction points this paper is trying to address.

Combines extraction and abstraction to summarize speech
Preserves speaker's style while condensing content
Reduces cognitive load and editing effort

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines extraction and abstraction flexibly
Selects transcript edits to maximize compression, coverage, and audio quality
Provides control via separate editing panes