🤖 AI Summary
This work addresses the lack of structured hierarchy in speech-to-text transcripts by proposing a multilevel topic segmentation method that integrates prosodic pause features with LoRA-finetuned large language models to automatically generate hierarchical outlines comprising topics and subtopics. It introduces LoRA-based fine-tuning for the first time to the task of multilevel transcript segmentation, designs a unified evaluation metric tailored to hierarchical structure, and enhances boundary detection accuracy through the incorporation of speech pause information. The approach significantly outperforms existing baselines on English meeting corpora as well as Portuguese and German lecture datasets, demonstrating strong effectiveness and cross-lingual generalization across diverse scenarios.
📝 Abstract
Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical topic segmentation in transcripts, generating multi-level tables of contents that capture both topic and subtopic boundaries. We compare zero-shot prompting and LoRA fine-tuning on large language models, while also exploring the integration of high-level speech pause features. Evaluations on English meeting recordings and multilingual lecture transcripts (Portuguese, German) show significant improvements over established topic segmentation baselines. Additionally, we adapt a common evaluation measure for multi-level segmentation, taking into account all hierarchical levels within one metric.