Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation

📅 2025-08-17
🏛️ Interspeech
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the lack of hierarchical structure in speech transcripts by proposing a multi-level topic segmentation method that generates tables of contents covering both topic and subtopic boundaries. It compares zero-shot prompting with LoRA fine-tuning of large language models, explores the integration of high-level speech pause features, and adapts a common evaluation measure so that all hierarchical levels are scored within a single metric. On English meeting recordings and on Portuguese and German lecture transcripts, the approach shows significant improvements over established topic segmentation baselines.

📝 Abstract
Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical topic segmentation in transcripts, generating multi-level tables of contents that capture both topic and subtopic boundaries. We compare zero-shot prompting and LoRA fine-tuning on large language models, while also exploring the integration of high-level speech pause features. Evaluations on English meeting recordings and multilingual lecture transcripts (Portuguese, German) show significant improvements over established topic segmentation baselines. Additionally, we adapt a common evaluation measure for multi-level segmentation, taking into account all hierarchical levels within one metric.
Problem

Research questions and friction points this paper is trying to address.

transcript segmentation
topic segmentation
multi-level segmentation
table-of-contents generation
hierarchical segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA fine-tuning
multi-level segmentation
table-of-contents generation
speech pause features
hierarchical topic segmentation
Steffen Freisinger
Technische Hochschule Nürnberg, Germany
Philipp Seeberger
Technische Hochschule Nürnberg, Germany
Thomas Ranzenberger
Technische Hochschule Nürnberg, Germany
Tobias Bocklet
Technische Hochschule Nürnberg & Intel Labs
Automatic Speech Processing, Machine Learning, Deep Learning, Artificial Intelligence
K. Riedhammer
Technische Hochschule Nürnberg, Germany