A Fully Automated Pipeline for Conversational Discourse Annotation: Tree Scheme Generation and Labeling with Large Language Models

📅 2025-04-11

🤖 AI Summary

This study addresses the labor-intensive, expert-dependent process of manually designing tree-structured annotation schemes for conversational discourse. It proposes a fully automated pipeline in which large language models (LLMs) first construct a tree annotation scheme and then use it to label utterances. The approach is evaluated on two taxonomies, speech functions (SFs) and Switchboard-DAMSL (SWBD-DAMSL), across several design choices. Frequency-guided decision trees, paired with an advanced LLM as the annotator, outperform previously manually designed trees and match or surpass human annotators, while significantly reducing annotation time. All code, generated schemes, and annotations are publicly released.

📝 Abstract

Recent advances in Large Language Models (LLMs) have shown promise in automating discourse annotation for conversations. While manually designed tree annotation schemes significantly improve annotation quality for both humans and models, creating them remains time-consuming and requires expert knowledge. We propose a fully automated pipeline that uses LLMs to construct such schemes and perform annotation. We evaluate our approach on the speech functions (SFs) and Switchboard-DAMSL (SWBD-DAMSL) taxonomies. Our experiments compare various design choices, and we show that frequency-guided decision trees, paired with an advanced LLM for annotation, can outperform previously manually designed trees and even match or surpass human annotators while significantly reducing the time required for annotation. We release all code and resultant schemes and annotations to facilitate future research on discourse annotation.
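
To make the idea of annotating with a tree scheme concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' released code: the `Node` class, the `annotate` function, the toy labels, and the two questions are all hypothetical, and the `rule_based_ask` stub stands in for the LLM call that would answer each question in a real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Node:
    """One question in a tree annotation scheme (hypothetical structure)."""
    question: str
    # Maps each allowed answer to either a subtree or a final label.
    children: Dict[str, Union["Node", str]]


Tree = Union[Node, str]
AskFn = Callable[[str, str, List[str]], str]


def annotate(utterance: str, tree: Tree, ask: AskFn) -> str:
    """Walk the tree from the root, asking one question per level."""
    node = tree
    while isinstance(node, Node):
        options = list(node.children)
        answer = ask(utterance, node.question, options)
        if answer not in node.children:
            raise ValueError(f"unexpected answer {answer!r} for {node.question!r}")
        node = node.children[answer]
    return node  # a leaf is the final label


# Toy scheme with placeholder labels (not the real SF or SWBD-DAMSL labels).
Q_MARK = "Does the utterance end with a question mark?"
Q_SHORT = "Is the utterance at most two words long?"
toy_tree = Node(Q_MARK, {
    "yes": "QUESTION",
    "no": Node(Q_SHORT, {"yes": "BACKCHANNEL", "no": "STATEMENT"}),
})


def rule_based_ask(utterance: str, question: str, options: List[str]) -> str:
    """Stand-in for an LLM: answers the two toy questions with simple rules."""
    rules = {
        Q_MARK: lambda u: u.strip().endswith("?"),
        Q_SHORT: lambda u: len(u.split()) <= 2,
    }
    return "yes" if rules[question](utterance) else "no"


label = annotate("Are you coming?", toy_tree, rule_based_ask)
```

Passing the question answerer in as a function keeps the tree walk independent of any particular model or API; swapping the rule-based stub for a real LLM client would only require matching the `(utterance, question, options)` signature assumed here.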

Problem

Research questions and friction points this paper is trying to address.

Automate discourse annotation for conversations using LLMs

Reduce time and expertise needed for tree scheme creation

Improve annotation accuracy to match or surpass human performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

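Fully automated pipeline in which LLMs both construct the tree scheme and perform annotation

Frequency-guided decision trees improve annotation performance when paired with an advanced LLM

Outperforms previously manually designed trees and matches or surpasses human annotators while reducing annotation time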
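
The summary does not spell out how label frequencies guide tree construction, so the following Python sketch shows only one plausible reading: place frequent labels near the root so that most utterances are resolved after few questions. It uses a Huffman-style merge, which is a swapped-in technique chosen for illustration and not necessarily what the paper does. The function names and the label counts are hypothetical, and the sketch builds only the tree shape; in the described pipeline an LLM would supply the question at each split.

```python
import heapq
from itertools import count
from typing import Dict, Tuple, Union

# A tree is either a label (leaf) or a pair of subtrees (binary split).
Shape = Union[str, Tuple["Shape", "Shape"]]


def build_frequency_guided_tree(freqs: Dict[str, int]) -> Shape:
    """Huffman-style merge: rare labels are merged first and end up deepest."""
    tie = count()  # unique tie-breaker so the heap never compares subtrees
    heap = [(f, next(tie), label) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]


def label_depths(tree: Shape, depth: int = 0) -> Dict[str, int]:
    """Number of questions needed to reach each label."""
    if isinstance(tree, str):
        return {tree: depth}
    depths: Dict[str, int] = {}
    for child in tree:
        depths.update(label_depths(child, depth + 1))
    return depths


def expected_questions(freqs: Dict[str, int], tree: Shape) -> float:
    """Average number of questions per utterance under the given frequencies."""
    depths = label_depths(tree)
    total = sum(freqs.values())
    return sum(freqs[label] * depths[label] for label in freqs) / total


# Hypothetical label counts, for illustration only.
toy_freqs = {"STATEMENT": 60, "QUESTION": 25, "BACKCHANNEL": 10, "OTHER": 5}
toy_shape = build_frequency_guided_tree(toy_freqs)
avg = expected_questions(toy_freqs, toy_shape)  # 1.55 for these counts
```

With these toy counts the most frequent label sits one question from the root, and the average is 1.55 questions per utterance, compared with 2.0 for a balanced binary tree over the same four labels.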
