Learning LLM Preference over Intra-Dialogue Pairs: A Framework for Utterance-level Understandings

📅 2025-03-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high label noise in LLM-generated annotations and the performance limitations of fine-tuning small models for real-time dialogue understanding, this paper proposes a noise-robust preference learning framework over intra-dialogue utterance pairs. It formulates LLM-based preference learning as fine-grained ranking of utterance pairs within the same dialogue and designs a loss function to suppress the propagation of label noise. By combining LLM label distillation with supervised fine-tuning of compact models, the method reports absolute accuracy improvements of over 2% on sentiment detection and over 1.5% on dialogue act classification, outperforming conventional distillation and supervised fine-tuning baselines. It thereby pairs the low-latency inference of lightweight models with accuracy approaching that of LLMs.
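The idea of ranking utterance pairs within the same dialogue can be illustrated with a minimal sketch. It assumes each utterance carries a scalar score assigned by an LLM (for example, a sentiment score); the function name `intra_dialogue_pairs` and the input layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def intra_dialogue_pairs(dialogue_ids, llm_scores):
    """Build (preferred, other) index pairs from utterances that share a dialogue.

    dialogue_ids[k] is the dialogue that utterance k belongs to, and
    llm_scores[k] is an assumed scalar score the LLM gave to utterance k.
    """
    # Group utterance indices by dialogue, preserving first-seen order.
    by_dialogue = {}
    for idx, dialogue in enumerate(dialogue_ids):
        by_dialogue.setdefault(dialogue, []).append(idx)

    pairs = []
    for indices in by_dialogue.values():
        for i, j in combinations(indices, 2):
            if llm_scores[i] > llm_scores[j]:
                pairs.append((i, j))
            elif llm_scores[j] > llm_scores[i]:
                pairs.append((j, i))
            # Ties express no preference, so they are skipped.
    return pairs


dialogue_ids = ["a", "a", "b", "b", "b"]
llm_scores = [0.9, 0.2, 0.5, 0.5, 0.1]
example_pairs = intra_dialogue_pairs(dialogue_ids, llm_scores)
```

Restricting pairs to a single dialogue means utterances are only compared against others from the same conversation, never across conversations.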

📝 Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in handling complex dialogue tasks without requiring use case-specific fine-tuning. However, analyzing live dialogues in real time requires low-latency processing, which makes it impractical to deploy models with billions of parameters. As a result, practitioners often prefer smaller models with millions of parameters, trained on high-quality, human-annotated datasets. Yet curating such datasets is both time-consuming and costly. Consequently, there is a growing need to combine the scalability of LLM-generated labels with the precision of human annotations, enabling fine-tuned smaller models to achieve both higher speed and accuracy comparable to larger models. In this paper, we introduce a simple yet effective framework to address this challenge. Our approach is specifically designed for per-utterance classification problems, which encompass tasks such as intent detection and dialogue state tracking. To mitigate the impact of labeling errors from LLMs, the primary source of inaccuracies in student models, we propose a noise-reduced preference learning loss. Experimental results demonstrate that our method significantly improves accuracy across utterance-level dialogue tasks, including sentiment detection (over $2\%$) and dialogue act classification (over $1.5\%$).
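The abstract does not spell out the form of the noise-reduced preference learning loss, so the following is only an illustrative sketch of one plausible shape, not the paper's loss. It assumes a standard pairwise logistic (Bradley-Terry style) loss over student scores and, as a stand-in for noise reduction, drops pairs whose LLM score gap falls below a threshold. The function name, the `min_gap` parameter, and the filtering rule are all assumptions.

```python
import math


def pairwise_preference_loss(student_scores, pairs, llm_scores, min_gap=0.2):
    """Mean pairwise logistic loss over confidently ordered pairs.

    pairs holds (preferred, other) utterance indices. Pairs whose LLM score
    gap is below min_gap are treated as likely label noise and ignored
    (an assumed heuristic, not the paper's method).
    """
    kept = [(i, j) for i, j in pairs if llm_scores[i] - llm_scores[j] >= min_gap]
    if not kept:
        return 0.0

    total = 0.0
    for i, j in kept:
        diff = student_scores[i] - student_scores[j]
        # -log(sigmoid(diff)) = log(1 + exp(-diff)), in a numerically stable form.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(kept)


# The student agrees with the LLM ordering in the first call and disagrees in the second.
loss_agree = pairwise_preference_loss([2.0, 0.0], [(0, 1)], [0.9, 0.1])
loss_disagree = pairwise_preference_loss([0.0, 2.0], [(0, 1)], [0.9, 0.1])
```

Under these assumptions, the loss is small when the student ranks the preferred utterance higher and grows when it reverses the LLM's ordering, while near-tied LLM judgments contribute nothing.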

Problem

Research questions and friction points this paper is trying to address.

Real-time dialogue analysis requires low-latency systems.

Smaller models need high-quality, human-annotated datasets for accuracy.

Human annotation is costly and slow, while LLM-generated labels scale easily but introduce labeling errors.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for utterance-level dialogue understanding

Noise-reduced preference learning loss

Combines LLM scalability with human annotation precision
