🤖 AI Summary
This work addresses common issues in multi-turn dialogue instruction-tuning data—such as topic drift, redundant chit-chat, and inconsistent question-answer formatting—by proposing the MDS framework, which performs data selection at the dialogue level for the first time. MDS samples dialogues via binning in the user query trajectory space and jointly evaluates topical coherence (based on entity grounding), informational progression, and question-answer format consistency, thereby achieving high-quality sample selection that balances global coverage with local structural fidelity. Experimental results demonstrate that MDS consistently outperforms single-turn selectors, LLM-based scorers, and heuristic baselines across three general-purpose benchmarks and a banking-domain test set, achieving state-of-the-art performance under both reference-based and reference-free metrics, with notably enhanced robustness in long-dialogue scenarios.
📝 Abstract
Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chit-chat, and mismatched answer formats across turns. We address this from a data selection perspective and propose \textbf{MDS} (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage, which performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage that evaluates within-dialogue reliability through entity-grounded topic coherence and informational progression, together with query-answer format consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
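To make the two stages concrete, here is a minimal sketch of what bin-wise selection over a query-trajectory space and an entity-grounded coherence score could look like. All function names, the 1-D SVD projection used as the binning coordinate, and the Jaccard-overlap scoring choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def select_by_bins(trajectories, n_bins=4, per_bin=1, seed=0):
    """Illustrative global-coverage stage: embed each dialogue's user-query
    trajectory as a single vector (here, the mean of its per-turn query
    embeddings), project onto one principal direction, split that range into
    equal-width bins, and keep a few dialogues per bin. (Hypothetical sketch;
    the paper's binning and trajectory representation may differ.)"""
    rng = np.random.default_rng(seed)
    emb = np.asarray([t.mean(axis=0) for t in trajectories])
    centered = emb - emb.mean(axis=0)
    # First right-singular vector as a 1-D binning coordinate (assumption).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coord = centered @ vt[0]
    edges = np.linspace(coord.min(), coord.max(), n_bins + 1)
    selected = []
    for b in range(n_bins):
        # Make the last bin closed on the right so the max point is included.
        hi = edges[b + 1] if b < n_bins - 1 else np.inf
        idx = np.where((coord >= edges[b]) & (coord < hi))[0]
        if idx.size:
            picks = rng.choice(idx, size=min(per_bin, idx.size), replace=False)
            selected.extend(picks.tolist())
    return sorted(selected)

def topic_coherence(turn_entities):
    """Illustrative local-structure signal: entity-grounded coherence as the
    mean Jaccard overlap between entity sets of consecutive turns
    (one plausible scoring choice, not the paper's exact metric)."""
    pairs = list(zip(turn_entities, turn_entities[1:]))
    if not pairs:
        return 1.0
    return sum(len(a & b) / len(a | b) if a | b else 1.0
               for a, b in pairs) / len(pairs)
```

In this sketch, equal-width bins over a projected coordinate stand in for the coverage mechanism: each bin contributes at most `per_bin` dialogues, so dense regions of the trajectory space cannot dominate the selected subset.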