🤖 AI Summary
This paper identifies a novel failure mode, "format inertia," in large language models (LLMs) deployed for multi-turn medical pre-consultation dialogues: because the turn-count distribution of the supervised fine-tuning (SFT) data is severely skewed, models over-prioritize syntactically well-formed yet semantically redundant, diagnostically uninformative questions, and the degradation worsens as dialogues grow longer. To address this, the authors propose a turn-distribution rebalancing strategy for data reconstruction that systematically mitigates format inertia. Experiments demonstrate that training on the reconstructed dataset significantly reduces the rate of ineffective queries, improves information density and clinical relevance, and strengthens diagnostic support, all while preserving formatting compliance. The work is the first to formally name, empirically characterize, and quantitatively analyze format inertia in medical dialogue systems, providing both a conceptual framework for understanding LLM brittleness in clinical contexts and a reproducible, data-centric methodology for improving robustness.
📝 Abstract
Recent advances in Large Language Models (LLMs) have brought significant improvements to various service domains, including chatbots and medical pre-consultation applications. In the healthcare domain, the most common approach for adapting LLMs to multi-turn dialogue generation is Supervised Fine-Tuning (SFT). However, datasets for SFT in tasks like medical pre-consultation typically exhibit a skewed turn-count distribution. Training on such data induces a novel failure mechanism we term **Format Inertia**, in which models tend to generate repetitive, format-correct, but diagnostically uninformative questions in long medical dialogues. To mitigate this failure mechanism, we adopt a simple, data-centric method that rebalances the turn-count distribution of the training dataset. Experimental results show that our approach substantially alleviates Format Inertia in medical pre-consultation.
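The abstract describes the rebalancing method only at a high level. One plausible implementation, sketched below under assumptions not stated in the paper, is to bin dialogues by turn count and then downsample over-represented bins and upsample under-represented ones toward a uniform target; the function name and parameters here are hypothetical, not the authors' actual pipeline.

```python
import random
from collections import defaultdict

def rebalance_by_turn_count(dialogues, target_per_bin=None, seed=0):
    """Rebalance a dialogue dataset so each turn-count bin is equally represented.

    `dialogues` is a list of dialogues, each a list of turns. Over-represented
    turn counts are randomly downsampled; under-represented ones are upsampled
    by duplication. This is an illustrative sketch, not the paper's method.
    """
    rng = random.Random(seed)

    # Group dialogues by their number of turns.
    bins = defaultdict(list)
    for d in dialogues:
        bins[len(d)].append(d)

    # Default target: a roughly uniform share per turn-count bin.
    if target_per_bin is None:
        target_per_bin = max(1, len(dialogues) // len(bins))

    rebalanced = []
    for _, group in sorted(bins.items()):
        if len(group) >= target_per_bin:
            # Downsample over-represented turn counts without replacement.
            rebalanced.extend(rng.sample(group, target_per_bin))
        else:
            # Keep all examples, then upsample with replacement to the target.
            rebalanced.extend(group)
            rebalanced.extend(rng.choices(group, k=target_per_bin - len(group)))
    return rebalanced
```

For example, a dataset of eight 2-turn and two 5-turn dialogues with `target_per_bin=4` would yield four dialogues in each bin, flattening the skew the abstract attributes to Format Inertia.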