🤖 AI Summary
To address the risk of clinically erroneous synthetic data generated by large language models (LLMs) in high-stakes medical applications, this paper proposes the Query-based Model Collaboration Framework (Q-MCF), an expert-guided, query-driven framework that dynamically injects domain-expert knowledge into the LLM generation process to preserve the factual accuracy of critical medical information. Q-MCF employs a lightweight collaborative mechanism that embeds structured expert queries directly into the LLM's inference pipeline, jointly optimizing data quality and clinical safety. Experiments across multiple clinical prediction tasks demonstrate that Q-MCF significantly reduces factual error rates (an average reduction of 32.7%) and enhances downstream model robustness and generalization, outperforming state-of-the-art data augmentation methods. To our knowledge, this is the first work to systematically integrate structured expert querying into an LLM-based data augmentation pipeline, establishing a novel, interpretable, and verifiable paradigm for trustworthy medical AI.
📝 Abstract
Data augmentation is a widely used strategy for improving model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this purpose, their application in high-stakes domains such as healthcare poses unique challenges due to the risk of generating clinically incorrect or misleading information. In this work, we propose a novel query-based model collaboration framework that integrates expert-level domain knowledge to guide the augmentation process so that critical medical information is preserved. Experiments on clinical prediction tasks demonstrate that our lightweight collaboration-based approach consistently outperforms existing LLM augmentation methods while improving safety by reducing factual errors. This framework narrows the gap between the generative potential of LLM augmentation and the safety requirements of specialized domains.
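The expert-guided filtering idea described above can be sketched in code. The following is a minimal illustrative sketch only, not the paper's actual implementation: the names (`ExpertQuery`, `augment_with_expert_checks`, `stub_llm`) and the simple accept/reject logic are assumptions standing in for Q-MCF's structured expert querying.

```python
from dataclasses import dataclass

# Hypothetical sketch: each synthetic record produced by an LLM is
# checked against structured expert queries before being accepted
# into the augmented training set. All names and logic here are
# illustrative assumptions, not the framework's real API.

@dataclass
class ExpertQuery:
    field: str      # critical clinical field to verify
    expected: str   # value required by the expert knowledge source

def augment_with_expert_checks(llm_generate, queries, n_samples):
    """Keep only synthetic records whose critical fields pass every query."""
    accepted = []
    for _ in range(n_samples):
        record = llm_generate()  # one synthetic example, as a dict
        if all(record.get(q.field) == q.expected for q in queries):
            accepted.append(record)  # clinically consistent -> keep
    return accepted

# Stand-in generator: every other sample carries a clinically wrong dosage.
def stub_llm(_state=[0]):
    _state[0] += 1
    dose = "500mg" if _state[0] % 2 else "5000mg"
    return {"drug": "metformin", "dose": dose}

queries = [ExpertQuery(field="dose", expected="500mg")]
kept = augment_with_expert_checks(stub_llm, queries, n_samples=4)
print(len(kept))  # only the clinically consistent samples survive
```

In the actual framework the expert queries would presumably be answered by domain experts or a verified knowledge source rather than hard-coded expected values, and rejected samples could be regenerated instead of discarded; the sketch shows only the query-then-filter structure.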