The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality

📅 2025-11-22
🤖 AI Summary
This study identifies an “alignment paradox” in medical large language models (LLMs) for infertility diagnosis and treatment: improved algorithmic accuracy, for example via Group Relative Policy Optimization (GRPO), does not necessarily improve clinical decision quality as judged by clinicians. Using more than 8,000 real-world infertility cases, the study compares four alignment methods, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), GRPO, and In-Context Learning (ICL), within a dual-layer evaluation framework that combines automated metrics with blinded clinician assessments. GRPO achieves the highest algorithmic accuracy, yet the SFT model attains the highest clinician win rate (51.2%), ahead of both GRPO (26.2%) and the physicians' original decisions (22.7%). Clinicians cite clearer reasoning and higher therapeutic feasibility as the reasons for preferring SFT. The results suggest that alignment strategies should prioritize clinically interpretable and practically feasible reasoning rather than decision-level accuracy alone.

📝 Abstract
Large language models (LLMs) are increasingly adopted in clinical decision support, yet aligning them with the multifaceted reasoning pathways of real-world medicine remains a major challenge. Using more than 8,000 infertility treatment records, we systematically evaluate four alignment strategies, namely Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Group Relative Policy Optimization (GRPO), and In-Context Learning (ICL), through a dual-layer framework combining automatic benchmarks with blinded doctor-in-the-loop assessments. GRPO achieves the highest algorithmic accuracy across multiple decision layers, confirming the value of reinforcement-based optimization for structured prediction tasks. However, clinicians consistently prefer the SFT model, citing clearer reasoning processes (p = 0.035) and higher therapeutic feasibility (p = 0.019). In blinded pairwise comparisons, SFT attains the highest winning rate (51.2%), outperforming both GRPO (26.2%) and even the physicians' original decisions (22.7%). These results reveal an alignment paradox: algorithmic improvements do not necessarily translate into higher clinical trust, and may diverge from human-centered preferences. Our findings highlight the need for alignment strategies that prioritize clinically interpretable and practically feasible reasoning, rather than solely optimizing decision-level accuracy.
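As a rough illustration of how a winning rate like the ones quoted above could be tallied, the sketch below counts which candidate a blinded rater preferred in each judgment and divides by the total. The `win_rates` helper and the `judgments` list are hypothetical and invented for illustration; the abstract does not describe the exact tallying procedure, and these toy numbers are not the study's data.

```python
from collections import Counter


def win_rates(preferences):
    """Return each candidate's share of the preferred votes.

    `preferences` is a list of labels, one per blinded judgment,
    naming the source of the answer the rater preferred.
    """
    total = len(preferences)
    if total == 0:
        raise ValueError("no judgments to tally")
    counts = Counter(preferences)
    return {name: count / total for name, count in counts.items()}


# Hypothetical toy judgments (assumption: NOT data from the paper).
judgments = ["SFT", "GRPO", "SFT", "Physician", "SFT", "GRPO", "SFT", "Physician"]

rates = win_rates(judgments)
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.1%}")
```

Because every judgment is credited to exactly one candidate, the rates computed this way sum to 1; handling ties or abstentions would need an extra label or a different denominator.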
Problem

Research questions and friction points this paper is trying to address.

Evaluating alignment strategies for medical LLMs in infertility care decision-making
Addressing the paradox between algorithmic accuracy and clinical trust
Developing alignment methods that prioritize interpretable clinical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group Relative Policy Optimization enhances algorithmic accuracy
Supervised Fine-Tuning improves clinical interpretability and feasibility
Dual-layer framework combines automatic benchmarks with clinician assessments
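For readers unfamiliar with GRPO, its core idea as commonly formulated is to sample a group of responses for the same prompt, score each one, and normalize the rewards within that group to obtain advantages. The sketch below shows only that normalization step. It is a generic illustration, not the authors' training code: the reward values, the `eps` stabilizer, and the use of the population standard deviation are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumption here) avoids division by zero when all
    rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt,
# e.g. 1.0 if the predicted decision matched the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to provide a baseline.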
Dou Liu
Department of Obstetrics and Gynecology, West China Second University Hospital, China.
Ying Long
Department of Obstetrics and Gynecology, West China Second University Hospital, China.
Sophia Zuoqiu
Department of Industrial Engineering, Sichuan University, China.
Kaipeng Xie
Department of Industrial Engineering, Sichuan University, China.
Runze Yang
Ph.D. at Shanghai Jiao Tong University
Deep learning, Time series analysis, Medical Signal
Di Liu
West China Biomedical Big Data Center, West China Hospital, China.
Kang Li
West China Biomedical Big Data Center, West China Hospital, China.
Yiting Lin
West China School of Medicine, Sichuan University, China.
Hanyi Liu
West China School of Medicine, Sichuan University, China.
Rong Yin
Associate Researcher, Institute of Information Engineering, Chinese Academy of Sciences
LLM, Graph Representation Learning, Statistical Learning Theory
Tian Tang
University of Alberta