RACE-Align: Retrieval-Augmented and Chain-of-Thought Enhanced Preference Alignment for Large Language Models

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Domain-specific large language models (LLMs) suffer from low factual accuracy, weak reasoning capabilities, and poor interpretability—particularly in specialized fields such as Traditional Chinese Medicine (TCM). Method: This paper proposes a novel preference alignment paradigm integrating external knowledge retrieval with explicit Chain-of-Thought (CoT) reasoning. Departing from conventional RLHF or DPO approaches—which overlook knowledge provenance and reasoning logic—we jointly model “factual support” and “domain-specific CoT reasoning quality” as binary preference dimensions to construct verifiable, interpretable preference data. Our method employs AI-driven multi-stage preference generation, DPO-based optimization, retrieval augmentation, and targeted TCM knowledge injection. Contribution/Results: The framework significantly improves answer accuracy, information richness, depth of TCM reasoning application, and logical traceability. On TCM benchmark tasks, it consistently outperforms both standard baselines and supervised fine-tuning (SFT) models.

📝 Abstract
Large Language Models (LLMs) struggle with accuracy, domain-specific reasoning, and interpretability in vertical domains. Traditional preference alignment methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) often overlook the underlying knowledge sources and reasoning logic. This paper introduces RACE-Align (Retrieval-Augmented and Chain-of-Thought Enhanced Alignment), a novel framework designed to address these limitations. RACE-Align systematically constructs a binary preference dataset incorporating external knowledge support and explicit Chain-of-Thought (CoT) reasoning, then aligns LLMs using the DPO algorithm. The core innovation lies in its preference data construction strategy: it integrates AI-driven retrieval for factual grounding, enhancing knowledgeability and accuracy, and emphasizes the optimization of domain-specific CoT, treating the reasoning process itself as a key preference dimension. A multi-stage, AI-driven refinement pipeline cost-effectively generates these preference pairs. Experimental validation in Traditional Chinese Medicine (TCM) using Qwen3-1.7B as the base model demonstrates that RACE-Align significantly outperforms the original base model and a model fine-tuned only with Supervised Fine-Tuning (SFT). Improvements were observed across multiple dimensions, including answer accuracy, information richness, application of TCM thinking patterns, logicality and depth of reasoning, and interpretability. These findings suggest RACE-Align offers an effective pathway to enhance LLMs' knowledge application, reasoning reliability, and process transparency in complex vertical domains.
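The abstract states that the preference data is ultimately used to align the model with the DPO algorithm. As a point of reference for readers unfamiliar with it, the standard DPO objective for a single preference pair can be sketched as below; this is the generic loss from the DPO literature, not the paper's implementation, and the variable names are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (preferred) and
    rejected responses under the policy being trained and under a frozen
    reference model. beta controls how strongly the policy is pushed
    away from the reference.
    """
    # Implicit reward margins: how much more the policy favors each
    # response than the reference model does.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # Loss is -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference model agree exactly, both margins vanish and the loss sits at log(2); training lowers it by widening the chosen-minus-rejected margin.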
Problem

Research questions and friction points this paper is trying to address.

Improves LLM accuracy and reasoning in specialized domains
Enhances interpretability with Chain-of-Thought preference alignment
Integrates retrieval-augmented knowledge for better factual grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented preference data construction
Chain-of-Thought enhanced reasoning optimization
AI-driven multi-stage refinement pipeline
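The summary describes jointly modeling "factual support" and "domain-specific CoT reasoning quality" as binary preference dimensions to build DPO pairs. A minimal sketch of that idea, with field names and the ranking rule chosen here for illustration rather than taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class CandidateAnswer:
    text: str
    cot: str                   # explicit chain-of-thought trace
    factually_supported: bool  # grounded in retrieved domain sources
    cot_quality_ok: bool       # domain-appropriate reasoning judged acceptable

def score(ans):
    # Rank candidates lexicographically on the two binary dimensions:
    # factual support first, then reasoning quality.
    return (ans.factually_supported, ans.cot_quality_ok)

def build_preference_pair(candidates):
    """Pick the best and worst candidates as a (chosen, rejected) pair.

    Returns None when all candidates tie on both dimensions, since DPO
    requires a strict preference between the two responses.
    """
    ranked = sorted(candidates, key=score, reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if score(chosen) == score(rejected):
        return None
    return chosen, rejected
```

In the paper's pipeline the two judgments would come from the AI-driven multi-stage refinement stages (retrieval checking and CoT assessment) rather than hand-set booleans.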
Qihang Yan, ShanghaiTech University
Xinyu Zhang, Henan University
Luming Guo, Henan University
Qi Zhang, Liaoning University of Traditional Chinese Medicine