🤖 AI Summary
Large language models (LLMs) suffer from low factual accuracy, weak reasoning, and poor interpretability in vertical domains, particularly in specialized fields such as Traditional Chinese Medicine (TCM).
Method: This paper proposes a novel preference alignment paradigm that integrates external knowledge retrieval with explicit Chain-of-Thought (CoT) reasoning. Departing from conventional RLHF and DPO approaches, which overlook knowledge provenance and reasoning logic, we jointly model "factual support" and "domain-specific CoT reasoning quality" as binary preference dimensions to construct verifiable, interpretable preference data. The method combines AI-driven multi-stage preference generation, DPO-based optimization, retrieval augmentation, and targeted TCM knowledge injection.
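To make the two binary preference dimensions concrete, here is a minimal sketch of what one preference record and its acceptance check might look like. The field and function names are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One binary preference record (illustrative schema, not the paper's)."""
    question: str
    evidence: list[str]   # passages from AI-driven retrieval (factual grounding)
    chosen: str           # evidence-grounded answer with explicit domain CoT
    rejected: str         # answer lacking knowledge provenance or domain CoT

def chosen_dominates(chosen_scores: dict, rejected_scores: dict) -> bool:
    """Keep a pair only if the chosen answer is at least as good on both
    binary dimensions and strictly better on at least one of them."""
    dims = ("factual_support", "cot_quality")
    return (all(chosen_scores[d] >= rejected_scores[d] for d in dims)
            and any(chosen_scores[d] > rejected_scores[d] for d in dims))
```

A filter like `chosen_dominates` would discard pairs where the two answers tie on both dimensions, since such pairs carry no preference signal for DPO.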
Contribution/Results: The framework significantly improves answer accuracy, information richness, depth of TCM reasoning application, and logical traceability. On TCM benchmark tasks, it consistently outperforms both standard baselines and supervised fine-tuning (SFT) models.
📝 Abstract
Large Language Models (LLMs) struggle with accuracy, domain-specific reasoning, and interpretability in vertical domains. Traditional preference alignment methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) often overlook the underlying knowledge sources and reasoning logic. This paper introduces RACE-Align (Retrieval-Augmented and Chain-of-Thought Enhanced Alignment), a novel framework designed to address these limitations. RACE-Align systematically constructs a binary preference dataset incorporating external knowledge support and explicit Chain-of-Thought (CoT) reasoning, then aligns LLMs using the DPO algorithm. The core innovation lies in its preference data construction strategy: it integrates AI-driven retrieval for factual grounding, enhancing knowledgeability and accuracy, and emphasizes the optimization of domain-specific CoT, treating the reasoning process itself as a key preference dimension. A multi-stage, AI-driven refinement pipeline cost-effectively generates these preference pairs. Experimental validation in Traditional Chinese Medicine (TCM) using Qwen3-1.7B as the base model demonstrates that RACE-Align significantly outperforms the original base model and a model fine-tuned only with Supervised Fine-Tuning (SFT). Improvements were observed across multiple dimensions, including answer accuracy, information richness, application of TCM thinking patterns, logicality and depth of reasoning, and interpretability. These findings suggest RACE-Align offers an effective pathway to enhance LLMs' knowledge application, reasoning reliability, and process transparency in complex vertical domains.
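The abstract states that the constructed pairs are used to align the model with the standard DPO algorithm. A minimal scalar sketch of that objective (the textbook DPO loss, not the paper's training code) is:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss, scalar version.

    Inputs are sequence log-probabilities log pi(y|x) (per-token
    log-probs summed) for the chosen and rejected answers under the
    policy being trained and a frozen reference model (here, the
    SFT model would serve as the reference).
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(margin): small when the policy already prefers the
    # knowledge-grounded, CoT-bearing answer by a wide margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss pushes the policy to assign relatively higher likelihood to the retrieval-supported, CoT-enhanced answer than to the rejected one, without training an explicit reward model.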