🤖 AI Summary
Full-parameter fine-tuning of large language models (LLMs) for chain-of-thought (CoT) reasoning is computationally prohibitive, while existing parameter-efficient fine-tuning (PEFT) methods lack task-aware co-optimization of parameters and data. Method: We propose LoRA-PAR, a LoRA-based partitioned fine-tuning framework grounded in the dual-system “fast-and-slow thinking” theory. It classifies reasoning data into intuitive vs. logical types via multi-model role-playing and voting, dynamically partitions LoRA parameters based on importance scoring, and employs a two-stage training strategy combining supervised fine-tuning and reinforcement learning. Contribution/Results: Experiments demonstrate that LoRA-PAR achieves performance on par with or surpassing state-of-the-art PEFT methods while significantly reducing activated parameters. It is the first approach to enable joint, inference-mode–aware adaptation of data and parameters, establishing a new paradigm for efficient and interpretable LLM fine-tuning.
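The data-classification step described above (multi-model role-playing and voting) reduces to a majority vote over labels produced by several "expert" models. A minimal sketch of that voting logic, assuming each model's role-played answer has already been parsed into a `"system1"` / `"system2"` label (the function names and labels here are illustrative, not the paper's API):

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among the experts' votes.

    `labels` is a list of per-model classifications, e.g. produced by
    prompting several LLMs in different roles and parsing their answers
    (that prompting step is omitted here).
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][0]

# Three hypothetical expert models vote on one question:
votes = ["system2", "system1", "system2"]
print(majority_vote(votes))  # -> "system2"
```

In practice the vote would run per training example, routing each example to the System 1 (SFT) or System 2 (RL) pool.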
📝 Abstract
Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most existing approaches primarily address domain adaptation or layer-wise allocation rather than explicitly tailoring data and parameters to different response demands. Inspired by "Thinking, Fast and Slow," which characterizes two distinct modes of thought, System 1 (fast, intuitive, often automatic) and System 2 (slower, more deliberative and analytic), we draw an analogy that different "subregions" of an LLM's parameters might similarly specialize for tasks that demand quick, intuitive responses versus those requiring multi-step logical reasoning. We therefore propose LoRA-PAR, a dual-system LoRA framework that partitions both data and parameters by System 1 or System 2 demands, using fewer yet more focused parameters for each task. Specifically, we classify task data via multi-model role-playing and voting, partition parameters based on importance scoring, and then adopt a two-stage fine-tuning strategy: first training System 1 tasks with supervised fine-tuning (SFT) to strengthen knowledge and intuition, then refining System 2 tasks with reinforcement learning (RL) to reinforce deeper logical deliberation. Extensive experiments show that this two-stage SFT-then-RL strategy lowers active parameter usage while matching or surpassing SOTA PEFT baselines.
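The parameter-partitioning step can be illustrated with a small NumPy sketch. This is an assumption-laden stand-in, not the paper's implementation: it scores each LoRA parameter with a common first-order importance proxy, |weight × gradient|, then activates only the top fraction of parameters for a given task type (the function names and the 25% fraction are illustrative):

```python
import numpy as np

def importance_scores(weights, grads):
    """First-order importance proxy: |w * dL/dw| per parameter."""
    return np.abs(weights * grads)

def partition_mask(scores, fraction=0.3):
    """Boolean mask selecting the top `fraction` most important parameters."""
    k = max(1, int(fraction * scores.size))
    threshold = np.partition(scores.ravel(), -k)[-k]  # k-th largest score
    return scores >= threshold

# Toy LoRA parameter block (4x8) with random weights and gradients:
rng = np.random.default_rng(0)
w, g = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
mask = partition_mask(importance_scores(w, g), fraction=0.25)
print(int(mask.sum()))  # -> 8 (25% of 32 parameters stay active)
```

During training, such a mask would gate updates so each "subregion" specializes for System 1 or System 2 data, which is how activated-parameter counts stay low.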