🤖 AI Summary
Current instruction-tuning methods rely heavily on large-scale, high-quality instruction-response pairs but commonly neglect the alignment between instructions and responses, which limits data quality. To address this, we propose MAIN, a bidirectional Mutual Alignment framework that, for the first time, formulates alignment as a bidirectional, mutually constraining optimization problem across semantic, intentional, and structural dimensions, departing from conventional unidirectional quality-assessment paradigms. MAIN integrates contrastive learning, mutual information maximization, and consistency regularization to build a differentiable alignment-scoring module that embeds seamlessly into LLaMA/Mistral fine-tuning pipelines. On AlpacaEval and MT-Bench, MAIN improves LLaMA-3-8B and Mistral-7B by 4.2 and 3.8 points, respectively, demonstrating substantial gains in generalization and robustness. These results indicate that bidirectional alignment fidelity is more decisive than unidirectional response quality alone.
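The summary describes a differentiable, bidirectional alignment score built from contrastive learning and mutual-information maximization. As a minimal illustrative sketch (not the paper's actual implementation), the bidirectional, mutually constraining idea can be captured with a symmetric InfoNCE objective over instruction and response embeddings; the function and variable names below are hypothetical, and a real pipeline would use learned encoder outputs rather than raw arrays.

```python
import numpy as np

def bidirectional_alignment_loss(instr_emb, resp_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of instruction/response embeddings.

    Matched pairs share a row index; a lower loss means tighter
    instruction-response alignment in BOTH directions. This is a sketch of
    the general technique, not MAIN's published objective.
    """
    # L2-normalize so the dot products below are cosine similarities.
    instr = instr_emb / np.linalg.norm(instr_emb, axis=1, keepdims=True)
    resp = resp_emb / np.linalg.norm(resp_emb, axis=1, keepdims=True)
    logits = instr @ resp.T / temperature  # pairwise similarity matrix

    def xent(lg):
        # Cross-entropy with the matched pair (the diagonal) as the target.
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Mutual constraint: instruction -> response AND response -> instruction.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Because InfoNCE lower-bounds mutual information between the two views, minimizing this loss during fine-tuning pushes instructions and responses toward mutual predictability, which is one plausible reading of the "mutually constraining" formulation above.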
📝 Abstract
Instruction tuning has enabled large language models (LLMs) to achieve remarkable performance, but its success heavily depends on the availability of large-scale, high-quality instruction-response pairs. However, current methods for scaling up data generation often overlook a crucial aspect: the alignment between instructions and responses. We hypothesize that high-quality instruction-response pairs are defined not by the individual quality of each component, but by the extent of their alignment with each other. To address this, we propose a Mutual Alignment Framework (MAIN) that ensures coherence between the instruction and response through mutual constraints. Experiments demonstrate that models such as LLaMA and Mistral, fine-tuned within this framework, outperform counterparts tuned with traditional methods across multiple benchmarks. This approach underscores the critical role of instruction-response alignment in enabling scalable and high-quality instruction tuning for LLMs.