🤖 AI Summary
This work addresses the lack of theoretical grounding and the overreliance on heuristic design in supervised fine-tuning (SFT) and alignment of large language models (LLMs). We propose an end-to-end alignment framework grounded in constrained optimization, jointly modeling task performance and application-specific requirements (such as safety, factual consistency, and stylistic constraints) as a single optimization problem with hard and soft constraints. To our knowledge, this is the first work to integrate the Lagrange multiplier method and the logarithmic barrier method into LLM alignment, enabling joint SFT and alignment that satisfies constraints by construction rather than by heuristic tuning. Our approach employs gradient-adaptive Lagrange multiplier updates and constraint relaxation mechanisms, ensuring high-fidelity constraint satisfaction while preserving model performance across diverse tasks. Experiments demonstrate substantial improvements in alignment controllability and cross-task generalization.
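To make the Lagrangian machinery concrete, here is a minimal toy sketch of gradient-based primal-dual updates on a constrained problem. All names and the specific problem are illustrative assumptions of ours, not the paper's training code: a quadratic stands in for the task loss and a scalar inequality stands in for an alignment constraint.

```python
# Toy sketch: minimize a stand-in "task loss" f(x) = (x - 3)^2
# subject to a stand-in "alignment constraint" x <= 2, using
# gradient descent on the primal variable and projected gradient
# ascent on the Lagrange multiplier. Illustrative only.

def lagrangian_descent_ascent(lr_primal=0.05, lr_dual=0.05, steps=2000):
    x, lam = 0.0, 0.0  # primal variable and Lagrange multiplier
    for _ in range(steps):
        # Descent step on the Lagrangian L(x, lam) = f(x) + lam * (x - 2)
        x -= lr_primal * (2.0 * (x - 3.0) + lam)
        # Ascent step on the multiplier, projected to stay nonnegative
        lam = max(0.0, lam + lr_dual * (x - 2.0))
    return x, lam

x, lam = lagrangian_descent_ascent()
# The iterates settle at the constrained optimum x = 2 with lam = 2
# (stationarity: f'(2) + lam = -2 + lam = 0).
print(x, lam)
```

The unconstrained minimum (x = 3) violates the constraint, so the multiplier grows until it exactly offsets the task-loss gradient at the boundary; in the paper's setting the same mechanism trades off the SFT objective against constraint losses.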
📝 Abstract
Supervised fine-tuning (SFT) and alignment of large language models (LLMs) are key steps in providing a good user experience. However, what constitutes an appropriate alignment is inherently application-dependent, and current methods often rely on heuristic choices to drive optimization. In this work, we formulate SFT and alignment as a constrained optimization problem: the LLM is fine-tuned on a task while being required to meet application-specific requirements, without resorting to heuristics. To solve this, we propose Lagrange Large Language Models (L3Ms), which employ logarithmic barriers to enforce the constraints. This approach allows L3Ms to be customized across diverse applications while avoiding heuristic-driven processes. We experimentally demonstrate the versatility and efficacy of L3Ms in achieving tailored alignments for various applications.
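The logarithmic barriers mentioned above can be sketched on the same kind of toy problem. Again, the problem and all names are our illustrative assumptions, not the paper's implementation: the barrier term keeps iterates strictly feasible, and tightening the barrier coefficient drives the solution toward the constraint boundary.

```python
import math

# Toy sketch of the log-barrier method: minimize f(x) = (x - 3)^2
# subject to x <= 2, by minimizing f(x) - (1/t) * log(2 - x) for
# increasing barrier sharpness t. Illustrative only.

def solve(t, steps=100):
    x = 0.0  # strictly feasible starting point (x < 2)
    for _ in range(steps):
        g = 2.0 * (x - 3.0) + (1.0 / t) / (2.0 - x)   # gradient
        h = 2.0 + (1.0 / t) / (2.0 - x) ** 2          # curvature
        step = g / h                                  # Newton step
        # Backtrack so the iterate stays strictly inside x < 2,
        # where the barrier (and its log) remain defined.
        while x - step >= 2.0:
            step *= 0.5
        x -= step
    return x

# As t grows, the barrier sharpens and the minimizer approaches the
# constrained optimum x = 2 from the feasible side.
for t in (1.0, 10.0, 100.0):
    print(t, solve(t))
```

The barrier's gradient blows up near the boundary, which is what enforces the constraint throughout optimization; the paper applies the same idea to constraint losses during LLM fine-tuning rather than to a scalar toy.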