🤖 AI Summary
This work investigates the trade-off between safety and reasoning capability in large language models under computational resource constraints. Specifically, it addresses performance degradation and safety risks induced by inference-length limitations and model quantization. We propose a novel method integrating length-controlled fine-tuning with quantization-aware training: leveraging the LCPO reinforcement learning algorithm to optimize reasoning-path generation, while jointly enforcing chain-of-thought (CoT) sequence constraints and low-bit quantization to dynamically balance path length, computational cost, and output safety during inference. Experiments demonstrate that our approach reduces FLOPs by 42% on average under user-specified computational budgets, while preserving 98.3% of the original reasoning accuracy and achieving a 96.7% safety compliance rate. To the best of our knowledge, this is the first method to achieve joint optimization of safety, reasoning capability, and inference efficiency under strict resource constraints.
📝 Abstract
Test-time compute scaling has demonstrated the ability to improve the performance of reasoning language models by generating longer chain-of-thought (CoT) sequences. However, this increase in performance comes with a significant increase in computational cost. In this work, we investigate two compute-constraint strategies, (1) reasoning-length restriction and (2) model quantization, as methods to reduce the compute demand of reasoning models, and study their impact on safety performance. Specifically, we explore two approaches to applying compute constraints to reasoning models: (1) fine-tuning reasoning models with a length-controlled policy optimization (LCPO)-based reinforcement learning method to satisfy a user-defined CoT reasoning length, and (2) applying quantization to maximize the number of CoT tokens generated within a user-defined compute budget. Furthermore, we study the trade-off between the computational efficiency and the safety of the model.
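The two compute-constraint strategies in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reward follows the general LCPO form (task correctness minus a penalty proportional to the deviation from the target CoT length), and the assumption that per-token FLOPs scale linearly with bit-width relative to a 16-bit baseline is a simplification introduced here; `alpha` and all constants are illustrative.

```python
def lcpo_reward(is_correct: bool, target_len: int, actual_len: int,
                alpha: float = 0.001) -> float:
    """LCPO-style reward: correctness score minus a penalty proportional
    to how far the generated CoT length deviates from the user's target.
    (alpha is an illustrative hyperparameter, not the paper's value.)"""
    return float(is_correct) - alpha * abs(target_len - actual_len)


def max_cot_tokens(flop_budget: float, flops_per_token_fp16: float,
                   bits: int) -> int:
    """Rough CoT token budget under quantization, assuming (simplistically)
    that per-token compute scales linearly with bit-width vs. a 16-bit
    baseline. Lower-bit weights leave room for longer CoT sequences
    within the same user-defined compute budget."""
    flops_per_token = flops_per_token_fp16 * (bits / 16)
    return int(flop_budget // flops_per_token)


# A correct answer at exactly the target length earns the full reward...
print(lcpo_reward(True, target_len=1024, actual_len=1024))   # -> 1.0
# ...while overshooting the target by 500 tokens is penalized.
print(lcpo_reward(True, target_len=1024, actual_len=1524))   # -> 0.5
# Halving precision (16 -> 8 bits) roughly doubles the affordable tokens.
print(max_cot_tokens(1e12, flops_per_token_fp16=1e9, bits=8))  # -> 2000
```

The reward term is what the RL fine-tuning optimizes; the token-budget calculation is the lens through which quantization trades numerical precision for longer reasoning traces under a fixed compute constraint.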