Controllable Reasoning Models Are Private Thinkers

πŸ“… 2026-02-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of controlling the reasoning processes of large language models when they handle sensitive data, a vulnerability that can lead to privacy leakage. We propose extending instruction-following capabilities to the level of reasoning trajectories by decoupling the reasoning process from answer generation and introducing dedicated LoRA adapters for controllable inference. To support this framework, we construct a new instruction-tuning dataset incorporating explicit privacy constraints and evaluate our method across six mainstream models. Experimental results demonstrate substantial improvements: gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy-protection benchmarks, markedly strengthening privacy safety, though in some cases at a cost to task utility.
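The instruction-tuning data described above pairs a task with an explicit restriction on the reasoning trace. As a rough illustration, a single training instance might look like the following sketch; the field names and values are assumptions made here, not the schema of the released dataset.

```python
# Hypothetical training instance with an explicit privacy constraint on the
# reasoning trace. All field names and values are illustrative assumptions,
# not the schema of the dataset released with the paper.
example = {
    "instruction": "Draft a reply to this email on my behalf.",
    "context": "Email from Dr. Smith asking about patient Jane Doe's results...",
    # The constraint targets the reasoning trace only; the final answer may
    # still use the sender's name, i.e. trace and answer can follow
    # different constraints.
    "reasoning_constraint": "Do not mention any person's name in your reasoning.",
    "reasoning": (
        "<think>The sender asks about test results. I should confirm receipt "
        "and propose a follow-up call, without naming anyone.</think>"
    ),
    "answer": "Dear Dr. Smith, thank you for the update. I can meet on...",
}
```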

πŸ“ Abstract
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in reasoning traces, potentially under different constraints. We hypothesize that improving their instruction-following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following benchmarks and two privacy benchmarks. Our method yields substantial improvements, achieving gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy benchmarks. These improvements, however, can come at the cost of task utility, due to the trade-off between reasoning performance and instruction-following abilities. Overall, our results show that improving instruction-following behavior in reasoning models can significantly enhance privacy, suggesting a promising direction for the development of future privacy-aware agents. Our code and data are available at https://github.com/UKPLab/arxiv2026-controllable-reasoning-models
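To make the decoupled generation strategy concrete, here is a minimal sketch using Hugging Face PEFT, assuming two already-trained LoRA adapters and a model whose reasoning traces end with a `</think>` token; the checkpoint name and adapter paths are placeholders, not the paper's released artifacts.

```python
# Sketch of two-phase decoding: one LoRA adapter generates the reasoning
# trace, a second adapter generates the final answer on the same frozen base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-1.7B"  # placeholder; the paper spans 1.7B-14B models
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Attach both adapters to the same frozen base model.
model = PeftModel.from_pretrained(base, "path/to/reasoning-adapter",
                                  adapter_name="reasoning")
model.load_adapter("path/to/answer-adapter", adapter_name="answer")

prompt = ("Summarize this medical record for the billing team, "
          "but do not restate the patient's name in your reasoning.\n...")
inputs = tok(prompt, return_tensors="pt")

# Phase 1: decode the reasoning trace under the privacy-constrained adapter,
# stopping at the end-of-thought delimiter.
model.set_adapter("reasoning")
trace_ids = model.generate(
    **inputs, max_new_tokens=512,
    eos_token_id=tok.convert_tokens_to_ids("</think>"),
)

# Phase 2: swap adapters and continue decoding the user-facing answer from
# the full context (prompt + reasoning trace).
model.set_adapter("answer")
answer_ids = model.generate(trace_ids, max_new_tokens=256)
print(tok.decode(answer_ids[0], skip_special_tokens=True))
```

Because only the low-rank adapter weights differ between the two phases, swapping adapters on a shared frozen base keeps the memory overhead of this two-stage scheme small.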
Problem

Research questions and friction points this paper is trying to address.

reasoning models
privacy leakage
instruction following
private information
controllable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable reasoning
privacy-preserving AI
instruction-following
reasoning traces
LoRA adapters
πŸ”Ž Similar Papers
No similar papers found.
Haritz Puerto
Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt and National Research Center for Applied Cybersecurity ATHENE, Germany
Haonan Li
Mohamed bin Zayed University of Artificial Intelligence, UAE; LibrAI
Xudong Han
LibrAI & MBZUAI
NLP
Timothy Baldwin
MBZUAI and The University of Melbourne
computational linguistics, natural language processing, artificial intelligence
Iryna Gurevych
Full Professor, TU Darmstadt; Adjunct Professor, MBZUAI, UAE; Affiliated Professor, INSAIT, Bulgaria
Natural Language Processing, Large Language Models, Artificial Intelligence