Controllable Reasoning Models Are Private Thinkers

πŸ“… 2026-02-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of controlling the reasoning processes of large language models when they handle sensitive data, a vulnerability that can lead to privacy leakage. We propose extending instruction-following capabilities to the level of reasoning trajectories by decoupling the reasoning process from answer generation and introducing dedicated LoRA adapters for controllable inference. To support this framework, we construct a new instruction-tuning dataset incorporating explicit privacy constraints and evaluate our method across six mainstream models. Experimental results demonstrate substantial improvements: gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy-protection benchmarks, markedly strengthening privacy safety, though in some cases at a cost to task utility.
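The instruction-tuning data described above pairs a task with an explicit restriction on the reasoning trace. As a rough illustration, a single training instance might look like the following sketch; the field names and values are assumptions made here, not the schema of the released dataset.

```python
# Hypothetical training instance with an explicit privacy constraint on the
# reasoning trace. All field names and values are illustrative assumptions,
# not the schema of the dataset released with the paper.
example = {
    "instruction": "Draft a reply to this email on my behalf.",
    "context": "Email from Dr. Smith asking about patient Jane Doe's results...",
    # The constraint targets the reasoning trace only; the final answer may
    # still use the sender's name, i.e. trace and answer can follow
    # different constraints.
    "reasoning_constraint": "Do not mention any person's name in your reasoning.",
    "reasoning": (
        "<think>The sender asks about test results. I should confirm receipt "
        "and propose a follow-up call, without naming anyone.</think>"
    ),
    "answer": "Dear Dr. Smith, thank you for the update. I can meet on...",
}
```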

πŸ“ Abstract
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in reasoning traces, potentially under different constraints. We hypothesize that improving their instruction-following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following benchmarks and two privacy benchmarks. Our method yields substantial improvements, achieving gains of up to 20.9 points in instruction-following performance and up to 51.9 percentage points on privacy benchmarks. These improvements, however, can come at the cost of task utility, due to the trade-off between reasoning performance and instruction-following abilities. Overall, our results show that improving instruction-following behavior in reasoning models can significantly enhance privacy, suggesting a promising direction for the development of future privacy-aware agents. Our code and data are available at https://github.com/UKPLab/arxiv2026-controllable-reasoning-models
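To make the decoupled generation strategy concrete, here is a minimal sketch using Hugging Face PEFT, assuming two already-trained LoRA adapters and a model whose reasoning traces end with a `</think>` token; the checkpoint name and adapter paths are placeholders, not the paper's released artifacts.

```python
# Sketch of two-phase decoding: one LoRA adapter generates the reasoning
# trace, a second adapter generates the final answer on the same frozen base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen3-1.7B"  # placeholder; the paper spans 1.7B-14B models
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Attach both adapters to the same frozen base model.
model = PeftModel.from_pretrained(base, "path/to/reasoning-adapter",
                                  adapter_name="reasoning")
model.load_adapter("path/to/answer-adapter", adapter_name="answer")

prompt = ("Summarize this medical record for the billing team, "
          "but do not restate the patient's name in your reasoning.\n...")
inputs = tok(prompt, return_tensors="pt")

# Phase 1: decode the reasoning trace under the privacy-constrained adapter,
# stopping at the end-of-thought delimiter.
model.set_adapter("reasoning")
trace_ids = model.generate(
    **inputs, max_new_tokens=512,
    eos_token_id=tok.convert_tokens_to_ids("</think>"),
)

# Phase 2: swap adapters and continue decoding the user-facing answer from
# the full context (prompt + reasoning trace).
model.set_adapter("answer")
answer_ids = model.generate(trace_ids, max_new_tokens=256)
print(tok.decode(answer_ids[0], skip_special_tokens=True))
```

Because only the low-rank adapter weights differ between the two phases, swapping adapters on a shared frozen base keeps the memory overhead of this two-stage scheme small.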
Problem

Research questions and friction points this paper is trying to address.

reasoning models
privacy leakage
instruction following
private information
controllable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable reasoning
privacy-preserving AI
instruction-following
reasoning traces
LoRA adapters
πŸ”Ž Similar Papers
No similar papers found.
Haritz Puerto
Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt and National Research Center for Applied Cybersecurity ATHENE, Germany
Haonan Li
Mohamed bin Zayed University of Artificial Intelligence, UAE; LibrAI
Xudong Han
LibrAI & MBZUAI
NLP
Timothy Baldwin
MBZUAI and The University of Melbourne
computational linguistics, natural language processing, artificial intelligence
Iryna Gurevych
Full Professor, TU Darmstadt; Adjunct Professor, MBZUAI, UAE; Affiliated Professor, INSAIT, Bulgaria
Natural Language Processing, Large Language Models, Artificial Intelligence