CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-tuning large language models (LLMs) on sensitive domains such as cyber threat intelligence (CTI) risks exposing proprietary data to extraction attacks. Method: The authors propose a lightweight "privacy alignment" framework, CTIGuardian, that mitigates data leakage without retraining. Few-shot supervision coordinates a privacy classifier and a privacy redactor, both handled by the same underlying LLM, in an end-to-end privacy-preserving pipeline; the approach is instantiated with GPT-4o mini and Mistral-7B Instruct. Contribution/Results: Compared to an NER-based baseline (Presidio), the framework achieves a superior privacy-utility trade-off on CTI tasks: the sensitive-information reconstruction rate drops by over 62% while inference overhead increases by less than 15%, and the framework demonstrates strong generalization and domain adaptability.

📝 Abstract
Large Language Models (LLMs) are often fine-tuned to adapt their general-purpose knowledge to specific tasks and domains such as cyber threat intelligence (CTI). Fine-tuning is mostly done on proprietary datasets that may contain sensitive information, and owners expect the fine-tuned model not to inadvertently leak this information to potentially adversarial end users. Using CTI as a use case, we demonstrate that data-extraction attacks can recover sensitive information from models fine-tuned on CTI reports, underscoring the need for mitigation. Retraining the full model to eliminate this leakage is computationally expensive and impractical. We propose an alternative approach, which we call privacy alignment, inspired by safety alignment in LLMs. Just as safety alignment teaches the model to abide by safety constraints through a few examples, we enforce privacy alignment through few-shot supervision, integrating a privacy classifier and a privacy redactor, both handled by the same underlying LLM. We evaluate our system, called CTIGuardian, using GPT-4o mini and Mistral-7B Instruct models, benchmarking against Presidio, a named entity recognition (NER) baseline. Results show that CTIGuardian provides a better privacy-utility trade-off than NER-based models. While we demonstrate its effectiveness on a CTI use case, the framework is generic enough to apply to other sensitive domains.
Problem

Research questions and friction points this paper is trying to address.

Mitigates privacy leakage in fine-tuned LLMs
Protects sensitive data in cyber threat intelligence
Reduces computational cost of privacy protection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot privacy alignment for fine-tuned LLMs
Integrates privacy classifier and redactor using same LLM
Provides better privacy-utility trade-off than NER baselines
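The classify-then-redact loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the few-shot exemplars, entity labels, prompts, and the `call_llm` stub (which stands in for GPT-4o mini or Mistral-7B Instruct with a regex heuristic so the sketch runs end to end) are all assumptions.

```python
import re

# Few-shot exemplars supervising the privacy classifier (illustrative).
CLASSIFIER_SHOTS = [
    ("The actor pivoted through 10.0.0.5 owned by AcmeCorp.", "SENSITIVE"),
    ("Phishing remains a common initial access vector.", "NOT_SENSITIVE"),
]

def call_llm(prompt: str) -> str:
    """Stub for the shared underlying LLM. A regex heuristic stands in
    for the model so this sketch is runnable without any API access."""
    text = prompt.rsplit("Text:", 1)[-1]
    if prompt.startswith("Redact"):
        # Redactor role: mask IP addresses and a known org name.
        text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "[IP]", text)
        return re.sub(r"\bAcmeCorp\b", "[ORG]", text).strip()
    # Classifier role: flag text containing an IP or the org name.
    hit = re.search(r"\b\d{1,3}(?:\.\d{1,3}){3}\b|\bAcmeCorp\b", text)
    return "SENSITIVE" if hit else "NOT_SENSITIVE"

def privacy_guard(candidate: str) -> str:
    """Classify a draft model answer; invoke the redactor only if flagged.
    Both roles are served by the same LLM via different prompts."""
    shots = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in CLASSIFIER_SHOTS)
    label = call_llm(
        f"Classify as SENSITIVE or NOT_SENSITIVE.\n{shots}\nText: {candidate}"
    )
    if label == "SENSITIVE":
        return call_llm(
            f"Redact sensitive entities, keep the rest verbatim.\nText: {candidate}"
        )
    return candidate
```

Because benign answers skip the second LLM call entirely, most responses incur only one extra classification pass, which is consistent with the low reported inference overhead.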