🤖 AI Summary
Deploying large language models (LLMs) for cybersecurity question-answering on resource-constrained edge devices faces three intertwined challenges: high computational overhead, accuracy degradation under quantization, and weakened adversarial robustness.
Method: We propose AQUA-LLM, the first framework to systematically investigate the synergistic effects of quantization and task-specific fine-tuning. We evaluate four configurations—baseline, quantization-only, fine-tuning-only, and quantization-plus-fine-tuning—on cybersecurity QA tasks, jointly measuring accuracy, inference efficiency, and adversarial robustness.
Results: Quantization alone improves efficiency but substantially harms both accuracy and robustness; in contrast, lightweight fine-tuning combined with quantization not only recovers but often exceeds the original model’s accuracy while significantly enhancing resistance to adversarial attacks. This synergy achieves an optimal trade-off among all three objectives. Our work establishes a reproducible optimization paradigm and empirical benchmark for deploying secure, efficient LLMs on edge devices.
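To make the accuracy cost of quantization concrete, the sketch below implements symmetric per-tensor int8 post-training quantization and measures the reconstruction error it introduces in a weight matrix. This is a generic illustration of why quantization degrades accuracy, not the specific quantization scheme evaluated in AQUA-LLM; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Per-element rounding error is bounded by scale / 2; this lost precision is
# what fine-tuning after quantization can partially compensate for.
err = np.max(np.abs(w - w_hat))
```

Each weight is stored in 1 byte instead of 4, but every value is perturbed by up to half a quantization step; accumulated across billions of parameters, those perturbations are the source of the accuracy and robustness degradation the summary describes.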
📝 Abstract
Large Language Models (LLMs) have recently demonstrated strong potential for cybersecurity question answering (QA), supporting decision-making in real-time threat detection and response workflows. However, their substantial computational demands pose significant challenges for deployment on resource-constrained edge devices. Quantization, a widely adopted model compression technique, can alleviate these constraints. Nevertheless, quantization may degrade model accuracy and increase susceptibility to adversarial attacks. Fine-tuning offers a potential means to mitigate these limitations, but its effectiveness when combined with quantization remains insufficiently explored. Hence, it is essential to understand the trade-offs among accuracy, efficiency, and robustness. We propose AQUA-LLM, an evaluation framework designed to benchmark several state-of-the-art small LLMs under four distinct configurations: base, quantization-only, fine-tuning-only, and fine-tuning combined with quantization, specifically for cybersecurity QA. Our results demonstrate that quantization alone yields the lowest accuracy and robustness despite improving efficiency. In contrast, combining quantization with fine-tuning enhances both LLM robustness and predictive performance, achieving an optimal balance of accuracy, robustness, and efficiency. These findings highlight the critical need for quantization-aware, robustness-preserving fine-tuning methodologies to enable the robust and efficient deployment of LLMs for cybersecurity QA.
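The "lightweight fine-tuning" paired with quantization can be illustrated with a low-rank adapter in the style of LoRA, a common parameter-efficient method; the abstract does not specify which fine-tuning technique AQUA-LLM uses, so this is an assumed example. The frozen base weight stands in for a quantized layer, and only the two small adapter matrices would be trained.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2  # hidden size and adapter rank; r << d keeps the update lightweight

# Frozen base weight (in practice, the quantized layer that is never updated).
W = rng.normal(size=(d, d)).astype(np.float32)

# Trainable low-rank adapter: B is zero-initialized so the adapted layer
# starts out exactly equal to the base layer.
A = rng.normal(size=(r, d)).astype(np.float32) * 0.01  # down-projection
B = np.zeros((d, r), dtype=np.float32)                 # up-projection

def forward(x):
    """Base path plus low-rank correction W + B @ A applied to input x."""
    return x @ W.T + x @ (B @ A).T
```

The adapter adds only 2·d·r trainable parameters per layer instead of d², which is why such fine-tuning remains feasible on top of a compressed model and can recover the accuracy lost to quantization.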