🤖 AI Summary
Small large language models (sLLMs) exhibit poor topic adherence and unreliable refusal behavior in task-oriented dialogue when exposed to off-topic or adversarial inputs. To address this, we propose Entropy-scaled Steering vectors for Topic Maintenance (EnSToM), an inference-time intervention that adaptively modulates the strength of hidden-layer activation steering based on input uncertainty, overcoming the inflexibility of static steering for topic control. EnSToM requires no fine-tuning and achieves robust topic consistency even in few-shot settings: it improves refusal accuracy on off-topic inputs by 18.7% and attains a 92.4% F1 score on in-topic responses. Our core contribution is the tight coupling of entropy estimation with activation steering, establishing a lightweight, controllable, and robust paradigm for topic maintenance.
📝 Abstract
Small large language models (sLLMs) offer the advantage of being lightweight and efficient, which makes them suitable for resource-constrained environments. However, sLLMs often struggle to maintain topic consistency in task-oriented dialogue systems, which is critical for scenarios such as service chatbots. Specifically, it is important that the model refuses off-topic or malicious inputs and adheres to its intended functionality, so as to prevent potential misuse and uphold reliability. To this end, activation engineering approaches have been proposed that manipulate internal activations during inference. While these methods are effective in certain scenarios, our preliminary experiments reveal their limitations in ensuring topic adherence. To address this, we propose a novel approach termed Entropy-scaled Steering vectors for Topic Maintenance (EnSToM). EnSToM dynamically adjusts the steering intensity based on input uncertainty, which allows the model to handle off-topic distractors effectively while preserving on-topic accuracy. Our experiments demonstrate that EnSToM achieves significant performance gains with a relatively small data size compared to fine-tuning approaches. By improving topic adherence without compromising efficiency, our approach provides a robust solution for enhancing sLLM-based dialogue systems.
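The core idea of entropy-scaled steering can be illustrated with a minimal NumPy sketch. Note that the specific scaling function (linear in normalized entropy), the `alpha_max` coefficient, and all function names below are illustrative assumptions, not the paper's exact formulation: the abstract only specifies that steering intensity grows with input uncertainty.

```python
import numpy as np

def next_token_entropy(logits):
    # Shannon entropy (in nats) of the model's next-token distribution,
    # used here as a proxy for input uncertainty.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def entropy_scaled_steer(hidden, steering_vec, logits, alpha_max=4.0):
    # Scale the steering vector by normalized entropy: uncertain
    # (likely off-topic) inputs receive strong steering, while
    # confident on-topic inputs are left nearly untouched.
    # alpha_max is a hypothetical maximum steering strength.
    h_max = np.log(len(logits))               # entropy of a uniform distribution
    scale = alpha_max * next_token_entropy(logits) / h_max
    return hidden + scale * steering_vec
```

For a uniform (maximally uncertain) distribution the full strength `alpha_max` is applied, whereas a sharply peaked (confident) distribution yields a near-zero scale, so the hidden state is barely perturbed. In a real sLLM this adjustment would be applied to hidden-layer activations at inference time, e.g. via forward hooks.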