🤖 AI Summary
To address the high computational overhead, the degradation of linguistic properties after compression, and the lack of behavioral verifiability when deploying large language models (LLMs) on edge devices, this paper introduces Signal Temporal Logic (STL) into training-free LLM compression for the first time. We propose a Bayesian optimization framework constrained by STL robustness specifications that jointly optimizes layer-adaptive quantization and structured pruning configurations. Crucially, our approach guarantees, without any fine-tuning, verifiable preservation of critical linguistic properties, including consistency and temporal semantics. Experiments on GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B demonstrate up to a 3.3× reduction in FLOPs and 68.8% model-size compression while strictly satisfying pre-specified STL linguistic constraints. Our method significantly outperforms existing compression techniques in both efficiency and formal behavioral assurance.
📝 Abstract
Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose TOGGLE (Temporal Logic-Guided Large Language Model Compression), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating compressed models that formally satisfy specified linguistic constraints without re-training or fine-tuning. Evaluating TOGGLE on four LLM architectures (GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B), we achieve up to 3.3× reduction in computational costs (FLOPs) and up to a 68.8% reduction in model size while satisfying all linguistic properties. TOGGLE represents the first integration of formal methods into LLM compression, enabling efficient, verifiable deployment of LLMs on edge hardware.
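To make the core idea concrete, here is a minimal, self-contained sketch of an STL-robustness-constrained compression search. All names and numbers below are illustrative assumptions, not from the paper: the perplexity simulator is a toy stand-in for evaluating a compressed model, and plain random search stands in for TOGGLE's Bayesian optimization. The key mechanic it does show is quantitative STL semantics: the robustness of a "globally" (G) property over a discrete signal is the minimum margin by which the predicate holds, and only configurations with non-negative robustness are accepted.

```python
import random

def stl_globally_robustness(signal, threshold):
    """Robustness of G(x_t <= threshold): min over t of (threshold - x_t).
    Non-negative means the property holds at every step, with that margin."""
    return min(threshold - x for x in signal)

def simulate_perplexity(bits, prune_ratio):
    """Toy stand-in for evaluating a compressed model: lower bit-widths and
    heavier pruning inflate a synthetic per-token perplexity signal."""
    base = 12.0
    penalty = (8 - bits) * 1.5 + prune_ratio * 10.0
    return [base + penalty + 0.1 * t for t in range(5)]

def compression_cost(bits, prune_ratio):
    """Cost proxy: fewer bits and more pruning mean a cheaper model."""
    return (bits / 16.0) * (1.0 - prune_ratio)

def search(threshold=20.0, trials=200, seed=0):
    """Random search as a stand-in for Bayesian optimization: keep the
    cheapest configuration whose STL robustness is non-negative."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        bits = rng.choice([4, 6, 8])          # per-layer quantization width
        prune = rng.uniform(0.0, 0.6)         # structured pruning ratio
        rob = stl_globally_robustness(
            simulate_perplexity(bits, prune), threshold)
        if rob >= 0:                          # STL constraint satisfied
            cost = compression_cost(bits, prune)
            if best is None or cost < best[0]:
                best = (cost, bits, prune, rob)
    return best

best = search()
print(best)  # (cost, bits, prune_ratio, robustness) of the best feasible config
```

In a real pipeline, `simulate_perplexity` would be replaced by running the candidate compressed model on a validation stream, and the accept/reject rule above would guide the surrogate model of a Bayesian optimizer rather than a random sampler; the returned robustness value doubles as a certificate margin for the specified property.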