🤖 AI Summary
To address the high computational overhead, the degradation of linguistic properties after compression, and the lack of behavioral verifiability when deploying large language models (LLMs) on edge devices, this paper introduces Signal Temporal Logic (STL) into training-free LLM compression for the first time. We propose a Bayesian optimization framework constrained by STL robustness specifications that jointly optimizes layer-adaptive quantization and structured pruning configurations. Crucially, our approach guarantees, without any fine-tuning, verifiable preservation of critical linguistic properties, including consistency and temporal semantics. Experiments on GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B demonstrate up to a 3.3× reduction in FLOPs and 68.8% model-size compression while strictly satisfying pre-specified STL linguistic constraints. Our method significantly outperforms existing compression techniques in both efficiency and formal behavioral assurance.
📝 Abstract
Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose TOGGLE (Temporal Logic-Guided Large Language Model Compression), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating compressed models that formally satisfy specified linguistic constraints without re-training or fine-tuning. Evaluating TOGGLE on four LLM architectures (GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B), we achieve up to 3.3× reduction in computational costs (FLOPs) and up to a 68.8% reduction in model size while satisfying all linguistic properties. TOGGLE represents the first integration of formal methods into LLM compression, enabling efficient, verifiable deployment of LLMs on edge hardware.
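To make the core idea concrete, here is a minimal, self-contained sketch of an STL-robustness-constrained compression search. All names and numbers below are illustrative assumptions, not from the paper: the perplexity simulator is a toy stand-in for evaluating a compressed model, and plain random search stands in for TOGGLE's Bayesian optimization. The key mechanic it does show is quantitative STL semantics: the robustness of a "globally" (G) property over a discrete signal is the minimum margin by which the predicate holds, and only configurations with non-negative robustness are accepted.

```python
import random

def stl_globally_robustness(signal, threshold):
    """Robustness of G(x_t <= threshold): min over t of (threshold - x_t).
    Non-negative means the property holds at every step, with that margin."""
    return min(threshold - x for x in signal)

def simulate_perplexity(bits, prune_ratio):
    """Toy stand-in for evaluating a compressed model: lower bit-widths and
    heavier pruning inflate a synthetic per-token perplexity signal."""
    base = 12.0
    penalty = (8 - bits) * 1.5 + prune_ratio * 10.0
    return [base + penalty + 0.1 * t for t in range(5)]

def compression_cost(bits, prune_ratio):
    """Cost proxy: fewer bits and more pruning mean a cheaper model."""
    return (bits / 16.0) * (1.0 - prune_ratio)

def search(threshold=20.0, trials=200, seed=0):
    """Random search as a stand-in for Bayesian optimization: keep the
    cheapest configuration whose STL robustness is non-negative."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        bits = rng.choice([4, 6, 8])          # per-layer quantization width
        prune = rng.uniform(0.0, 0.6)         # structured pruning ratio
        rob = stl_globally_robustness(
            simulate_perplexity(bits, prune), threshold)
        if rob >= 0:                          # STL constraint satisfied
            cost = compression_cost(bits, prune)
            if best is None or cost < best[0]:
                best = (cost, bits, prune, rob)
    return best

best = search()
print(best)  # (cost, bits, prune_ratio, robustness) of the best feasible config
```

In a real pipeline, `simulate_perplexity` would be replaced by running the candidate compressed model on a validation stream, and the accept/reject rule above would guide the surrogate model of a Bayesian optimizer rather than a random sampler; the returned robustness value doubles as a certificate margin for the specified property.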