Efficient Compositional Multi-tasking for On-device Large Language Models

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to efficiently support concurrent text-based multitask inference on resource-constrained edge devices. Method: This work pioneers systematic on-device compositional multitask learning, proposing a lightweight learnable calibration framework that enables joint modeling and parameter sharing across tasks (e.g., translation, summarization) via adapter parameter fusion and task-cooperative inference, avoiding full-model fine-tuning. Contribution/Results: The approach significantly reduces GPU memory usage (by 42%) and inference latency (by 36%) while incurring only a marginal average performance drop of 1.2% compared to single-task baselines. The authors introduce a benchmark comprising four practical compositional multitask configurations. Experimental results demonstrate a strong efficiency–accuracy trade-off and robust generalization across diverse task combinations, validating the framework's viability for real-world edge deployment.

📝 Abstract
Adapter parameters provide a mechanism to modify the behavior of machine learning models and have gained significant popularity in the context of large language models (LLMs) and generative AI. These parameters can be merged to support multiple tasks via a process known as task merging. However, prior work on merging in LLMs, particularly in natural language processing, has been limited to scenarios where each test example addresses only a single task. In this paper, we focus on on-device settings and study the problem of text-based compositional multi-tasking, where each test example involves the simultaneous execution of multiple tasks. For instance, generating a translated summary of a long text requires solving both translation and summarization tasks concurrently. To facilitate research in this setting, we propose a benchmark comprising four practically relevant compositional tasks. We also present an efficient method (Learnable Calibration) tailored for on-device applications, where computational resources are limited, emphasizing the need for solutions that are both resource-efficient and high-performing. Our contributions lay the groundwork for advancing the capabilities of LLMs in real-world multi-tasking scenarios, expanding their applicability to complex, resource-constrained use cases.
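The abstract's core mechanism is task merging: combining per-task adapter parameters (such as LoRA deltas) into a single set that supports multiple tasks at once. A minimal sketch of one common merging strategy, weighted averaging of adapter weight deltas, is shown below. The function and parameter names (`merge_adapters`, `weights`) and the toy per-layer vectors are illustrative assumptions, not the paper's actual Learnable Calibration method, which additionally learns calibration parameters on top of the merged adapter.

```python
# Hedged sketch: merge per-task adapter parameter deltas into one
# multi-task adapter by weighted averaging. This illustrates plain
# task merging; the paper's Learnable Calibration adds a lightweight
# learned correction on top of a merge like this.

def merge_adapters(adapters, weights):
    """Element-wise weighted sum of per-task adapter parameter dicts.

    adapters: list of dicts mapping parameter name -> flat list of floats
    weights:  one scalar mixing coefficient per task
    """
    assert len(adapters) == len(weights)
    merged = {}
    for name in adapters[0]:
        merged[name] = [
            sum(w * a[name][i] for a, w in zip(adapters, weights))
            for i in range(len(adapters[0][name]))
        ]
    return merged

# Toy adapters for two tasks (flattened parameter vectors per layer).
translation   = {"layer0.lora_A": [1.0, 2.0], "layer0.lora_B": [0.5, 0.5]}
summarization = {"layer0.lora_A": [3.0, 0.0], "layer0.lora_B": [0.5, 1.5]}

# Equal mixing: the merged adapter serves both tasks from one weight set.
merged = merge_adapters([translation, summarization], [0.5, 0.5])
print(merged["layer0.lora_A"])  # [2.0, 1.0]
```

On device, this matters because a single merged adapter replaces several task-specific ones, so only one set of extra parameters is held in memory regardless of how many tasks a request composes.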
Problem

Research questions and friction points this paper is trying to address.

Enabling on-device LLMs to handle multiple tasks simultaneously
Developing resource-efficient methods for compositional multi-tasking
Creating benchmarks for real-world multi-tasking scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter parameters enable multi-task behavior modification
Learnable Calibration for efficient on-device multi-tasking
Benchmark for compositional tasks in resource-limited settings
Ondrej Bohdal
Samsung Research
Machine Learning · Deep Learning · Computer Vision · Natural Language Processing
Mete Ozay
Samsung R&D Institute UK, United Kingdom
Jijoong Moon
Samsung Research, South Korea
Kyeng-Hun Lee
Samsung Research, South Korea
Hyeonmok Ko
Principal Engineer, Samsung Electronics Co., Ltd.
Large Language Model · AI · Natural Language Understanding · Wireless Communications · Network Protocol
Umberto Michieli
Samsung R&D Institute UK, United Kingdom