Efficient Compositional Multi-tasking for On-device Large Language Models

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to efficiently support concurrent text-based multitask inference on resource-constrained edge devices. Method: This work pioneers systematic on-device compositional multitask learning, proposing a lightweight learnable calibration framework that enables joint modeling and parameter sharing across tasks (e.g., translation, summarization) via adapter parameter fusion and task-cooperative inference, avoiding full-model fine-tuning. Contribution/Results: The approach significantly reduces GPU memory usage (by 42%) and inference latency (by 36%) while incurring only a marginal average performance drop of 1.2% compared to single-task baselines. The authors introduce a benchmark comprising four practical compositional multitask configurations. Experimental results demonstrate a strong efficiency–accuracy trade-off and robust generalization across diverse task combinations, validating the framework's viability for real-world edge deployment.

📝 Abstract
Adapter parameters provide a mechanism to modify the behavior of machine learning models and have gained significant popularity in the context of large language models (LLMs) and generative AI. These parameters can be merged to support multiple tasks via a process known as task merging. However, prior work on merging in LLMs, particularly in natural language processing, has been limited to scenarios where each test example addresses only a single task. In this paper, we focus on on-device settings and study the problem of text-based compositional multi-tasking, where each test example involves the simultaneous execution of multiple tasks. For instance, generating a translated summary of a long text requires solving both translation and summarization tasks concurrently. To facilitate research in this setting, we propose a benchmark comprising four practically relevant compositional tasks. We also present an efficient method (Learnable Calibration) tailored for on-device applications, where computational resources are limited, emphasizing the need for solutions that are both resource-efficient and high-performing. Our contributions lay the groundwork for advancing the capabilities of LLMs in real-world multi-tasking scenarios, expanding their applicability to complex, resource-constrained use cases.
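The abstract's core mechanism is task merging: combining per-task adapter parameters (such as LoRA deltas) into a single set that supports multiple tasks at once. A minimal sketch of one common merging strategy, weighted averaging of adapter weight deltas, is shown below. The function and parameter names (`merge_adapters`, `weights`) and the toy per-layer vectors are illustrative assumptions, not the paper's actual Learnable Calibration method, which additionally learns calibration parameters on top of the merged adapter.

```python
# Hedged sketch: merge per-task adapter parameter deltas into one
# multi-task adapter by weighted averaging. This illustrates plain
# task merging; the paper's Learnable Calibration adds a lightweight
# learned correction on top of a merge like this.

def merge_adapters(adapters, weights):
    """Element-wise weighted sum of per-task adapter parameter dicts.

    adapters: list of dicts mapping parameter name -> flat list of floats
    weights:  one scalar mixing coefficient per task
    """
    assert len(adapters) == len(weights)
    merged = {}
    for name in adapters[0]:
        merged[name] = [
            sum(w * a[name][i] for a, w in zip(adapters, weights))
            for i in range(len(adapters[0][name]))
        ]
    return merged

# Toy adapters for two tasks (flattened parameter vectors per layer).
translation   = {"layer0.lora_A": [1.0, 2.0], "layer0.lora_B": [0.5, 0.5]}
summarization = {"layer0.lora_A": [3.0, 0.0], "layer0.lora_B": [0.5, 1.5]}

# Equal mixing: the merged adapter serves both tasks from one weight set.
merged = merge_adapters([translation, summarization], [0.5, 0.5])
print(merged["layer0.lora_A"])  # [2.0, 1.0]
```

On device, this matters because a single merged adapter replaces several task-specific ones, so only one set of extra parameters is held in memory regardless of how many tasks a request composes.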
Problem

Research questions and friction points this paper is trying to address.

Enabling on-device LLMs to handle multiple tasks simultaneously
Developing resource-efficient methods for compositional multi-tasking
Creating benchmarks for real-world multi-tasking scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapter parameters enable multi-task behavior modification
Learnable Calibration for efficient on-device multi-tasking
Benchmark for compositional tasks in resource-limited settings
Ondrej Bohdal
Samsung Research
Machine Learning · Deep Learning · Computer Vision · Natural Language Processing
Mete Ozay
Samsung R&D Institute UK, United Kingdom
Jijoong Moon
Samsung Research, South Korea
Kyeng-Hun Lee
Samsung Research, South Korea
Hyeonmok Ko
Principal Engineer, Samsung Electronics Co., Ltd.
Large Language Model · AI · Natural Language Understanding · Wireless Communications · Network Protocol
Umberto Michieli
Samsung R&D Institute UK, United Kingdom