EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

📅 2025-06-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Fine-tuning large language models (LLMs) incurs memory overhead several times greater than inference, hindering deployment on consumer-grade hardware. To address this, EMLoC proposes an emulator-based memory-efficient fine-tuning framework that fits within the memory budget of inference. The method constructs a downstream-task-specific lightweight emulator via activation-aware singular value decomposition (SVD) on a small calibration set, fine-tunes it with LoRA, and applies a novel compensation algorithm that corrects the fine-tuned LoRA module so it can be merged back into the original model for inference. Notably, the approach requires no quantization. On a single 24GB consumer GPU, the framework fine-tunes a 38B-parameter model and outperforms mainstream baselines across multiple datasets and modalities, while supporting flexible compression ratios and remaining fully compatible with standard training pipelines.

📝 Abstract
Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users due to the significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To tackle the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm to correct the fine-tuned LoRA module, which can thus be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
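The abstract's first step, building a lightweight emulator via activation-aware SVD on a calibration set, can be sketched roughly as below. This is a hedged illustration, not the paper's exact procedure: it assumes a common activation-aware weighting scheme in which each input channel of the weight is scaled by its calibration activation magnitude before truncated SVD, so that the retained singular directions are those that matter for the downstream task. The function name and weighting choice are illustrative.

```python
import numpy as np

def activation_aware_svd(W, X_calib, rank):
    """Build a low-rank emulator layer from weight W (d_out x d_in)
    using calibration activations X_calib (n x d_in).
    Illustrative scheme (may differ from the paper): scale W's input
    channels by per-channel activation magnitude, truncate the SVD,
    then undo the scaling so the factors approximate W itself."""
    # Per-input-channel activation scale; small epsilon avoids division by zero.
    s = np.sqrt((X_calib ** 2).mean(axis=0)) + 1e-6      # (d_in,)
    U, sigma, Vt = np.linalg.svd(W * s, full_matrices=False)
    U_r = U[:, :rank] * sigma[:rank]                     # (d_out, rank)
    V_r = Vt[:rank] / s                                  # (rank, d_in), scaling undone
    return U_r, V_r                                      # emulator layer: x -> (U_r @ V_r) @ x
```

The emulator stores only `U_r` and `V_r` per layer, which is what makes fine-tuning fit in the inference memory budget; the `rank` argument is where the abstract's "flexible compression ratios" would be exposed.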
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning large models is expensive due to memory overhead.
EMLoC enables tuning within the memory budget of inference.
A compensation algorithm corrects the LoRA module fine-tuned on the compressed emulator so it can be merged into the original model.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emulator-based memory-efficient fine-tuning framework.
Activation-aware SVD creates lightweight task-specific emulator.
Novel LoRA correction algorithm for misalignment compensation.