From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high data and computational costs that hinder domain adaptation of large language models, as well as the disconnect between data selection and parameter-efficient fine-tuning in existing approaches. The authors propose P2D (From Parameters to Data), a framework grounded in the Strong Map Hypothesis: the premise that a sparse subset of attention heads plays a dominant role in task-specific adaptation. P2D uses these task-sensitive attention heads as a dual guide for both high-affinity data selection and structural pruning. A lightweight proxy identifies the critical heads, which then act as a filter for curating training samples. A newly introduced Alignment Efficiency Ratio (AER) metric quantifies total pipeline cost, covering both selection latency and training time. By updating only 10% of attention heads on 10% of the data, P2D outperforms strong baselines by 8.3 percentage points and achieves a 7.0× end-to-end speedup.
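An end-to-end speedup differs from a training-only speedup in that it also charges the pipeline for the time spent selecting data. The sketch below illustrates that accounting under stated assumptions: the function name `end_to_end_speedup` and all timings are hypothetical, and this is not the paper's AER definition, which the summary does not spell out.

```python
def end_to_end_speedup(baseline_train_h: float, selection_h: float, train_h: float) -> float:
    """Baseline wall-clock time divided by full pipeline time (selection + training)."""
    pipeline_h = selection_h + train_h
    if pipeline_h <= 0:
        raise ValueError("pipeline time must be positive")
    return baseline_train_h / pipeline_h


# Made-up numbers for illustration only; they are NOT taken from the paper.
baseline_h = 100.0  # hypothetical: full fine-tuning on all data
selection_h = 5.0   # hypothetical: scoring heads and filtering samples
train_h = 15.0      # hypothetical: fine-tuning selected heads on selected data

# Ignoring selection cost overstates the gain; counting it gives the end-to-end figure.
training_only = end_to_end_speedup(baseline_h, 0.0, train_h)
end_to_end = end_to_end_speedup(baseline_h, selection_h, train_h)

print(f"training-only speedup: {training_only:.2f}x")
print(f"end-to-end speedup:    {end_to_end:.2f}x")
```

With these toy timings the training-only figure is larger than the end-to-end one, which is why a metric covering selection latency gives a more conservative picture of efficiency.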

📝 Abstract

Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated processes, our empirical analysis suggests they may be intrinsically coupled. We posit the Strong Map Hypothesis: a sparse subset of attention heads plays a dominant role in task-specific adaptation, acting as keys that unlock specific data patterns. Building on this observation, we propose From Parameters to Data (P2D), a unified framework that leverages these task-sensitive attention heads as a dual compass for both sample mining and structural pruning. To rigorously quantify the total pipeline cost, we introduce the Alignment Efficiency Ratio (AER), a metric that accounts for both selection latency and training time. Mechanistically, P2D identifies critical heads via a lightweight proxy and uses them as a functional filter to curate high-affinity data, establishing a synergistic pipeline. Empirically, by updating merely 10% of attention heads on 10% of the data, P2D achieves an 8.3 pp performance gain over strong baselines and delivers a 7.0× end-to-end speedup. These results validate that precise parameter-data synchronization eliminates redundancy, offering a new paradigm for efficient alignment.
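The two-stage idea in the abstract (rank heads first, then use the chosen heads to rank samples) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-head sensitivity scores, the per-sample head scores, the mean-over-heads affinity, and the helper names `select_heads` and `select_samples` are all assumptions introduced here, since the abstract does not specify the proxy or the affinity function.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def _keep_count(n: int, fraction: float) -> int:
    """Number of items to keep for a given fraction, at least one."""
    return max(1, int(n * fraction + 1e-9))


def select_heads(head_sensitivity: Dict[Head, float], fraction: float) -> List[Head]:
    """Keep the top `fraction` of heads by a (hypothetical) proxy sensitivity score."""
    ranked = sorted(head_sensitivity, key=head_sensitivity.get, reverse=True)
    return ranked[:_keep_count(len(ranked), fraction)]


def select_samples(
    sample_head_scores: Dict[str, Dict[Head, float]],
    heads: List[Head],
    fraction: float,
) -> List[str]:
    """Keep the top `fraction` of samples by mean score over the selected heads.

    The mean-over-heads affinity is an assumption made for this sketch.
    """
    affinity = {
        sample_id: sum(scores.get(h, 0.0) for h in heads) / len(heads)
        for sample_id, scores in sample_head_scores.items()
    }
    ranked = sorted(affinity, key=affinity.get, reverse=True)
    return ranked[:_keep_count(len(ranked), fraction)]


# Toy inputs: 2 layers x 5 heads and 10 samples, all values made up.
toy_heads = {(l, h): 0.01 * (l * 5 + h) for l in range(2) for h in range(5)}
toy_heads[(1, 3)] = 0.9  # pretend the proxy flags this head as task-sensitive
toy_samples = {
    f"s{i}": {(1, 3): i / 10, (0, 0): 1 - i / 10} for i in range(10)
}

chosen_heads = select_heads(toy_heads, fraction=0.1)
chosen_samples = select_samples(toy_samples, chosen_heads, fraction=0.1)
print(chosen_heads, chosen_samples)
```

In this toy setup the sample ranking depends only on the heads chosen in the first stage, which mirrors the parameter-to-data direction the abstract describes; a real pipeline would obtain both score tables from the model rather than from hand-written dictionaries.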

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Efficient Alignment

Data Selection

Parameter-Efficient Fine-Tuning

Task Adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-Data Synergy

Attention Head Pruning

Efficient LLM Alignment