Structured Agent Distillation for Large Language Model

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the high inference cost and large model size hindering practical deployment of large language model (LLM) agents, this paper proposes a Structured Agent Distillation framework. Methodologically, it explicitly segments ReAct-style reasoning–action trajectories into [REASON] and [ACT] fragments for fine-grained, segment-level supervision; introduces a trajectory-structure-aware distillation paradigm that transcends conventional token-level distillation; and incorporates cross-modal alignment loss and behavioral consistency constraints to jointly optimize reasoning fidelity and action stability. Evaluated on ALFWorld, HotPotQA-ReAct, and WebShop, the approach significantly outperforms both token-level distillation and imitation learning baselines. Under substantial model compression (e.g., 7B → 1.3B), it incurs only marginal performance degradation (average −1.2%), achieving, for the first time, joint distillation of decision logic and behavioral consistency in LLM agents.

๐Ÿ“ Abstract
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
Problem

Research questions and friction points this paper is trying to address.

Compress large LLM-based agents into smaller models
Preserve reasoning fidelity and action consistency
Align reasoning and action spans with teacher behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segment trajectories into REASON and ACT spans
Apply segment-specific losses for alignment
Enable compact agents to replicate teacher behavior
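The segmentation-then-weighted-loss idea above can be sketched in a toy form. This is a minimal illustrative sketch, not the paper's implementation: the "Thought:"/"Action:" markers, the negative log-likelihood form of the per-token loss (the paper's segment-specific losses may instead use KL divergence over full teacher distributions), and the per-span weight parameters are all assumptions made here for illustration.

```python
import math

def segment_trajectory(tokens):
    # Tag each token as belonging to a REASON or ACT span based on the
    # most recent marker seen. "Thought:" / "Action:" are illustrative
    # stand-ins for the paper's [REASON]/[ACT] span boundaries.
    tags, current = [], "REASON"
    for tok in tokens:
        if tok == "Action:":
            current = "ACT"
        elif tok == "Thought:":
            current = "REASON"
        tags.append(current)
    return tags

def segment_distill_loss(student_probs, tags, w_reason=1.0, w_act=1.0):
    # Segment-weighted distillation loss (assumed NLL form): average
    # negative log-likelihood the student assigns to the teacher's
    # tokens, with separate weights for reasoning and action spans.
    weights = {"REASON": w_reason, "ACT": w_act}
    total = sum(weights[t] * -math.log(p)
                for p, t in zip(student_probs, tags))
    norm = sum(weights[t] for t in tags)
    return total / norm

# Usage: up-weight action tokens so the student prioritizes
# reproducing the teacher's environment actions.
tokens = ["Thought:", "find", "a", "mug", "Action:", "go", "to", "desk"]
tags = segment_trajectory(tokens)
loss = segment_distill_loss([0.9] * len(tokens), tags,
                            w_reason=0.5, w_act=2.0)
```

Separating the spans this way is what distinguishes the approach from plain token-level distillation, which would apply a single uniform loss across the whole trajectory.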
👥 Authors

Jun Liu (Carnegie Mellon University)
Zhenglun Kong (Harvard University; Efficient Deep Learning, Large Language Model, AI4Science)
Peiyan Dong (MIT)
Changdi Yang (PhD candidate, Northeastern University; Snap Inc.; Efficient Deep Learning)
Tianqi Li (Carnegie Mellon University)
Hao Tang (Peking University)
Geng Yuan (University of Georgia; Efficient AI, Explainable AI, Trustworthy ML, Edge Computing, AI Applications)
Wei Niu (University of Georgia)
Wenbin Zhang (Florida International University)
Pu Zhao (Northeastern University)
Xue Lin (Northeastern University; Electrical and Computer Engineering)
Dong Huang (Carnegie Mellon University)
Yanzhi Wang (Northeastern University)