AI Summary
To address the high inference cost and large model size that hinder practical deployment of large language model (LLM) agents, this paper proposes a Structured Agent Distillation framework. Methodologically, it explicitly segments ReAct-style reasoning–action trajectories into [REASON] and [ACT] fragments for fine-grained, segment-level supervision; introduces a trajectory-structure-aware distillation paradigm that goes beyond conventional token-level distillation; and incorporates a cross-modal alignment loss and behavioral consistency constraints to jointly optimize reasoning fidelity and action stability. Evaluated on ALFWorld, HotPotQA-ReAct, and WebShop, the approach significantly outperforms both token-level distillation and imitation learning baselines. Under substantial model compression (e.g., 7B → 1.3B), it incurs only marginal performance degradation (−1.2% on average), achieving, for the first time, joint distillation of decision logic and behavioral consistency in LLM agents.
Abstract
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
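The segment-specific supervision described above can be illustrated with a minimal sketch: a token-level KL divergence between teacher and student distributions, averaged separately over [REASON] and [ACT] spans and then combined with per-segment weights. The function name, the 0/1 segment encoding, and the weights `w_reason`/`w_act` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def segment_distillation_loss(teacher_logits, student_logits, segment_ids,
                              w_reason=1.0, w_act=1.0):
    """Per-token KL(teacher || student), pooled separately over
    [REASON] tokens (segment_ids == 0) and [ACT] tokens (segment_ids == 1),
    then combined with hypothetical segment weights."""
    p = softmax(teacher_logits)                    # (T, V) teacher distribution
    log_p = np.log(p)
    log_q = np.log(softmax(student_logits))        # (T, V) student log-probs
    kl = (p * (log_p - log_q)).sum(axis=-1)        # (T,) token-level KL

    reason_mask = segment_ids == 0
    act_mask = segment_ids == 1
    loss_reason = kl[reason_mask].mean() if reason_mask.any() else 0.0
    loss_act = kl[act_mask].mean() if act_mask.any() else 0.0
    return w_reason * loss_reason + w_act * loss_act
```

Because the two span losses are kept separate before weighting, the reasoning and action portions of a trajectory can be emphasized independently, which is the core difference from a single flat token-level distillation loss.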