AI Summary
To address the high inference cost and large model size that hinder practical deployment of large language model (LLM) agents, this paper proposes a Structured Agent Distillation framework. Methodologically, it explicitly segments ReAct-style reasoning–action trajectories into [REASON] and [ACT] fragments for fine-grained, segment-level supervision; introduces a trajectory-structure-aware distillation paradigm that goes beyond conventional token-level distillation; and incorporates a cross-modal alignment loss and behavioral consistency constraints to jointly optimize reasoning fidelity and action stability. Evaluated on ALFWorld, HotPotQA-ReAct, and WebShop, the approach significantly outperforms both token-level distillation and imitation learning baselines. Under substantial model compression (e.g., 7B → 1.3B), it incurs only marginal performance degradation (−1.2% on average), achieving, for the first time, joint distillation of decision logic and behavioral consistency in LLM agents.
Abstract
Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
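The segment-specific supervision described above can be illustrated with a minimal sketch: a token-level KL divergence between teacher and student distributions, averaged separately over [REASON] and [ACT] spans and then combined with per-segment weights. The function name, the 0/1 segment encoding, and the weights `w_reason`/`w_act` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def segment_distillation_loss(teacher_logits, student_logits, segment_ids,
                              w_reason=1.0, w_act=1.0):
    """Per-token KL(teacher || student), pooled separately over
    [REASON] tokens (segment_ids == 0) and [ACT] tokens (segment_ids == 1),
    then combined with hypothetical segment weights."""
    p = softmax(teacher_logits)                    # (T, V) teacher distribution
    log_p = np.log(p)
    log_q = np.log(softmax(student_logits))        # (T, V) student log-probs
    kl = (p * (log_p - log_q)).sum(axis=-1)        # (T,) token-level KL

    reason_mask = segment_ids == 0
    act_mask = segment_ids == 1
    loss_reason = kl[reason_mask].mean() if reason_mask.any() else 0.0
    loss_act = kl[act_mask].mean() if act_mask.any() else 0.0
    return w_reason * loss_reason + w_act * loss_act
```

Because the two span losses are kept separate before weighting, the reasoning and action portions of a trajectory can be emphasized independently, which is the core difference from a single flat token-level distillation loss.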