ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing skill-distillation methods for large language model (LLM) agents: they have no per-step cost signal, so they cannot tell a step that mattered from an expensive step that never affected the outcome. The authors propose ClawTrace, a tracing platform that records every LLM call, tool use, and sub-agent spawn, and compiles each session into a YAML TraceCard annotated with per-step USD cost, token counts, and redundancy flags. Building on this, they introduce CostCraft, a distillation pipeline that produces three types of skill patches: preserve, prune, and repair. Each prune patch is backed by a counterfactual argument against a named high-cost step. Ablations on 30 held-out SpreadsheetBench tasks show that cost attribution and prune patches each independently reduce quality regressions. The study also reveals an asymmetry in transfer: on 30 unrelated SkillsBench tasks, prune rules generalized and cut median cost by 32%, whereas preserve rules, trained on benchmark-specific conventions, caused regressions on new task types.

📝 Abstract

Skill-distillation pipelines learn reusable rules from LLM agent trajectories, but they lack a key signal: how much each step costs. Without per-step cost, a pipeline cannot distinguish adding a missing step to fix a bug from removing an expensive step that never affected the outcome. We introduce ClawTrace, an agent tracing platform that records every LLM call, tool use, and sub-agent spawn during an agent session and compiles each session into a TraceCard: a compact YAML summary with per-step USD cost, token counts, and redundancy flags. Built on ClawTrace, CostCraft is a distillation pipeline that reads TraceCards and produces three types of skill patches. Preserve patches keep behaviors that led to success. Prune patches remove expensive steps that did not matter, each backed by a counterfactual argument against a named high-cost step. Repair patches fix failures grounded in oracle evidence. Ablations on 30 held-out SpreadsheetBench tasks show that both cost attribution and prune patches independently reduce quality regressions. When the same skill is applied to 30 unrelated SkillsBench tasks, an unexpected asymmetry emerges: prune rules transferred across benchmarks and cut median cost by 32%, while preserve rules, trained on benchmark-specific conventions, caused regressions on new task types. We release ClawTrace and TraceCards as open infrastructure for cost-aware agent research.
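
To make the TraceCard idea concrete, here is a minimal Python sketch of a per-session record with per-step USD cost, token counts, and a redundancy flag, serialized to a YAML-like summary. Every field name, the `prune_candidates` helper, the `min_share` threshold, and the demo numbers are assumptions made for illustration; none of them is the published ClawTrace schema or CostCraft logic. In particular, the cost-share filter is only a stand-in for the counterfactual argument the abstract describes for prune patches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One traced step. All field names here are hypothetical."""
    name: str
    kind: str              # "llm_call", "tool_use", or "subagent_spawn"
    input_tokens: int
    output_tokens: int
    cost_usd: float
    redundant: bool = False  # redundancy flag


@dataclass
class TraceCard:
    """Compact per-session summary (illustrative, not the real schema)."""
    session_id: str
    succeeded: bool
    steps: List[Step] = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.steps)

    def cost_share(self, step: Step) -> float:
        total = self.total_cost()
        return step.cost_usd / total if total else 0.0

    def to_yaml(self) -> str:
        # Hand-rolled serializer so the sketch needs no third-party YAML library.
        lines = [
            f"session_id: {self.session_id}",
            f"succeeded: {str(self.succeeded).lower()}",
            f"total_cost_usd: {self.total_cost():.4f}",
            "steps:",
        ]
        for s in self.steps:
            lines += [
                f"  - name: {s.name}",
                f"    kind: {s.kind}",
                f"    input_tokens: {s.input_tokens}",
                f"    output_tokens: {s.output_tokens}",
                f"    cost_usd: {s.cost_usd:.4f}",
                f"    redundant: {str(s.redundant).lower()}",
            ]
        return "\n".join(lines)


def prune_candidates(card: TraceCard, min_share: float = 0.25) -> List[Step]:
    """Redundant steps costing at least `min_share` of the session, priciest first.

    Assumed heuristic for illustration only; it does not perform the
    counterfactual reasoning described in the abstract.
    """
    hits = [s for s in card.steps
            if s.redundant and card.cost_share(s) >= min_share]
    return sorted(hits, key=lambda s: s.cost_usd, reverse=True)


# Hypothetical session with made-up costs.
card = TraceCard("demo-session", True, [
    Step("read_sheet", "tool_use", 0, 0, 0.001),
    Step("plan", "llm_call", 1200, 300, 0.012),
    Step("re_verify_all_cells", "subagent_spawn", 9000, 2500, 0.087,
         redundant=True),
])

if __name__ == "__main__":
    print(card.to_yaml())
    print([s.name for s in prune_candidates(card)])
```

In this made-up session the redundant sub-agent spawn accounts for most of the spend, so it is the only step surfaced as a prune candidate. A distillation pipeline would still need to argue that removing it leaves the outcome unchanged.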

Problem

Research questions and friction points this paper is trying to address.

skill distillation

cost-aware tracing

LLM agent

redundancy

quality regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

cost-aware tracing

skill distillation

LLM agent