🤖 AI Summary
This work addresses a failure mode in which large language model (LLM) agents fall into unproductive retrieval loops when using free-form Markdown skill libraries: because the prose leaves the input schema and invocation syntax implicit, the agent must re-derive both on every retrieval. The authors propose Skill-as-Pseudocode (SaP), which automatically converts a skill library into typed pseudocode. SaP clusters similar procedural passages, extracts a typed contract for each cluster, and restores concrete action templates, so each rewritten skill carries both a typed signature and a concrete invocation template. A four-check deterministic verifier (coverage, binding, replacement, risk) filters the extracted contracts before they are promoted. On the 134-game ALFWorld unseen split with gpt-4o-mini, pooled across three seeds, SaP wins 82/402 paired games versus 47/402 for the Graph-of-Skills (GoS) baseline (pooled McNemar p = 8.2e-5), while using 22.8 ± 6.4% fewer input tokens and 14.5 ± 4.1% fewer LLM calls per game.
📝 Abstract
Markdown skill libraries for LLM agents ship as free-form prose, forcing the agent to re-derive both the input schema and the concrete invocation syntax on every retrieval. We observe that this often produces a "confused -> re-retrieve -> still confused" loop in which the agent issues a partially correct action, receives uninformative environment feedback, and re-retrieves the same prose. We propose Skill-as-Pseudocode (SaP), an automatic conversion of Markdown skill libraries into typed pseudocode with deterministic quality control. For each cluster of similar procedural passages drawn from one or more skills, SaP extracts a typed contract and filters it through a four-check deterministic verifier (coverage, binding, replacement, risk). Promoted contracts are inlined into a rewritten skill skeleton together with restored concrete action templates, giving the agent two complementary signals: a typed signature for what the skill does and a concrete template for how to invoke it. On the 134-game ALFWorld unseen split with gpt-4o-mini, pooled across three seeds, SaP wins 82/402 paired games versus 47/402 for the Graph-of-Skills (GoS) baseline (pooled McNemar p = 8.2e-5), with 22.8 ± 6.4% fewer input tokens and 14.5 ± 4.1% fewer LLM calls per game.
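To make the idea of a typed contract passing through a four-check deterministic verifier more concrete, here is a minimal Python sketch. The abstract names the four checks but does not define them, so everything below is an assumption made for illustration: the `Contract` fields, the blocklist, the 0.8 coverage threshold, and the specific meaning given to each check (coverage = the template matches most passages in the cluster; binding = template placeholders and declared parameters agree; replacement = substituting the example arguments reproduces a passage from the cluster; risk = no blocklisted token in the template). The sample passages are only meant to resemble ALFWorld-style actions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

PLACEHOLDER = re.compile(r"\{(\w+)\}")
# Hypothetical blocklist for the "risk" check; the source does not list one.
RISKY_TOKENS = ("rm -rf", "sudo", "eval(")


@dataclass(frozen=True)
class Contract:
    """Hypothetical typed contract extracted from one cluster of passages."""
    name: str
    params: dict[str, str]    # parameter name -> type label
    template: str             # concrete action template with {placeholders}
    example: dict[str, str]   # one example binding for the parameters


def template_regex(contract: Contract) -> re.Pattern[str]:
    """Compile the template into a regex with one named group per placeholder.

    Assumes each placeholder name appears at most once in the template.
    """
    # split() with a capture group alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(contract.template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def check_coverage(contract: Contract, passages: list[str],
                   min_fraction: float = 0.8) -> bool:
    """Assumed meaning: the template matches enough passages in the cluster."""
    if not passages:
        return False
    rx = template_regex(contract)
    hits = sum(1 for p in passages if rx.fullmatch(p))
    return hits / len(passages) >= min_fraction


def check_binding(contract: Contract) -> bool:
    """Assumed meaning: placeholders and declared parameters are the same set."""
    return set(PLACEHOLDER.findall(contract.template)) == set(contract.params)


def check_replacement(contract: Contract, passages: list[str]) -> bool:
    """Assumed meaning: filling in the example yields a passage from the cluster."""
    try:
        rendered = contract.template.format(**contract.example)
    except (KeyError, IndexError, ValueError):
        return False
    return rendered in passages


def check_risk(contract: Contract) -> bool:
    """Assumed meaning: the template contains no blocklisted token."""
    return not any(tok in contract.template for tok in RISKY_TOKENS)


def verify(contract: Contract, passages: list[str]) -> dict[str, bool]:
    """Run all four checks; no model calls and no randomness are involved."""
    return {
        "coverage": check_coverage(contract, passages),
        "binding": check_binding(contract),
        "replacement": check_replacement(contract, passages),
        "risk": check_risk(contract),
    }


def promote(contract: Contract, passages: list[str]) -> bool:
    """A contract is promoted only if every check passes."""
    return all(verify(contract, passages).values())


# Illustrative cluster and contract (not taken from the paper).
PASSAGES = [
    "take apple 1 from countertop 1",
    "take mug 2 from shelf 1",
    "take knife 1 from drawer 3",
]
GOOD = Contract(
    name="take",
    params={"obj": "Object", "recep": "Receptacle"},
    template="take {obj} from {recep}",
    example={"obj": "mug 2", "recep": "shelf 1"},
)
```

In this sketch, `promote(GOOD, PASSAGES)` succeeds, while a contract that declares only `obj` but keeps `{recep}` in its template fails the binding and replacement checks and is filtered out. Because the checks are pure functions of the contract and the cluster, rerunning them on the same inputs always gives the same verdict, which is one plausible reading of "deterministic" quality control.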