Skill Drift Is Contract Violation: Proactive Maintenance for LLM Agent Skill Libraries

📅 2026-05-09

🤖 AI Summary

This work addresses silent skill degradation in large language model (LLM) agents: skills stop working when the external services or configurations they depend on change, and conventional monitors, which operate at too coarse a granularity, fail to detect this accurately. The study formalizes skill drift as contract violation and introduces a role-aware framework that extracts executable environment contracts from skill documentation and validates only the assumptions that play a functional role in the skill. Combining environment checks against known or live conditions with LLM-driven error localization yields a high-precision maintenance signal. In evaluation, the method raises zero false alarms over 599 no-drift and hard-negative cases, reaches 100% precision and 76% recall on known-drift cases with the strongest backbone, and attains 86% conservative precision when discovering live drift across 49 real skills. Localizing the violated contract raises one-round repair success from 10% to 78%.
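To make the idea of validating only role-bearing assumptions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the `Contract` class, the `extract_contracts` and `violated` helpers, and the `==` pin regex are all hypothetical. The sketch treats a version string inside a comment as noise and the same string in a pinned dependency as a contract to check against the environment.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical illustration: this regex and these names are assumptions,
# not the extraction method described in the paper.
PIN = re.compile(r"([A-Za-z0-9_.\-]+)==([0-9][A-Za-z0-9_.]*)")


@dataclass(frozen=True)
class Contract:
    """An environment assumption that plays an operational role in a skill."""
    package: str
    version: str


def extract_contracts(skill_doc: str) -> list[Contract]:
    """Collect pinned dependencies, ignoring version strings in comments."""
    contracts = []
    for line in skill_doc.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a version mentioned in a comment is noise
        code = stripped.split("#", 1)[0]  # drop any trailing inline comment
        for match in PIN.finditer(code):
            contracts.append(Contract(match.group(1), match.group(2)))
    return contracts


def violated(contracts: list[Contract], environment: dict[str, str]) -> list[Contract]:
    """Return contracts whose pinned version differs from the environment."""
    return [c for c in contracts if environment.get(c.package) != c.version]


if __name__ == "__main__":
    doc = "# tested with requests==2.0.0\nrequests==2.31.0\n"
    found = extract_contracts(doc)
    print(found)
    print(violated(found, {"requests": "2.32.0"}))
```

Only the pinned line yields a contract here, so a change to the commented version would never raise an alarm, while a change to the installed `requests` version would surface as a violated contract that names the exact assumption to repair.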

📝 Abstract

LLM agents increasingly rely on reusable skill libraries, but these skills silently decay as the external services, packages, APIs, and configurations they reference evolve. Existing monitors detect such changes at the wrong granularity: they observe values, not the role those values play in a skill. A version string in a comment is noise; the same string in a pinned dependency is an operational obligation. We formulate skill drift as contract violation and introduce \sgname{}, which extracts executable environment contracts from skill documents and validates only those role-bearing assumptions against known or live conditions. This distinction turns noisy monitoring into a precision-first maintenance signal. Contract-free CI probes produce 40% false positives, while \sgname{} raises zero false alarms over 599 no-drift and hard-negative cases (Wilson 95% CI [0, 0.6]%). In known-drift verification, \sgname{} achieves 100% precision and 76% recall with the strongest backbone; in a pre-registered study over 49 real skills, it discovers live drift with 86% conservative precision. Violated contracts also make repair actionable, improving one-round success from 10% without localization to 78%. We release \dbname{}, an 880-pair benchmark for skill degradation.
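The Wilson interval quoted for the false-alarm rate can be checked with a short Python sketch. The function name `wilson_interval` and the default `z` value are choices made for this illustration; the formula is the standard Wilson score interval for a binomial proportion, applied to 0 false alarms in 599 cases.

```python
from __future__ import annotations

import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    low, high = wilson_interval(0, 599)
    print(f"95% Wilson CI: [{100 * low:.1f}, {100 * high:.1f}]%")
```

With zero observed events the upper bound reduces to z² / (n + z²), which for n = 599 is about 0.6%, consistent with the interval reported in the abstract.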

Problem

Research questions and friction points this paper is trying to address.

skill drift

contract violation

LLM agents

skill libraries

environmental assumptions

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill drift

contract violation

LLM agents