Measuring AI agent autonomy: Towards a scalable approach with code inspection

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI agent autonomy evaluations rely heavily on costly, high-risk runtime behavioral observation. Method: This paper proposes a static source-code analysis approach for quantifying agent autonomy that requires neither deployment nor execution, thereby avoiding run-time safety risks and improving scalability. Its core contribution is a code-level, two-dimensional autonomy taxonomy (impact, i.e., breadth of influence, and oversight, i.e., intensity of human supervision), enabling autonomy levels to be inferred by inspecting orchestration logic. The method is implemented for the AutoGen framework and demonstrated on representative applications. Contribution/Results: Evaluation outcomes are reproducible, incur substantially lower costs, and avoid the risks of executing agents. The work outlines a "static analysis as evaluation" paradigm, offering a lightweight, broadly applicable, and quantifiable tool for AI agent governance.

📝 Abstract
AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations -- observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
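To make the idea concrete, here is a minimal illustrative sketch (not the paper's actual scorer) of what code-based autonomy assessment could look like. It statically parses AutoGen-style orchestration code with Python's `ast` module and estimates the two taxonomy attributes: oversight, approximated from the `human_input_mode` setting (`"ALWAYS"` being most supervised, `"NEVER"` least), and impact, crudely approximated by the number of `register_function` tool registrations. The scoring rules and the `score_autonomy` helper are assumptions for illustration, not the paper's method.

```python
import ast

# Hypothetical ordering: more autonomy as human input is required less often.
OVERSIGHT_RANK = {"ALWAYS": 0, "TERMINATE": 1, "NEVER": 2}

def score_autonomy(source: str) -> dict:
    """Statically score orchestration code on (oversight, impact).

    Illustrative sketch only: real scoring would inspect far more of the
    orchestration logic than these two signals.
    """
    tree = ast.parse(source)
    oversight = 0
    impact = 0
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        # Oversight: take the least-supervised human_input_mode found.
        for kw in node.keywords:
            if kw.arg == "human_input_mode" and isinstance(kw.value, ast.Constant):
                oversight = max(oversight, OVERSIGHT_RANK.get(kw.value.value, 0))
        # Impact: count tool registrations as a proxy for breadth of influence.
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
        if name == "register_function":
            impact += 1
    return {"oversight": oversight, "impact": impact}

# Example orchestration snippet to be scored (never executed, only parsed).
example = '''
assistant = AssistantAgent("coder", human_input_mode="NEVER")
register_function(run_shell, caller=assistant, executor=user_proxy)
'''
print(score_autonomy(example))  # -> {'oversight': 2, 'impact': 1}
```

Note that the snippet being scored is never imported or run; it is only parsed, which is what removes the deployment cost and risk of run-time evaluation.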
Problem

Research questions and friction points this paper is trying to address.

Assessing AI agent autonomy
Code-based autonomy evaluation
Reducing runtime evaluation risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Code-based autonomy assessment
Reduces run-time evaluation costs
Taxonomy for autonomy attributes scoring
🔎 Similar Papers
2023-08-22 · Frontiers Comput. Sci. · Citations: 866
2024-03-14 · IEEE International Conference on Robotics and Automation · Citations: 3