🤖 AI Summary
To address the limitations of traditional open-source software (OSS) usability assessment, namely its heavy reliance on manual effort, high cost, and poor scalability, this paper proposes the first agent-driven automated evaluation framework. The framework leverages large language models (LLMs) to instantiate multi-role developer agents that perform realistic programming tasks; it dynamically constructs platform-specific knowledge bases and generates context-aware code, which is then quantitatively assessed for compliance, correctness, and readability. Key contributions include: (1) the first configurable, interactive agent-based evaluation paradigm; (2) support for experience-level simulation (from junior to expert developers); and (3) platform-adaptive knowledge acquisition and multidimensional automated evaluation. Empirical evaluation on a graph analytics platform demonstrates significant improvements in assessment efficiency and coverage, enabling low-cost, highly scalable, and high-fidelity quantitative OSS usability evaluation.
📝 Abstract
Usability evaluation is critical to the impact and adoption of open source software (OSS), yet traditional methods relying on human evaluators suffer from high costs and limited scalability. To address these limitations, we introduce OSS-UAgent, an automated, configurable, and interactive agent-based usability evaluation framework specifically designed for open source software. Our framework employs intelligent agents powered by large language models (LLMs) to simulate developers performing programming tasks across various experience levels (from Junior to Expert). By dynamically constructing platform-specific knowledge bases, OSS-UAgent ensures accurate and context-aware code generation. The generated code is automatically evaluated across multiple dimensions, including compliance, correctness, and readability, providing a comprehensive measure of the software's usability. Additionally, our demonstration showcases OSS-UAgent's practical application in evaluating graph analytics platforms, highlighting its effectiveness in automating usability evaluation.
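The pipeline the abstract describes (per-experience-level agents, a platform knowledge base, and multi-dimensional scoring of generated code) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the actual OSS-UAgent implementation: all class and function names (`DeveloperAgent`, `generate_code`, `evaluate`, `run_evaluation`) and the stubbed scoring are hypothetical stand-ins for the framework's LLM calls and automated evaluators.

```python
from dataclasses import dataclass

# Experience levels the paper says agents simulate (Junior to Expert).
EXPERIENCE_LEVELS = ["junior", "intermediate", "senior", "expert"]

@dataclass
class DeveloperAgent:
    """Hypothetical stand-in for an LLM-powered developer agent."""
    level: str
    knowledge_base: dict  # platform-specific docs/snippets, keyed by task

    def generate_code(self, task: str) -> str:
        # In the real framework, an LLM would be prompted here with the
        # task, the agent's experience level, and retrieved platform
        # knowledge; this stub just echoes that context.
        context = self.knowledge_base.get(task, "")
        return f"# [{self.level}] solution for: {task}\n# context: {context}"

def evaluate(code: str) -> dict:
    # Placeholder for the framework's automated multi-dimensional
    # scoring; real scores would come from linters, test execution,
    # and readability metrics, not constants.
    return {"compliance": 1.0, "correctness": 1.0, "readability": 1.0}

def run_evaluation(tasks: list, knowledge_base: dict) -> dict:
    """Run every task at every experience level and collect scores."""
    results = {}
    for level in EXPERIENCE_LEVELS:
        agent = DeveloperAgent(level, knowledge_base)
        results[level] = [evaluate(agent.generate_code(t)) for t in tasks]
    return results
```

Structuring the loop over experience levels (rather than a single generic agent) is what lets the framework report how usability differs for junior versus expert developers on the same platform.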