Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether large language models (LLMs) produce better object-oriented designs (OOD) than development with human involvement. It compares three project categories: PreAI (human-involved projects produced before widespread LLM use), PostAI (human-involved projects produced after widespread LLM use), and PureAI (projects generated end-to-end by LLMs). The analysis combines project-level OOD metrics, code smell density, and domain modeling. PureAI projects show lower code smell density and generally appear simpler in size, complexity, and coupling, but this simplicity is consistent with oversimplification: it is associated with missing abstractions and weaker separation of responsibilities. PostAI projects are closer to PureAI than to PreAI on many OOD measures and show similar tendencies toward oversimplification. The results indicate that human guidance on object-oriented decomposition and responsibility assignment remains important when LLMs are used for OOD.

📝 Abstract

Background: Large Language Models (LLMs) are increasingly used for code generation. However, their ability to generate multi-class projects that require object-oriented design (OOD) remains unclear, especially relative to projects developed with human involvement.

Aims: The primary objective of this study is to compare OOD quality in projects from three authorship conditions: PreAI (human-involved projects produced before widespread LLM use), PostAI (human-involved projects produced after widespread LLM use), and PureAI (projects generated end-to-end by contemporary LLMs).

Method: We conducted a comparative case study on a postgraduate Java assignment. Two offerings of the same assignment were selected as the PreAI and PostAI datasets. PureAI projects were generated using three contemporary LLMs. We analyzed OOD quality using project-level OOD metrics, code smell density, and domain modeling.

Results: Relative to human-involved projects, PureAI projects show lower code smell density and generally appear simpler in terms of total size, complexity, and coupling. However, this is consistent with oversimplification, as it is associated with missing abstractions and weaker responsibility separation. PostAI is closer to PureAI than PreAI on many OOD measures and also shows tendencies toward oversimplification.

Conclusions: Our findings indicate that appropriate human guidance on object-oriented decomposition and responsibility assignment remains important when LLMs are used for object-oriented design.
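The abstract names code smell density as one of its measures but does not define it. A minimal sketch of one common reading follows, assuming density means detected smells per 1,000 lines of code. The `ProjectStats` type, its field names, and the per-KLOC normalization are illustrative assumptions, not the authors' definitions or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectStats:
    """Hypothetical per-project summary; the fields are illustrative only."""
    name: str
    smell_count: int    # number of code smells reported by some detector
    lines_of_code: int  # source lines counted for the same project


def smell_density(project: ProjectStats) -> float:
    """Return smells per 1,000 lines of code (an assumed normalization)."""
    if project.lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return project.smell_count / (project.lines_of_code / 1000)


def mean_density(projects: list[ProjectStats]) -> float:
    """Average the per-project densities of one group of projects."""
    if not projects:
        raise ValueError("at least one project is required")
    return sum(smell_density(p) for p in projects) / len(projects)
```

Under this reading, a project with 6 smells in 1,500 lines has a density of 4 smells per KLOC. A low value on its own does not show good design: as the abstract notes, a project can score well here because it is oversimplified, so the density has to be read alongside the OOD metrics and the domain model.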

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Object-Oriented Design

Code Generation

Software Design Quality

Human-AI Collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Object-Oriented Design

Code Smells