How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code

📅 2025-03-02
🤖 AI Summary
This study addresses the **algorithm-level diversity deficiency** in code generation by large language models (LLMs). It proposes a quantitative framework for algorithmic diversity, grounded in **code semantic embedding and hierarchical clustering**, and introduces metrics including *algorithm cluster count* and *cross-model Jaccard similarity* to measure diversity at the algorithmic level. Through systematic evaluation, the authors analyze the impact of model scale, sampling temperature (1.2–1.5), instruction fine-tuning, and problem complexity. Results show that high-temperature sampling combined with heterogeneous model ensembling increases the algorithm cluster count by 2.3× while keeping the usable-solution rate above 68%. This demonstrates that algorithmic diversity and functional correctness need not be mutually exclusive, relaxing the conventional correctness-diversity trade-off. The work establishes both a theoretical foundation and a practical methodology for modeling and controllably enhancing algorithmic diversity in LLM-based code generation.
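The pipeline described above, embedding each solution and grouping near-duplicates so that the number of clusters approximates the number of distinct algorithms, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `embed` is a hypothetical stand-in (a toy bag-of-keywords vector) for a real code-semantic embedding model, and the greedy single-linkage grouping stands in for proper hierarchical clustering.

```python
import math

def embed(code):
    # Hypothetical placeholder: count occurrences of a few keywords.
    # A real system would use a learned code-semantic embedding model.
    vocab = ["for", "while", "recursion", "sort", "dict", "return"]
    return [code.count(tok) for tok in vocab]

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_count(solutions, threshold=0.9):
    """Greedy single-linkage clustering of solution embeddings.

    Each cluster approximates one distinct algorithmic approach, so the
    number of clusters serves as the algorithm-diversity metric.
    """
    clusters = []  # each cluster is a list of embedding vectors
    for code in solutions:
        v = embed(code)
        for cl in clusters:
            if any(cosine(v, u) >= threshold for u in cl):
                cl.append(v)  # close enough: same algorithmic cluster
                break
        else:
            clusters.append([v])  # no match: a new algorithm appears
    return len(clusters)
```

Two iterative solutions that differ only superficially land in one cluster, while a loop-based and a structurally different solution land in two, so the count tracks algorithmic rather than textual variety.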

📝 Abstract
Language models (LMs) have exhibited impressive abilities in generating code from natural language requirements. In this work, we highlight the diversity of code generated by LMs as a critical criterion for evaluating their code generation capabilities. Studies assessing the diversity of generated code are lacking, overlooking its importance for code LMs. Therefore, we propose a systematic approach to evaluating code diversity, introducing various metrics based on inter-code similarity. Specifically, we introduce code clustering methods that leverage LMs' capabilities in code understanding and reasoning, resulting in a set of metrics that represent the number of algorithms in model-generated solutions. We extensively investigate the properties of model-generated solutions by contrasting them with human-written ones and quantifying the impact of various factors on code diversity: model size, temperature, instruction tuning, and problem complexity. Our analysis demonstrates that model-generated solutions exhibit low algorithmic diversity, an issue that has been neglected by the research community. Moreover, we explore methods to increase code diversity by combining solutions from different models and increasing sampling temperatures. Our findings highlight that code diversity can be enhanced with the help of heterogeneous models and by setting the temperature beyond 1.0, a regime that has not been fully explored due to the degradation of functional correctness. To facilitate this research direction, we publicly share our code and datasets through open-source repositories.
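To illustrate why temperatures above 1.0 matter for diversity (a sketch of standard temperature sampling, not code from the paper): dividing logits by a temperature T > 1.0 flattens the next-token distribution, so lower-probability continuations, and hence rarer algorithmic choices, are sampled more often, at the cost of more incorrect outputs.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature scaling.

    T > 1.0 flattens the distribution, shifting probability mass toward
    low-probability tokens; T < 1.0 sharpens it toward the argmax.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, with logits `[2.0, 1.0, 0.1]`, the probability of the least-likely token is strictly higher at T = 1.5 than at T = 1.0, which is exactly the mechanism the paper exploits to surface rarer algorithms.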
Problem

Research questions and friction points this paper is trying to address.

Evaluating diversity of code generated by language models.
Assessing algorithmic diversity in model-generated solutions.
Exploring methods to enhance code diversity using heterogeneous models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic approach to evaluate code diversity
Code clustering methods leveraging LMs' capabilities
Enhancing diversity with heterogeneous models and temperature
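The heterogeneous-model idea above can be made concrete with two small measures (a hedged sketch; the cluster labels such as `"hashmap"` are hypothetical, standing in for the algorithm clusters the paper's pipeline would assign): cross-model Jaccard similarity quantifies how much two models' algorithmic repertoires overlap, and when that overlap is low, pooling their solutions covers more algorithm clusters than either model alone.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of algorithm-cluster labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pooled_diversity(model_label_sets):
    """Number of algorithm clusters covered by the union of several
    models' solutions. Low cross-model Jaccard similarity means pooling
    adds clusters beyond what any single model produces."""
    return len(set().union(*model_label_sets))
```

With hypothetical repertoires `{"two_pointer", "hashmap"}` and `{"sorting", "hashmap"}`, the Jaccard similarity is 1/3 and the pooled ensemble covers 3 clusters versus 2 for either model alone.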