🤖 AI Summary
This study systematically investigates cognitive disparities between humans and large language models (LLMs), exemplified by successive GPT versions, in algorithmic understanding. Method: The authors propose a formal, quantifiable five-level framework for algorithmic understanding that integrates philosophical, psychological, and pedagogical perspectives. Using cognitive modeling, human-annotated evaluation protocols, and human–AI double-blind experiments, they compare undergraduate and graduate students with multiple GPT generations. Contribution/Results: GPT achieves near-human performance in syntactic parsing and stepwise execution but exhibits significant deficits in higher-order competencies, particularly abstract transfer and causal explanation, manifesting a "superficially correct, deeply deficient" pattern. The authors propose the hierarchical scale as a rigorous criterion for assessing and tracking algorithmic understanding in AI systems.
📝 Abstract
As Large Language Models (LLMs) perform, and sometimes excel at, ever more complex cognitive tasks, a natural question is whether AI really understands. The study of understanding in LLMs is in its infancy, and the community has yet to incorporate well-trodden research from philosophy, psychology, and education. We initiate this effort, focusing specifically on the understanding of algorithms, and propose a hierarchy of levels of understanding. We use the hierarchy to design and conduct a study with human subjects (undergraduate and graduate students) as well as large language models (generations of GPT), revealing interesting similarities and differences. We expect that our rigorous criteria will be useful for tracking AI's progress in such cognitive domains.