🤖 AI Summary
Current AGI research lacks an operational definition, which hinders quantitative assessment of the cognitive gap between AI systems and humans. To address this, the authors propose a theoretically grounded AGI evaluation framework derived from the Cattell–Horn–Carroll (CHC) theory of human cognition. It spans ten core cognitive domains, including reasoning, memory, and perception. Adapting established human psychometric batteries, they evaluate leading large language models across these domains and obtain fine-grained, “jagged” cognitive profiles. The results reveal stark imbalances: GPT-4 scores 27% on the AGI metric and GPT-5 scores 58%, with pronounced deficits in foundational cognitive machinery, particularly long-term memory storage. The work offers a theory-driven, quantifiable AGI benchmark and shows empirically how uneven current models’ cognitive capabilities are, providing both an evaluation standard and concrete guidance for AGI development.
📝 Abstract
The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains (including reasoning, memory, and perception) and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly "jagged" cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage. The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 58%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
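As a rough illustration of how per-domain results could roll up into a single percentage, the sketch below averages ten domain scores with equal weights. The equal weighting, the function names, the placeholder domain names (`domain_4` through `domain_10`), and every number in the example profile are assumptions made for illustration; the abstract does not specify the aggregation rule or report per-domain values.

```python
from typing import List, Mapping


def aggregate_score(domain_scores: Mapping[str, float],
                    expected_domains: int = 10) -> float:
    """Equal-weighted mean of per-domain scores, returned as a percentage.

    Assumption: each domain score is normalized to [0, 1] and every domain
    carries the same weight. The abstract does not state the weighting.
    """
    if len(domain_scores) != expected_domains:
        raise ValueError(
            f"expected {expected_domains} domains, got {len(domain_scores)}"
        )
    for name, score in domain_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
    return 100.0 * sum(domain_scores.values()) / expected_domains


def weakest_domains(domain_scores: Mapping[str, float], k: int = 3) -> List[str]:
    """Names of the k lowest-scoring domains, i.e. the dips in a jagged profile."""
    return sorted(domain_scores, key=lambda name: domain_scores[name])[:k]


# Hypothetical profile: these values are invented and are NOT results from the paper.
example_profile = {
    "reasoning": 0.8,
    "memory": 0.0,
    "perception": 0.4,
    **{f"domain_{i}": 0.5 for i in range(4, 11)},  # placeholder domain names
}

print(f"aggregate: {aggregate_score(example_profile):.0f}%")
print("weakest:", weakest_domains(example_profile, k=2))
```

Under this assumed rule, a model can post a moderate aggregate while scoring zero in one domain, which is why the per-domain profile is more informative than the headline number alone.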