AI Summary
Quantifying intelligence in AI systems and identifying fundamental bottlenecks to autonomy remain unresolved challenges. Method: We propose an "Intelligence Test" framework that formalizes intelligence as *survival capacity*, in the sense of natural selection, and introduces a computable metric: the probability distribution (expectation and variance) of task failure counts. Lower failure rates and reduced variance indicate higher intelligence; finite expectation and variance signal the emergence of autonomous intelligence. Contribution/Results: This work pioneers the translation of natural selection into a computable intelligence scale. Theoretically, it reveals criticality in human tasks and identifies superficial pattern imitation, rather than mechanistic understanding, as the core barrier to AI autonomy. Through probabilistic modeling, criticality analysis, multi-task empirical evaluation, and parameter extrapolation, we find that mainstream AI systems fall significantly short of autonomous performance across vision, search, recommendation, and language tasks. Achieving general autonomy may require roughly $10^{26}$ parameters, projected to take about $70$ years under current Moore's Law trends, necessitating a paradigm shift.
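For concreteness, the metric can be written down in a few lines. The notation below is ours, and the heavy-tailed example is an illustrative assumption rather than a result taken from the paper:

```latex
% Sketch of the failure-count metric. F denotes the number of failed
% attempts a subject makes on a task before its first success.
\[
  \text{ranking:}\quad \text{smaller } \mathbb{E}[F] \text{ and } \mathrm{Var}(F)
  \;\Longrightarrow\; \text{higher intelligence},
\]
\[
  \text{autonomy:}\quad \mathbb{E}[F] < \infty \ \text{ and } \ \mathrm{Var}(F) < \infty .
\]
% Finiteness is not automatic: under a heavy-tailed attempt model,
% e.g. P(F = k) proportional to k^{-\alpha}, the expectation is finite
% only for \alpha > 2 and the variance only for \alpha > 3. This is the
% sense in which task criticality can block autonomy.
```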
Abstract
How does intelligence emerge? We propose that intelligence is not a sudden gift or random occurrence, but rather a necessary trait for species to survive Natural Selection. If a species passes the test of Natural Selection, it demonstrates the intelligence to survive in nature. Extending this perspective, we introduce the Intelligence Test, a method to quantify the intelligence of any subject on any task. Just as species evolve by trial and error, the Intelligence Test quantifies intelligence by the number of failed attempts before success: fewer failures correspond to higher intelligence. When the expectation and variance of failure counts are both finite, the subject has reached an autonomous level of intelligence. Using the Intelligence Test, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve a level of autonomy in simple tasks, they remain far from autonomous in more complex tasks, such as vision, search, recommendation, and language. While scaling model size might help, it would come at an astronomical cost. Projections suggest that achieving general autonomy would require an unimaginable $10^{26}$ parameters. Even if Moore's Law were to hold continuously, reaching such a parameter scale would take $70$ years. This staggering cost highlights the complexity of human tasks and the inadequacy of current AI. To further understand this phenomenon, we conduct a theoretical analysis. Our simulations suggest that human tasks possess a criticality property; as a result, autonomy requires a deep understanding of a task's underlying mechanisms. Current AI does not fully grasp these mechanisms and instead relies on superficial mimicry, making it difficult to reach an autonomous level. We believe the Intelligence Test can not only guide the future development of AI but also offer profound insights into our own intelligence.
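As a concrete illustration of the testing procedure and the parameter extrapolation, here is a minimal, self-contained sketch. The Bernoulli success model, the $10^{12}$-parameter starting scale, and the 18-month doubling period are all our assumptions (the latter two chosen because they reproduce the $70$-year figure), not the paper's experimental setup:

```python
import math
import random

def failures_before_success(success_prob: float, rng: random.Random,
                            cap: int = 10**6) -> int:
    """Count failed attempts before the first success (capped for safety)."""
    failures = 0
    while rng.random() >= success_prob and failures < cap:
        failures += 1
    return failures

def failure_count_stats(samples: list[int]) -> tuple[float, float]:
    """Empirical mean and sample variance of failure counts."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, var

rng = random.Random(0)

# A subject that succeeds on 50% of attempts has geometric failure
# counts with finite mean (1-p)/p = 1 and finite variance (1-p)/p^2 = 2,
# i.e. it counts as "autonomous" under the finite-variance criterion.
samples = [failures_before_success(0.5, rng) for _ in range(100_000)]
print("empirical (mean, variance):", failure_count_stats(samples))

# Back-of-envelope Moore's-Law extrapolation: growing from an assumed
# ~1e12 parameters today to 1e26 requires log2(1e14) ~ 46.5 doublings;
# at one doubling every 1.5 years that is roughly 70 years.
doublings = math.log2(1e26 / 1e12)
print("years to 1e26 parameters:", round(doublings * 1.5, 1))
```

In the heavy-tailed, critical regime the abstract describes, the empirical variance would fail to converge as the number of trials grows, which is precisely the signature of non-autonomy under this test.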