🤖 AI Summary
The field lacks a theory-driven, quantitative framework for evaluating the symbolic reasoning capabilities of current AI systems, which hinders rigorous assessment of their interpretability and reliability. Method: This paper systematically integrates algebraic circuit complexity theory into AI science, modeling symbolic reasoning tasks as algebraic circuits to define computable, generative, and scalable benchmarks; it introduces a theory-driven benchmark design paradigm grounded in symbolic expression modeling and scalable synthetic data generation. Contribution/Results: We establish the first falsifiable, scale-invariant metric for symbolic reasoning ability. Our framework overcomes the theoretical shallowness of existing empirical benchmarks, providing a rigorous, reproducible, and extensible foundation for evaluating AI’s symbolic generalization capacity, and enabling principled analysis of expressivity, learnability, and computational efficiency in symbolic AI systems.
📝 Abstract
The rapid development of modern artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, modern AI systems fall short on tests requiring symbolic processing and abstraction, a glaring limitation given the necessity for interpretable and reliable technology. Despite a surge of reasoning benchmarks emerging from the academic community, no comprehensive and theoretically motivated framework exists to quantify reasoning (and more generally, symbolic ability) in AI systems. Here, we adopt a framework from computational complexity theory to explicitly quantify symbolic generalization: algebraic circuit complexity. Many symbolic reasoning problems can be recast as algebraic expressions. Thus, algebraic circuit complexity theory, the study of algebraic expressions as circuit models (i.e., directed acyclic graphs), is a natural framework to study the complexity of symbolic computation. The tools of algebraic circuit complexity enable the study of generalization by defining benchmarks in terms of their complexity-theoretic properties (i.e., the difficulty of a problem). Moreover, algebraic circuits are generic mathematical objects; an arbitrarily large number of samples can be generated for any given circuit, making them an optimal testbed for the data-hungry machine learning algorithms that are used today. We adopt tools from algebraic circuit complexity theory, apply them to formalize a science of symbolic generalization, and address key theoretical and empirical challenges for its successful application to AI science and its impact on the broader community.
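To make the circuit view concrete, the following is a minimal Python sketch of an algebraic circuit as a directed acyclic graph, together with two common complexity measures (size and depth) and a sample generator. It is an illustration only, not the paper's implementation: the names `Var`, `Const`, `Gate`, `evaluate`, `size`, `depth`, and `sample` are hypothetical, and evaluating over the integers modulo 7 is an assumption made to keep values bounded.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str  # input variable, a leaf of the circuit


@dataclass(frozen=True)
class Const:
    value: int  # constant leaf


@dataclass(frozen=True)
class Gate:
    op: str  # "+" or "*"
    left: "Node"
    right: "Node"


Node = Union[Var, Const, Gate]


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """Evaluate the circuit bottom-up for one assignment of the variables."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    return left + right if node.op == "+" else left * right


def _gates(node: Node, seen: Set[Gate]) -> Set[Gate]:
    # Structurally identical sub-circuits hash equal, so shared gates count once.
    if isinstance(node, Gate) and node not in seen:
        seen.add(node)
        _gates(node.left, seen)
        _gates(node.right, seen)
    return seen


def size(node: Node) -> int:
    """Number of distinct gates in the circuit."""
    return len(_gates(node, set()))


def depth(node: Node) -> int:
    """Longest path from the output gate down to a leaf."""
    if not isinstance(node, Gate):
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def sample(circuit: Node, variables: List[str], n: int,
           modulus: int = 7, seed: int = 0) -> List[Tuple[Dict[str, int], int]]:
    """Draw n random (input, output) pairs; the modulus is an assumed choice."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        env = {v: rng.randrange(modulus) for v in variables}
        pairs.append((env, evaluate(circuit, env) % modulus))
    return pairs


# (x + y) * (x + y): the "+" gate is shared, so the DAG has 2 gates and depth 2.
x, y = Var("x"), Var("y")
shared_sum = Gate("+", x, y)
circuit = Gate("*", shared_sum, shared_sum)
samples = sample(circuit, ["x", "y"], n=1000)
```

Because the circuit is fixed and only the inputs are random, `n` can be made as large as needed, and benchmark difficulty can be varied by building circuits of larger size or depth rather than by collecting new data.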