🤖 AI Summary
Large language models exhibit poor robustness and weak foundational reasoning in real-world scenarios, primarily due to the absence of innate “core knowledge”—evolutionarily conserved cognitive structures that underpin incremental cognitive development in humans.
Method: This paper pioneers the systematic integration of developmental psychology’s core knowledge theory into AI design, proposing *cognitive prototyping*: generating large-scale multimodal synthetic data from cognitive prototypes, coupled with developmentally constrained curriculum learning and cross-modal representation alignment to explicitly embed core knowledge into model training (a hypothetical sketch follows this summary). The approach is architecture-agnostic and feasible to implement.
Contribution/Results: Experiments demonstrate substantial improvements in robustness and generalization across fundamental tasks—including commonsense reasoning, physical prediction, and causal understanding—without architectural modification. The paradigm provides both a philosophical foundation and a concrete technical pathway toward next-generation AI systems capable of emergent abilities.
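The paper describes the method only at a conceptual level. As an illustration, here is a minimal sketch, assuming hypothetical prototype classes, stage labels, and a `build_curriculum` helper (none of which come from the paper), of how synthetic examples might be generated from cognitive prototypes and ordered into a developmentally constrained curriculum:

```python
# A minimal, hypothetical sketch of cognitive prototyping: the class names,
# the two example prototypes, and the stage ordering are illustrative
# assumptions, not the paper's actual pipeline.
import random
from dataclasses import dataclass


@dataclass
class CognitivePrototype:
    """A core-knowledge template from which synthetic training examples are drawn."""
    name: str
    stage: int  # developmental stage; used to order the training curriculum

    def generate(self, rng: random.Random) -> dict:
        raise NotImplementedError


@dataclass
class ObjectPermanence(CognitivePrototype):
    """Core-knowledge prototype: a hidden object continues to exist."""
    name: str = "object_permanence"
    stage: int = 1

    def generate(self, rng: random.Random) -> dict:
        obj = rng.choice(["ball", "cup", "block"])
        occluder = rng.choice(["box", "curtain", "screen"])
        return {
            "scene": f"A {obj} rolls behind a {occluder}.",  # stand-in for a rendered clip
            "question": f"Does the {obj} still exist while hidden?",
            "answer": "yes",
            "stage": self.stage,
        }


@dataclass
class Support(CognitivePrototype):
    """Naive-physics prototype: an unsupported object falls."""
    name: str = "support"
    stage: int = 2

    def generate(self, rng: random.Random) -> dict:
        obj = rng.choice(["plate", "book", "mug"])
        return {
            "scene": f"A {obj} is pushed past the edge of a table.",
            "question": f"What happens to the {obj}?",
            "answer": "it falls",
            "stage": self.stage,
        }


def build_curriculum(prototypes, per_prototype=1000, seed=0):
    """Sample synthetic examples and sort them by developmental stage, so that
    simpler core-knowledge tasks precede the skills that build on them."""
    rng = random.Random(seed)
    examples = [p.generate(rng) for p in prototypes for _ in range(per_prototype)]
    return sorted(examples, key=lambda ex: ex["stage"])


if __name__ == "__main__":
    for ex in build_curriculum([ObjectPermanence(), Support()], per_prototype=2):
        print(ex["stage"], ex["scene"], "->", ex["answer"])
```

In a full pipeline, each scene would presumably be rendered multimodally (e.g., paired video and text) so that the same prototype can anchor the cross-modal representation alignment the summary mentions.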
📝 Abstract
Despite excelling in high-level reasoning, current language models lack robustness in real-world scenarios and perform poorly on fundamental problem-solving tasks that are intuitive to humans. This paper argues that both challenges stem from a core discrepancy between human and machine cognitive development. While both systems rely on increasing representational power, the absence of core knowledge (foundational cognitive structures innate to humans) prevents language models from developing robust, generalizable abilities in which complex skills are grounded in simpler ones within their respective domains. The paper reviews empirical evidence for core knowledge in humans, analyzes why language models fail to acquire it, and argues that this limitation is not an inherent architectural constraint. Finally, it outlines a workable proposal for systematically integrating core knowledge into future multimodal language models through the large-scale generation of synthetic training data using a cognitive prototyping strategy.