🤖 AI Summary
This study investigates whether AI models genuinely comprehend human strategic concepts—specifically in chess—by examining whether their internal representations align with expert-defined notions such as "center control" and "knight outpost."
Method: We introduce the first expert-annotated Chess960 dataset to eliminate opening-book memorization effects, enabling evaluation of conceptual robustness free of memorized opening knowledge. Using layer-wise linear probing and concept accuracy quantification, we analyze a 270M-parameter Transformer across all of its layers.
Contribution/Results: Early layers encode human concepts with high consistency (up to 85% probe accuracy), but performance drops sharply in deeper layers (50–65%). On Chess960, concept recognition declines by 10–20% across all probing methods, indicating reliance on memorized patterns rather than abstract strategic understanding. This work provides the first empirical evidence of a fundamental misalignment between high-performing AI representations and human strategic cognition, exposing an intrinsic tension between raw task performance and interpretable, collaborative reasoning.
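The layer-wise linear probing described above can be sketched as follows. This is a minimal illustration, not the paper's code: the activations here are synthetic stand-ins for the transformer's per-layer hidden states, with an artificially stronger linear concept signal injected into earlier layers to mimic the reported early-vs-deep gap; all array shapes and names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-layer activations: n positions, d-dim features
# per layer. A real probe would use hidden states from the chess transformer.
n_positions, d_model, n_layers = 400, 64, 4
labels = rng.integers(0, 2, size=n_positions)  # e.g. "knight outpost present?"

layer_accuracies = []
for layer in range(n_layers):
    # Assumption for illustration: early layers carry a stronger linear
    # signal for the concept, mimicking the paper's early-vs-deep finding.
    signal_strength = 2.0 / (layer + 1)
    acts = rng.normal(size=(n_positions, d_model))
    acts[:, 0] += signal_strength * (2 * labels - 1)  # inject concept direction

    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, labels, test_size=0.25, random_state=0
    )
    # The probe is just a linear classifier on frozen activations.
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    layer_accuracies.append(probe.score(X_te, y_te))

for layer, acc in enumerate(layer_accuracies):
    print(f"layer {layer}: probe accuracy {acc:.2f}")
```

A high probe accuracy at a layer means the concept is linearly decodable from that layer's representations; a drop in deeper layers is the signature the paper reports.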
📝 Abstract
Do AI systems truly understand human concepts or merely mimic surface patterns? We investigate this through chess, where human creativity meets precise strategic concepts. Analyzing a 270M-parameter transformer that achieves grandmaster-level play, we uncover a striking paradox: while early layers encode human concepts like center control and knight outposts with up to 85% accuracy, deeper layers, despite driving superior performance, drift toward alien representations, dropping to 50–65% accuracy. To test conceptual robustness beyond memorization, we introduce the first Chess960 dataset: 240 expert-annotated positions across 6 strategic concepts. When opening theory is eliminated through randomized starting positions, concept recognition drops 10–20% across all methods, revealing the model's reliance on memorized patterns rather than abstract understanding. Our layer-wise analysis exposes a fundamental tension in current architectures: the representations that win games diverge from those that align with human thinking. These findings suggest that as AI systems optimize for performance, they develop increasingly alien intelligence, a critical challenge for creative AI applications requiring genuine human-AI collaboration. Dataset and code are available at: https://github.com/slomasov/ChessConceptsLLM.