🤖 AI Summary
This paper distinguishes two notions of “systematicity” in machine learning that current research routinely conflates: *behavioural systematicity* (systematic generalization in input–output behaviour) and *representational systematicity* (compositional structure in a model's internal representations). The authors observe that the dominant language and vision benchmarks (e.g., SCAN, CLEVR) test almost exclusively the former. Methodologically, they combine conceptual analysis with a meta-evaluation of existing benchmarks, using Hadley's (1994) taxonomy to characterize which forms of systematic generalization each benchmark actually probes, and they point to mechanistic interpretability as a source of techniques for examining compositional structure in model representations. The contributions are threefold: (1) a precise distinction between behavioural and representational systematicity; (2) an analysis showing that existing benchmarks leave representational systematicity largely untested; and (3) a proposed route to evaluating representational systematicity grounded in mechanistic interpretability, offering both conceptual criteria and a practical path toward models that exhibit genuine structural generalization.
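To make the distinction concrete, the sketch below builds a toy SCAN-style compositional split of the kind that tests *behavioural* systematicity: a model is trained on commands whose parts have all been seen, then scored on novel combinations of those parts (a held-out primitive paired with familiar modifiers). This is a minimal illustration, not taken from the paper; the grammar, vocabulary, and split rule are assumptions for exposition only.

```python
# Toy illustration (not from the paper): a SCAN-style compositional split that
# probes *behavioural* systematicity, i.e. input->output generalization only.
from itertools import product

PRIMITIVES = {"walk": "I_WALK", "run": "I_RUN", "look": "I_LOOK", "jump": "I_JUMP"}
MODIFIERS = {"twice": 2, "thrice": 3}
HELD_OUT = "jump"  # primitive whose compositions are reserved for testing

def interpret(command: str) -> str:
    """Ground-truth semantics: '<verb> [<modifier>]' -> repeated action tokens."""
    verb, *rest = command.split()
    reps = MODIFIERS[rest[0]] if rest else 1
    return " ".join([PRIMITIVES[verb]] * reps)

def make_split():
    """Train on all compositions except those using the held-out primitive;
    test on the held-out primitive in novel compositions."""
    train, test = [], []
    commands = list(PRIMITIVES) + [f"{v} {m}" for v, m in product(PRIMITIVES, MODIFIERS)]
    for cmd in commands:
        pair = (cmd, interpret(cmd))
        # 'jump' alone is seen in training; 'jump twice/thrice' only at test time.
        if HELD_OUT in cmd and cmd != HELD_OUT:
            test.append(pair)
        else:
            train.append(pair)
    return train, test

if __name__ == "__main__":
    train, test = make_split()
    print(f"{len(train)} train pairs, e.g. {train[0]}")
    print(f"{len(test)} test pairs, e.g. {test[0]}")
    # A benchmark like this scores a model purely on its outputs for the test
    # pairs; it says nothing about whether the model's internal representations
    # are compositional.
```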
📝 Abstract
Systematicity, a core aspect of compositionality, is a desirable property of ML models because it enables strong generalization to novel contexts. This has led to numerous studies proposing benchmarks to assess systematic generalization, as well as models and training regimes designed to enhance it. Many of these efforts are framed as addressing the challenge posed by Fodor and Pylyshyn (1988). However, while Fodor and Pylyshyn argue for systematicity of representations, existing benchmarks and models primarily focus on the systematicity of behaviour. We emphasize the crucial nature of this distinction. Furthermore, building on Hadley's (1994) taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested by key benchmarks in the language and vision literature. Finally, we highlight ways of assessing the systematicity of representations in ML models, as practiced in the field of mechanistic interpretability.
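For a rough sense of what assessing representational systematicity can look like in mechanistic-interpretability practice, here is a minimal, hypothetical probing sketch (not taken from the paper): synthetic “hidden states” are generated so that each verb contributes a fixed direction, and a linear probe checks whether the constituent's identity is decodable across contexts. All names, dimensions, and the synthetic data are assumptions for illustration only.

```python
# Toy illustration (not from the paper): one mechanistic-interpretability-style
# check on *representations* -- can a constituent's identity be linearly decoded
# from hidden states, regardless of the context it appears in?
# The "activations" here are synthetic stand-ins for a real model's hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N_DIM, N_PER_CLASS = 64, 200
VERBS = ["walk", "run", "look", "jump"]

# Hypothetical setup: each verb contributes a fixed direction to the hidden state,
# plus context-dependent noise (a compositional, "systematic" representation).
verb_directions = {v: rng.normal(size=N_DIM) for v in VERBS}

def fake_hidden_state(verb: str) -> np.ndarray:
    return verb_directions[verb] + 0.5 * rng.normal(size=N_DIM)

X = np.stack([fake_hidden_state(v) for v in VERBS for _ in range(N_PER_CLASS)])
y = np.array([v for v in VERBS for _ in range(N_PER_CLASS)])

# Train a linear probe on half the data, evaluate on the held-out half.
idx = rng.permutation(len(y))
half = len(y) // 2
probe = LogisticRegression(max_iter=1000).fit(X[idx[:half]], y[idx[:half]])
print("probe accuracy:", probe.score(X[idx[half:]], y[idx[half:]]))
# High accuracy suggests the constituent is encoded along a recoverable direction;
# causal methods (e.g. activation patching) would be needed to confirm the model
# actually *uses* that structure.
```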