When do neural networks learn world models?

📅 2025-02-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper investigates whether neural networks in multi-task learning can provably recover the true latent-variable world model underlying data generation, even when the proxy tasks are complex, non-linear functions of the latents. Method: We establish the first sufficient conditions for provable recovery of latent data-generating variables from such tasks. Introducing a Fourier–Walsh analytic framework, we develop new techniques for analyzing invertible Boolean transforms and characterize how low-degree bias and architectural sensitivity govern latent identifiability. Contribution/Results: Under mild assumptions, we prove that models exhibiting a low-degree bias exactly recover the generative latent variables; moreover, architectural choices can fundamentally determine success or failure in world-model learning. Our analysis provides rigorous theoretical foundations for self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.

๐Ÿ“ Abstract
Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we provide the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions -- even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is also sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
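The Fourier-Walsh (Walsh-Hadamard) expansion the abstract refers to writes any function f: {-1,1}^n → R as a weighted sum of parity functions, and "low-degree bias" means most of the squared-coefficient mass sits on small subsets of inputs. Below is a minimal Python sketch of this idea (our illustration, not the paper's code; `fourier_walsh`, `low_degree_mass`, and the 3-bit majority example are illustrative choices) that computes the expansion by brute force:

```python
from itertools import combinations, product
from math import prod

def fourier_walsh(f, n):
    """Fourier-Walsh coefficients of f on {-1,1}^n.

    f_hat(S) = E_x[f(x) * chi_S(x)], where chi_S(x) = prod_{i in S} x_i
    and x is uniform over {-1,1}^n.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            chi = [prod(x[i] for i in S) for x in points]
            coeffs[S] = sum(f(x) * c for x, c in zip(points, chi)) / len(points)
    return coeffs

def low_degree_mass(coeffs, d):
    """Fraction of total squared Fourier mass on sets of size <= d."""
    total = sum(c * c for c in coeffs.values())
    low = sum(c * c for S, c in coeffs.items() if len(S) <= d)
    return low / total

# Example: 3-bit majority. Its expansion is (x1 + x2 + x3)/2 - x1*x2*x3/2,
# so 75% of the Fourier mass lies on degree <= 1 terms.
maj3 = lambda x: 1 if sum(x) > 0 else -1
print(low_degree_mass(fourier_walsh(maj3, 3), 1))  # -> 0.75
```

By Parseval's identity, the squared coefficients of a ±1-valued function sum to 1, so `low_degree_mass` directly reports how concentrated the function is on low-degree parities.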
Problem

Research questions and friction points this paper is trying to address.

Whether neural networks can learn world models that capture the underlying data-generating process
Lack of theoretical results on world-model recovery in multi-task settings
Sensitivity of latent recovery to model architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Provable recovery of latent variables in a multi-task setting
Fourier-Walsh transform analysis of Boolean task solutions
New techniques for analyzing invertible Boolean transforms (see the toy sketch below)
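As a toy illustration of what "invertible Boolean transform" means here (our example, not a construction from the paper), the map T(x1, x2) = (x1, x1·x2) permutes the Boolean cube and is its own inverse, so it relabels latent configurations without destroying information:

```python
from itertools import product

def T(x):
    """T(x1, x2) = (x1, x1 * x2): an invertible transform of {-1,1}^2."""
    x1, x2 = x
    return (x1, x1 * x2)

cube = list(product((-1, 1), repeat=2))
image = [T(x) for x in cube]
assert sorted(image) == sorted(cube)    # T is a bijection on the cube
assert all(T(T(x)) == x for x in cube)  # and an involution: T^{-1} = T
```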
Tianren Zhang
Tsinghua University
Representation learning · Generalization · Learning theory · Reinforcement learning · Machine learning
Guanyu Chen
Department of Automation, Tsinghua University, Beijing, China
Feng Chen
Department of Automation, Tsinghua University, Beijing, China