🤖 AI Summary
This work addresses the entanglement of parametric knowledge (memorized world facts) and genuine reasoning capability in large language models (LLMs).
Method: We propose SynthWorlds, a framework that constructs two structurally identical but semantically disjoint parallel corpora: a real-mapped world, where parametric knowledge applies, and a synthetic-mapped world, where it is meaningless. On top of these corpora, mirrored multi-hop question answering and page navigation tasks hold reasoning complexity constant across the two worlds, so performance differences isolate memorized knowledge from reasoning. Models are evaluated in both parametric-only (closed-book QA) and knowledge-augmented (retrieval-augmented) settings, and world construction is fully automatic and scalable.
Contribution/Results: Experiments reveal a persistent, measurable "knowledge advantage gap" across models, defined as the performance boost gained from memorized parametric knowledge. Knowledge acquisition and integration mechanisms (e.g., retrieval augmentation) reduce but do not eliminate this gap. SynthWorlds thus provides a controlled, reproducible environment for precise, testable comparisons of reasoning and memorization, pointing to opportunities for targeted system improvements.
📝 Abstract
Evaluating the reasoning ability of language models (LMs) is complicated by their extensive parametric world knowledge, where benchmark performance often reflects factual recall rather than genuine reasoning. Existing datasets and approaches (e.g., temporal filtering, paraphrasing, adversarial substitution) cannot cleanly separate the two. We present SynthWorlds, a framework that disentangles task reasoning complexity from factual knowledge. In SynthWorlds, we construct parallel corpora representing two worlds with identical interconnected structure: a real-mapped world, where models may exploit parametric knowledge, and a synthetic-mapped world, where such knowledge is meaningless. On top of these corpora, we design two mirrored tasks as case studies: multi-hop question answering and page navigation, which maintain equal reasoning difficulty across worlds. Experiments in parametric-only (e.g., closed-book QA) and knowledge-augmented (e.g., retrieval-augmented) LM settings reveal a persistent knowledge advantage gap, defined as the performance boost models gain from memorized parametric world knowledge. Knowledge acquisition and integration mechanisms reduce but do not eliminate this gap, highlighting opportunities for system improvements. Fully automatic and scalable, SynthWorlds provides a controlled environment for evaluating LMs in ways that were previously challenging, enabling precise and testable comparisons of reasoning and memorization.
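The knowledge advantage gap described above can be sketched as a simple difference in task accuracy between the two mirrored worlds. The function name and the illustrative numbers below are hypothetical, not taken from the paper; this is a minimal sketch assuming accuracy is the shared metric for both worlds.

```python
def knowledge_advantage_gap(acc_real: float, acc_synth: float) -> float:
    """Accuracy on the real-mapped world minus accuracy on the structurally
    identical synthetic-mapped world. Because the mirrored tasks hold
    reasoning difficulty constant, a positive gap indicates the model is
    benefiting from memorized parametric knowledge rather than reasoning."""
    return acc_real - acc_synth

# Hypothetical numbers for illustration only (not results from the paper):
gap_closed_book = knowledge_advantage_gap(0.62, 0.31)  # parametric-only setting
gap_rag = knowledge_advantage_gap(0.71, 0.55)          # retrieval-augmented setting
print(gap_closed_book, gap_rag)
```

Under this framing, a retrieval-augmented gap that shrinks but stays positive (as in the illustrative values above) would match the paper's finding that knowledge integration mechanisms reduce but do not eliminate the gap.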