🤖 AI Summary
This paper investigates the theoretical mechanisms underlying collaborative reasoning between pretrained priors and external information (e.g., retrieval-augmented generation, tool calling) during test-time enhancement of large language models. We propose a knowledge-graph-based modeling framework that formalizes multi-step reasoning as a source-to-target connectivity problem, and introduce sublinear graph algorithms to characterize the relationship between prior knowledge density and oracle query efficiency. We establish, for the first time, a phase-transition phenomenon in knowledge graphs: when prior knowledge density exceeds a critical threshold—inducing a giant connected component—the expected number of queries required for successful reasoning becomes constant; below this threshold, the query complexity lower bound is Ω(√n). This result quantifies the minimal pretrained knowledge volume necessary for efficient test-time reasoning and provides a verifiable theoretical foundation for designing knowledge-aware augmentation strategies.
📝 Abstract
Test-time augmentation, such as Retrieval-Augmented Generation (RAG) or tool use, critically depends on an interplay between a model's parametric knowledge and externally retrieved information. However, the theoretical underpinnings of this relationship remain poorly understood. Specifically, it is not clear how much pre-training knowledge is required to answer queries with a small number of augmentation steps, a desirable property in practice. To address this question, we formulate multi-step reasoning as an $s$-$t$ connectivity problem on a knowledge graph. We represent a model's pre-trained parametric knowledge as a partial, potentially noisy subgraph, and we view augmentation as querying an oracle for true edges that extend this subgraph. We then characterize the number of augmentation steps that is necessary and sufficient for the model to generate an accurate answer given partial prior knowledge. One key result shows a phase transition: if the prior knowledge graph over $n$ vertices is fragmented into small components, then finding a path via augmentation is inefficient and requires $\Omega(\sqrt{n})$ queries. Conversely, once the density of correct knowledge surpasses a threshold, forming a giant component, paths can be found with an expected constant number of queries.
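The giant-component threshold underlying this phase transition can be illustrated with the classic Erdős–Rényi model, which is a simplification of the paper's setting (the exact knowledge-graph model may differ): below mean degree 1 the largest component covers a vanishing fraction of the $n$ vertices, while above it a giant component spans a constant fraction. A minimal sketch using union-find, with the function name and parameters chosen for illustration:

```python
import random

def largest_component_fraction(n, p, seed=0):
    """Sample an Erdos-Renyi graph G(n, p) and return the fraction of
    vertices in its largest connected component, via union-find."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Find the root of x with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Flip a p-biased coin for each potential edge and merge components.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)

    # Tally component sizes by root.
    sizes = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

n = 2000
# Mean degree c = 0.5 < 1: all components stay small (O(log n) vertices),
# so a path between two random vertices is unlikely to exist in the prior.
sub = largest_component_fraction(n, 0.5 / n)
# Mean degree c = 2 > 1: a giant component covers a constant fraction,
# so two random vertices land in it (and are connected) with constant probability.
sup = largest_component_fraction(n, 2.0 / n)
print(f"largest component fraction: c=0.5 -> {sub:.3f}, c=2.0 -> {sup:.3f}")
```

In the paper's terms, the sub-threshold regime corresponds to the $\Omega(\sqrt{n})$ query lower bound, while the giant component is what makes a constant expected number of oracle queries sufficient.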