🤖 AI Summary
This work asks whether language itself carries non-flat semantic structure, independent of the geometry induced by any chosen embedding space. To this end, the authors propose Texture, a text-native, word-level discrete curvature signal that reconciles left- and right-context beliefs around a masked token via a Schrödinger bridge formulation, marking regions of positive curvature (semantic focus) and negative curvature (semantic divergence). The work argues, for the first time, that textual data has intrinsic curvature, and provides a general framework to measure and leverage that curvature without requiring geometric training. Experiments demonstrate that the curvature signal effectively guides long-context compression and routing in retrieval-augmented generation, highlighting its potential as a general-purpose control primitive.
📝 Abstract
Does text have an intrinsic curvature? Language is increasingly modeled in curved geometries (hyperbolic spaces for hierarchy, mixed-curvature manifolds for compositional structure), yet a basic scientific question remains unresolved: what does curvature mean for text itself, in a way that is native to language rather than an artifact of the embedding space we choose? We argue that text does have curvature, and show how to detect it, define it, and use it. To this end, we propose Texture, a text-native, word-level discrete curvature signal, and make three contributions. (a) Existence: We provide empirical and theoretical certificates that semantic inference in natural corpora is non-flat, i.e., language has inherent curvature. (b) Definition: We define Texture by reconciling left- and right-context beliefs around a masked word through a Schrödinger bridge, yielding a curvature field that is positive where context focuses meaning and negative where it fans out into competing continuations. (c) Utility: Texture is actionable: it serves as a general-purpose measurement and control primitive, enabling geometry without geometric training; we instantiate it on two representative tasks, improving long-context inference through curvature-guided compression and retrieval-augmented generation through curvature-guided routing. Together, our results establish a text-native curvature paradigm, making curvature measurable and practically useful.
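The bridge construction in contribution (b) can be loosely illustrated on a toy vocabulary: take a left-context belief p and a right-context belief q over candidate words, couple them with an entropically regularized (Sinkhorn) transport plan, and read a curvature-like score off the probability mass the two contexts agree on. The sketch below is a hypothetical numpy illustration, not the paper's implementation: the 0/1 word-identity cost, the Sinkhorn solver as a discrete stand-in for the Schrödinger bridge, and the agreement-based `texture_score` are all illustrative assumptions.

```python
import numpy as np

def sinkhorn(p, q, C, eps=0.5, iters=500):
    """Entropically regularized coupling between beliefs p and q
    (a discrete stand-in for a Schrodinger bridge). Illustrative only."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iters):
        v = q / (K.T @ u)                 # match right-context marginal
        u = p / (K @ v)                   # match left-context marginal
    return u[:, None] * K * v[None, :]    # coupling with marginals (p, q)

def texture_score(p, q, eps=0.5):
    """Toy curvature proxy: positive when both contexts concentrate on the
    same continuations (focus), negative when they back competing ones."""
    C = 1.0 - np.eye(len(p))              # zero cost to stay on the same word
    pi = sinkhorn(p, q, C, eps)
    agreement = np.trace(pi)              # mass the two beliefs share
    return 2.0 * agreement - 1.0          # rescaled to [-1, 1]

# Both contexts focus on the same continuation -> positive score
p = np.array([0.90, 0.02, 0.02, 0.02, 0.02, 0.02])
print(texture_score(p, p))                # > 0 (context focuses meaning)

# Contexts fan out to competing continuations -> negative score
q = p[::-1].copy()
print(texture_score(p, q))                # < 0 (competing continuations)
```

In the paper's setting, the two beliefs would come from a model's left-only and right-only predictions at a masked position; here they are hand-built toy distributions chosen to show the sign behavior the abstract describes.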