🤖 AI Summary
This work addresses the challenge of quantifying information structure in complex data, particularly human communication and potential extraterrestrial signals, where conventional complexity measures fail to capture whether the data could convey a message. We propose Local Compositional Complexity (LCC), a complexity metric that takes communicability as its defining criterion. Following the Minimum Description Length principle, the framework divides the shortest description of the data into a structured (composable, generalizable) portion and an unstructured (e.g., random) portion, and takes the size of the structured portion as the complexity score. Under this measure, near-uniform data and random noise both score low, while speech, writing, drawings and photographs score high. We then instantiate the framework with local compositionality as the specific structure, giving a computable measure that applies across modalities (speech, images and text). Experiments show that LCC distinguishes meaningful signals from noise or repetitive signals in the auditory, visual and text domains. The framework also suggests a more objective characterization of the macrostate and entropy of physical systems in statistical mechanics, and could potentially help determine whether an extraterrestrial signal detected in SETI contains a message.
📝 Abstract
Data complexity is an important concept in the natural sciences and related areas, but lacks a rigorous and computable definition. In this paper, we focus on a particular sense of complexity that is high if the data is structured in a way that could serve to communicate a message. In this sense, human speech, written language, drawings, diagrams and photographs have high complexity, whereas data that is close to uniform throughout or populated by random values has low complexity. We describe a general framework for measuring data complexity based on dividing the shortest description of the data into a structured and an unstructured portion, and taking the size of the former as the complexity score. We outline an application of this framework in statistical mechanics that may allow a more objective characterisation of the macrostate and entropy of a physical system. Then, we derive a more precise and computable definition geared towards human communication, by proposing local compositionality as an appropriate specific structure. We demonstrate experimentally that this method can distinguish meaningful signals from noise or repetitive signals in auditory, visual and text domains, and could potentially help determine whether an extra-terrestrial signal contained a message.
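The general framework, which splits the shortest description into a structured and an unstructured part, can be illustrated with a two-part code: pick the model M that minimizes L(M) + L(x | M), then report L(M) as the complexity score. The Python sketch below is a toy under stated assumptions and is not the paper's LCC. Its model class is a hypothetical codebook of fixed-length chunks that does not capture local compositionality. Code lengths are idealized real-valued bits, and the small header cost of specifying the chunk length and codebook size is ignored. The names `two_part_split` and `structured_score` are illustrative only.

```python
import math
import random


def two_part_split(x: str, alphabet_size: int) -> tuple[float, float]:
    """Return (model_bits, residual_bits) for the cheapest two-part code of x
    within a toy model class; model_bits is the 'structured' portion.

    Assumption (hypothetical model class): a model is a codebook of the
    distinct length-k chunks of x, for some k that divides len(x).
    Header costs (encoding k and the codebook size) are ignored.
    """
    n = len(x)
    symbol_bits = math.log2(alphabet_size)
    # Baseline: no model at all; every symbol is spelled out verbatim.
    best = (0.0, n * symbol_bits)
    for k in range(1, n + 1):
        if n % k:
            continue  # only consider chunk lengths that tile x exactly
        chunks = [x[i:i + k] for i in range(0, n, k)]
        codebook = set(chunks)
        # Structured part: spell out every codebook entry symbol by symbol.
        model_bits = len(codebook) * k * symbol_bits
        # Unstructured part: one codebook index per chunk of x.
        residual_bits = len(chunks) * math.log2(len(codebook))
        # Strict '<' keeps the model-free baseline on ties.
        if model_bits + residual_bits < sum(best):
            best = (model_bits, residual_bits)
    return best


def structured_score(x: str, alphabet_size: int = 4) -> float:
    """Complexity score = size in bits of the structured portion."""
    return two_part_split(x, alphabet_size)[0]


rng = random.Random(0)
# Random values: no codebook pays for itself, so the structured part is empty.
noise = "".join(rng.choice("ACGT") for _ in range(1000))
# Near-uniform data: a one-symbol codebook suffices.
uniform = "A" * 1000
# A message-like string: 200 draws from a small vocabulary of reusable "words".
words = ["ACGTA", "CCGTT", "GATTC", "TTAGC", "CAGGA", "GTCAT", "AATCG", "TGCAC"]
message = "".join(rng.choice(words) for _ in range(200))

print(structured_score(noise))    # 0.0
print(structured_score(uniform))  # 2.0
print(structured_score(message))  # 80.0
```

On these inputs the toy reproduces the qualitative ordering in the abstract. Noise is cheapest to send verbatim, so it has no structured portion. The uniform string needs only a one-symbol codebook. The string built from a small reusable vocabulary earns the largest structured portion. A strictly repetitive signal such as `"ACGTA" * 200` would also score low here, because its codebook stays tiny.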