A Case for Computing on Unstructured Data

📅 2025-09-18

🤖 AI Summary

Traditional data systems handle unstructured data such as text, images, audio, and video poorly because they rely on structured formats for computation. The paper argues for a paradigm called computing on unstructured data, organized as a three-stage, bidirectional pipeline: (1) extraction of latent structure, (2) transformation of that structure with data processing techniques, and (3) projection of the results back into unstructured formats. The pipeline lets unstructured data benefit from structured computation while preserving the richness and accessibility of unstructured representations for human and AI consumption. The authors illustrate the paradigm with two use cases and outline the research components that need to be developed in a new data system, MXFlow.

📝 Abstract

Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.

Problem

Research questions and friction points this paper is trying to address.

Computing on unstructured data like text and images

Extracting latent structure from unstructured information

Bridging structured computation with unstructured representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracting latent structure from unstructured data

Transforming structure via data processing techniques

Projecting results back into unstructured formats
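
The three stages listed above can be sketched as a toy text pipeline. This is only an illustration under loud assumptions, not MXFlow's design or API: the function names `extract`, `transform`, and `project` are hypothetical, and a regular expression stands in for whatever learned extractor a real system would likely use.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical pattern standing in for a learned extractor (an assumption).
REVIEW_PATTERN = re.compile(
    r"(?P<product>\w+) .*?(?P<stars>[1-5]) stars?", re.IGNORECASE
)


def extract(documents):
    """Stage 1: pull latent structure (product, stars) out of free text."""
    rows = []
    for doc in documents:
        match = REVIEW_PATTERN.search(doc)
        if match:  # documents with no recognizable rating are skipped
            rows.append(
                {"product": match["product"].lower(), "stars": int(match["stars"])}
            )
    return rows


def transform(rows):
    """Stage 2: ordinary structured computation (group by product, average)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["product"]].append(row["stars"])
    return {product: mean(stars) for product, stars in grouped.items()}


def project(averages):
    """Stage 3: render the structured result back into unstructured text."""
    parts = [f"{p} averages {avg:.1f} stars" for p, avg in sorted(averages.items())]
    return "; ".join(parts) + "."


reviews = [
    "Kettle boils fast, I give it 5 stars",
    "Kettle leaked after a week: 2 stars",
    "Toaster is fine, 4 stars",
]
summary = project(transform(extract(reviews)))
print(summary)
```

The middle stage is deliberately plain: once structure has been extracted, the aggregation is the kind of grouping and averaging any structured data system already supports, which is the point of routing unstructured input through a structured intermediate form.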