A Case for Computing on Unstructured Data

📅 2025-09-18
🤖 AI Summary
Traditional data systems rely on structured formats for computation and therefore support unstructured data (text, images, audio, and video) poorly. The paper argues for a paradigm called computing on unstructured data, organized as a three-stage, bidirectional pipeline: (1) extraction of latent structure, (2) transformation of that structure through data processing techniques, and (3) projection of the results back into unstructured formats. The pipeline gives unstructured data the analytical power of structured computation while preserving the richness and accessibility of unstructured representations for human and AI consumption. The authors illustrate the paradigm with two use cases and outline the research components that a new data system, MXFlow, would need to develop.

📝 Abstract
Unstructured data, such as text, images, audio, and video, comprises the vast majority of the world's information, yet it remains poorly supported by traditional data systems that rely on structured formats for computation. We argue for a new paradigm, which we call computing on unstructured data, built around three stages: extraction of latent structure, transformation of this structure through data processing techniques, and projection back into unstructured formats. This bi-directional pipeline allows unstructured data to benefit from the analytical power of structured computation, while preserving the richness and accessibility of unstructured representations for human and AI consumption. We illustrate this paradigm through two use cases and present the research components that need to be developed in a new data system called MXFlow.

Problem

Research questions and friction points this paper is trying to address.

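Traditional data systems rely on structured formats and poorly support text, images, audio, and video
Latent structure must be extracted from unstructured data before structured computation can apply
Results of structured computation need to return to unstructured forms that humans and AI can consume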

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracting latent structure from unstructured data
Transforming structure via data processing techniques
Projecting results back into unstructured formats
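
The three stages above can be made concrete with a toy example. The following is a minimal sketch, not MXFlow's implementation or API: the function names `extract`, `transform`, and `project`, the sample notes, and the regular expression are all hypothetical, chosen only to show unstructured text going in, a structured computation running in the middle, and unstructured text coming back out. A real system would presumably use learned models rather than a regex for extraction and projection.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy input: free-text notes (illustrative only, not from the paper).
NOTES = [
    "Alice spent 12 dollars on lunch.",
    "Bob spent 30 dollars on books.",
    "Alice spent 8 dollars on coffee.",
]

# Assumed sentence shape for this sketch; real extraction would be far more robust.
PATTERN = re.compile(r"(\w+) spent (\d+) dollars on (\w+)\.")

Record = Tuple[str, int, str]


def extract(texts: List[str]) -> List[Record]:
    """Stage 1: pull latent (person, amount, item) records out of text."""
    records: List[Record] = []
    for text in texts:
        match = PATTERN.fullmatch(text)
        if match:
            person, amount, item = match.groups()
            records.append((person, int(amount), item))
    return records


def transform(records: List[Record]) -> Dict[str, int]:
    """Stage 2: structured computation, here a group-by sum per person."""
    totals: Dict[str, int] = defaultdict(int)
    for person, amount, _item in records:
        totals[person] += amount
    return dict(totals)


def project(totals: Dict[str, int]) -> str:
    """Stage 3: render the structured result back into readable text."""
    parts = [
        f"{person} spent {total} dollars in total"
        for person, total in sorted(totals.items())
    ]
    return "; ".join(parts) + "."


if __name__ == "__main__":
    print(project(transform(extract(NOTES))))
```

Keeping each stage as a separate function mirrors the paradigm's separation of concerns: the middle stage works only on structured records, so it could be swapped for any conventional data processing step without touching extraction or projection.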

Authors

Mushtari Sadia
University of Michigan, Ann Arbor, USA
Amrita Roy Chowdhury
University of Michigan, Ann Arbor, USA
Cryptography, Differential Privacy, Privacy-Preserving Machine Learning
Ang Chen
University of Michigan, Ann Arbor, USA