Context Unrolling in Omni Models

📅 2026-04-23

🤖 AI Summary
This work addresses unified modeling of heterogeneous multimodal data with Omni, a unified multimodal model trained natively on text, images, video, 3D geometry, and hidden representations. The authors find that this training enables Context Unrolling: the model explicitly reasons across multiple modal representations before producing a prediction, aggregating complementary cross-modal information to better approximate a shared multimodal knowledge manifold. Omni achieves strong performance on both multimodal understanding and generation benchmarks and demonstrates multimodal reasoning capabilities, including in-context generation of text, images, video, and 3D geometry.

📝 Abstract
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
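The abstract describes Context Unrolling only at a high level, so the following is a toy sketch of the control flow it suggests, not the authors' implementation. It assumes the model can be abstracted as a set of per-modality generator functions that each read a shared context and append an intermediate representation before a final prediction is made. All names here (`UnrollingContext`, `unroll_context`, `toy_generator`, `toy_predictor`, and the modality labels) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: one context segment is a (modality, content) pair.
Segment = Tuple[str, str]


@dataclass
class UnrollingContext:
    """Shared context that accumulates segments from several modalities."""
    segments: List[Segment] = field(default_factory=list)

    def append(self, modality: str, content: str) -> None:
        self.segments.append((modality, content))

    def modalities(self) -> List[str]:
        return [modality for modality, _ in self.segments]


def unroll_context(
    prompt: str,
    generators: Dict[str, Callable[[UnrollingContext], str]],
    plan: List[str],
    predictor: Callable[[UnrollingContext], str],
) -> Tuple[str, UnrollingContext]:
    """Produce intermediate representations in `plan` order, then predict.

    Each generator sees everything produced so far, so later modalities
    can build on information contributed by earlier ones.
    """
    ctx = UnrollingContext()
    ctx.append("text", prompt)
    for modality in plan:
        ctx.append(modality, generators[modality](ctx))
    return predictor(ctx), ctx


def toy_generator(modality: str) -> Callable[[UnrollingContext], str]:
    # Assumption: a real model would emit tokens or latents; this emits a label.
    def generate(ctx: UnrollingContext) -> str:
        return f"<{modality} conditioned on {len(ctx.segments)} segment(s)>"
    return generate


def toy_predictor(ctx: UnrollingContext) -> str:
    # The final prediction reads the whole unrolled context.
    return "answer using " + "+".join(ctx.modalities())


generators = {m: toy_generator(m) for m in ("image", "3d_geometry")}
answer, ctx = unroll_context(
    "How many chairs are behind the table?",
    generators,
    plan=["image", "3d_geometry"],
    predictor=toy_predictor,
)
print(answer)
for segment in ctx.segments:
    print(segment)
```

In this sketch the final predictor sees the prompt plus every intermediate representation, which mirrors the abstract's claim that the model aggregates complementary information across modalities before producing a prediction. How Omni chooses which modalities to unroll, and in what order, is not specified in the abstract; the fixed `plan` list is a simplification.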
Problem

Research questions and friction points this paper is trying to address.

multimodal integration
heterogeneous modalities
shared knowledge manifold
context unrolling
multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context Unrolling
Unified Multimodal Model
Multimodal Reasoning
Cross-modal Aggregation
Omni Model