🤖 AI Summary
Current AI models are predominantly single-task, single-modality "intelligence silos" that generalize poorly, a severe obstacle on the path to artificial general intelligence (AGI). To address this, we propose Pangaea, a unified multi-task, multi-modality AI framework and the first systematic effort to dismantle the barriers between tasks and modalities, forming an "AI supercontinent." Pangaea encodes 296 heterogeneous multimodal datasets into a shared format and pre-trains on them jointly, revealing the scaling effect of modality and quantifying the law of universal knowledge accumulation across modalities. Pangaea achieves state-of-the-art performance on 45 general-purpose benchmarks and 15 scientific tasks, significantly outperforming existing methods. These results empirically validate the feasibility of broad generalization driven by knowledge fusion and establish a scalable new paradigm for AGI development.
📝 Abstract
The pursuit of artificial general intelligence demands that a single model generalize across myriad tasks, including tasks never seen before. However, current AI models are limited to specific tasks and thus isolated from one another, a condition we are the first to define as Intelligence Islands. To unify these Intelligence Islands, we propose Pangaea, the first AI supercontinent, named after the geological Pangaea. Pangaea encodes any data into a unified format and accumulates universal knowledge through pre-training on 296 datasets spanning diverse modalities. It then demonstrates remarkable generalization across 45 general tasks and 15 scientific tasks covering a wide range of scientific subjects. Deeper investigation of Pangaea reveals the scaling effect of modality, quantifying the accumulation of universal knowledge across modalities as the cumulative distribution function of a geometric distribution. Overall, Pangaea shows strong potential to handle myriad tasks, pointing toward a new direction for artificial general intelligence.
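The abstract characterizes cross-modal knowledge accumulation by the cumulative distribution function of a geometric distribution; its standard form is sketched below. Reading $m$ as the number of unified modalities and $p$ as a per-modality accumulation rate is an illustrative assumption, not a parameterization stated in the abstract.

```latex
% CDF of a geometric distribution with success probability p (support k = 1, 2, ...):
\[
  U(m) \;=\; 1 - (1 - p)^{m}, \qquad m = 1, 2, \dots, \quad 0 < p < 1,
\]
% Assumed reading: U(m) is the accumulated fraction of universal knowledge after
% unifying m modalities; U(m) -> 1 as m grows, with diminishing returns per
% additional modality (each new modality contributes a p-fraction of the remainder).
```

Under this reading, the saturating curve matches the intuition of a "scaling effect of modality": early modalities contribute most of the shared knowledge, while later ones add progressively less.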