Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the “specializable generalist” paradigm to develop an ultra-large-scale multimodal foundation model that simultaneously achieves strong general capabilities and deep scientific expertise. For the first time, it enables trillion-parameter-scale scientific multimodal modeling, integrating over one hundred interdisciplinary scientific tasks and incorporating advanced agent functionalities. Built upon the XTuner and LMDeploy infrastructure, the system supports efficient reinforcement learning training while ensuring high-fidelity consistency between training and inference. The resulting model attains state-of-the-art performance among open-source models on general benchmarks and significantly outperforms existing closed-source models across specialized domains, including chemistry, materials science, life sciences, and earth sciences.

📝 Abstract
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. Achieving this massive scale is made possible by the robust infrastructure support of XTuner and LMDeploy, which facilitates highly efficient Reinforcement Learning (RL) training at the 1-trillion parameter level while ensuring strict precision consistency between training and inference. By seamlessly integrating these advancements, Intern-S1-Pro further fortifies the fusion of general and specialized intelligence, working as a Specializable Generalist, demonstrating its position in the top tier of open-source models for general capabilities, while outperforming proprietary models in the depth of specialized scientific tasks.
Problem

Research questions and friction points this paper is trying to address.

scientific multimodal foundation model
trillion-scale AI
specialized scientific tasks
generalist-specialist integration
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

trillion-scale model
scientific multimodal foundation model
reinforcement learning training
specializable generalist
precision consistency
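The "precision consistency" point above refers to keeping the training engine and the inference engine numerically aligned during RL training. The paper does not give implementation details, but a common sanity check in RL pipelines is to compare the per-token log-probabilities that the two engines assign to the same sampled sequence. The sketch below is illustrative only; the function names and tolerance are assumptions, not the paper's actual method.

```python
# Hypothetical sketch: verify that a training engine and an inference
# engine agree (within a tolerance) on per-token log-probabilities for
# the same sequence. Not taken from the paper; names and tol are assumed.

def max_logprob_gap(train_logprobs, infer_logprobs):
    """Largest absolute per-token log-prob difference between engines."""
    assert len(train_logprobs) == len(infer_logprobs)
    return max(abs(t - i) for t, i in zip(train_logprobs, infer_logprobs))

def check_consistency(train_logprobs, infer_logprobs, tol=1e-3):
    """True if the two engines agree within `tol` on every token."""
    return max_logprob_gap(train_logprobs, infer_logprobs) <= tol

# Small numerical drift passes; a large mismatch is flagged.
train = [-0.12, -1.05, -0.33]
infer = [-0.1201, -1.0503, -0.3299]
print(check_consistency(train, infer))            # small drift: OK
print(check_consistency(train, [-0.5, -1.05, -0.33]))  # mismatch: not OK
```

In practice such a check guards against silent divergence (e.g. from different kernels or precisions in the two engines), which would bias the RL policy-gradient estimates.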