Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

📅 2026-01-27
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the challenge of developing models that simultaneously exhibit strong scientific reasoning and general multimodal capabilities without relying on massive domain-specific datasets. The authors propose a data-efficient, transparent, and fully reproducible end-to-end training paradigm comprising high-quality scientific data curation, supervised fine-tuning, and reinforcement learning. Using fewer than five million samples, the resulting model significantly reduces dependence on large-scale pretraining data while achieving performance on scientific reasoning tasks comparable to that of much larger models. Moreover, it remains competitive on standard vision and multimodal benchmarks, demonstrating that scientific intelligence and general-purpose capabilities can effectively coexist within a single architecture.
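The summary describes a three-stage paradigm: quality-filtered data curation, supervised fine-tuning, then reinforcement learning. A minimal conceptual sketch of that flow is below; every function name, threshold, and update rule here is an illustrative assumption for exposition, not the authors' actual implementation.

```python
# Toy sketch of the three-stage paradigm: (1) quality-filtered curation,
# (2) supervised fine-tuning, (3) RL refinement. All names, thresholds,
# and update rules are illustrative assumptions, not the paper's method.

def curate(samples, min_quality=0.8):
    """Stage 1: keep only high-quality samples (hypothetical score field)."""
    return [s for s in samples if s["quality"] >= min_quality]

def supervised_finetune(model, dataset):
    """Stage 2: toy SFT -- move the model state toward each sample's target."""
    for s in dataset:
        model["weight"] += 0.1 * (s["target"] - model["weight"])
    return model

def rl_refine(model, dataset, reward_fn):
    """Stage 3: toy RL -- nudge the model in the direction of higher reward."""
    for s in dataset:
        model["weight"] += 0.01 * reward_fn(model, s)
    return model

# Illustrative run on synthetic data.
raw = [{"quality": q / 10, "target": 1.0} for q in range(11)]
data = curate(raw)                      # keeps the 3 samples with quality >= 0.8
model = {"weight": 0.0}
model = supervised_finetune(model, data)
model = rl_refine(model, data, lambda m, s: 1.0 - m["weight"])
```

The point of the sketch is the staging, not the arithmetic: curation shrinks the corpus before any training happens, which is how a pipeline can stay under a small sample budget.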

📝 Abstract
We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains while maintaining excellent performance on general vision tasks. Contrary to the trend of relying on massive domain-specific pretraining and opaque pipelines, our work demonstrates that principled training design and transparent methodology can yield strong scientific intelligence with substantially reduced data requirements. (i) First, we provide a fully transparent, end-to-end reproducible training pipeline, covering data collection, cleaning, preprocessing, supervised fine-tuning, reinforcement learning, and evaluation, along with detailed optimization recipes. This facilitates systematic extension by the community. (ii) Second, Innovator-VL exhibits remarkable data efficiency, achieving competitive performance on various scientific tasks using fewer than five million curated samples without large-scale pretraining. These results highlight that effective reasoning can be achieved through principled data selection rather than indiscriminate scaling. (iii) Third, Innovator-VL demonstrates strong generalization, achieving competitive performance on general vision, multimodal reasoning, and scientific benchmarks. This indicates that scientific alignment can be integrated into a unified model without compromising general-purpose capabilities. Our practices suggest that efficient, reproducible, and high-performing scientific multimodal models can be built even without large-scale data, providing a practical foundation for future research.
Problem

Research questions and friction points this paper is trying to address.

multimodal large language model
scientific discovery
data efficiency
reproducible training
scientific reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

data efficiency
transparent training pipeline
scientific multimodal reasoning
curated dataset
generalization
Zichen Wen
Shanghai Jiao Tong University
Efficient AI, Trustworthy AI, Large Language Model, Machine Learning
Boxue Yang
School of Artificial Intelligence, Shanghai Jiao Tong University
Shuang Chen
Associate Professor, Kuang Yaming Honors School, Nanjing University
Computational Chemistry, Memristors, Organic Ferroelectrics, Nanomaterials
Yaojie Zhang
School of Artificial Intelligence, Shanghai Jiao Tong University
Yuhang Han
Northwestern Polytechnical University
Event-based task, Efficient MLLM
Junlong Ke
School of Artificial Intelligence, Shanghai Jiao Tong University
Cong Wang
School of Artificial Intelligence, Shanghai Jiao Tong University
Yicheng Fu
Stanford University
Natural Language Processing, Large Language Model
Jiawang Zhao
School of Artificial Intelligence, Shanghai Jiao Tong University
Jiangchao Yao
Shanghai Jiao Tong University
Machine Learning
Xi Fang
DP Technology
Zhen Wang
DP Technology
Henxing Cai
DP Technology
Lin Yao
DP Technology
Zhifeng Gao
DP Technology
Data Mining, Machine Learning, AI for Science, AI for Industry
Yanhui Hong
DP Technology
Nang Yuan
DP Technology
Yixuan Li
DP Technology
Guojiang Zhao
DP Technology; Carnegie Mellon University
LLM, AI for Science
Haoyi Tao
DP Technology
Nan Wang
DP Technology
Han Lyu
DP Technology
Guolin Ke
DP Technology
Machine Learning, AI for Science
Ning Liao
Shanghai Jiao Tong University
LLM, MLLM, MoE
Xiaoxing Wang
Shanghai Jiao Tong University
Machine Learning, AutoML, Neural Architecture Search
Kai Chen
MemTensor
Zhiyu Li
Tianjin University
Robust control, attitude control
Feiyu Xiong
MemTensor (Shanghai) Technology Co., Ltd.
Machine Learning, NLP, LLM
Sihan Hu
Institute of Theoretical Physics, Chinese Academy of Sciences
Kun Chen
Institute of Theoretical Physics, Chinese Academy of Sciences
Yanfeng Wang
Shanghai Jiao Tong University
Weinan E
School of Artificial Intelligence, Shanghai Jiao Tong University
Linfeng Zhang
DP Technology; AI for Science Institute
AI for Science, multi-scale modeling, molecular simulation, drug/materials design