DeepInnovation AI: A Global Dataset Mapping the AI innovation from Academic Research to Industrial Patents

📅 2025-03-12
🤖 AI Summary
This study addresses the opacity of academic-to-industrial technology transfer in AI and the fragmentation of innovation-related data. To this end, it constructs what the authors present as the first global AI innovation mapping dataset, with a tripartite structure that integrates patents, scholarly papers, and cross-modal semantic similarity between them. The work combines a hypergraph-based innovation quantification framework with large-scale paper-patent matching: multilingual large language models and dual-layer BERT classifiers identify AI-related content, and cosine similarity between semantic vectors links papers to patents (the DeepCosineAI file). The dataset covers 2.35 million patents and 3.51 million papers and provides approximately 100 million calculated paper-patent similarity pairs. The approach enables dynamic, cross-national, and longitudinal quantitative analysis of AI technology transfer, offering an extensible infrastructure for evidence-based innovation policy design and industry–academia–research collaboration.
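The summary describes a layered classification step for deciding which patents and papers are AI-related. The sketch below illustrates the general idea of a two-layer cascade under stated assumptions: a cheap first-layer screen discards obvious non-AI records, and only survivors are scored by a costlier second-layer classifier. The names `keyword_screen`, `toy_score`, `AI_TERMS`, and `cascade_filter` are hypothetical, and `toy_score` is a placeholder standing in for a fine-tuned BERT model; this is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Record:
    record_id: str
    text: str


# Hypothetical first-layer vocabulary; the real screening criteria are not given here.
AI_TERMS = ("neural network", "machine learning", "deep learning")


def keyword_screen(text: str) -> bool:
    """Cheap first layer: pass only texts that mention at least one AI term."""
    lowered = text.lower()
    return any(term in lowered for term in AI_TERMS)


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a fine-tuned BERT classifier's AI probability."""
    lowered = text.lower()
    hits = sum(lowered.count(term) for term in AI_TERMS)
    return min(1.0, hits / 2)


def cascade_filter(
    records: Iterable[Record],
    first_layer: Callable[[str], bool],
    second_layer: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Record]:
    """Keep records that pass the first layer and score >= threshold in the second."""
    kept = []
    for rec in records:
        if not first_layer(rec.text):
            continue  # rejected early; the costlier second layer never runs
        if second_layer(rec.text) >= threshold:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    records = [
        Record("p1", "A deep learning model for machine learning on graphs"),
        Record("p2", "Corrosion behaviour of steel pipelines"),
        Record("p3", "A brief neural network mention"),
    ]
    kept = cascade_filter(records, keyword_screen, toy_score, threshold=0.75)
    # p2 fails the first layer; p3 passes it but scores 0.5 < 0.75 in the second.
    print([r.record_id for r in kept])
```

The point of the cascade is cost: on millions of records, the expensive model only runs on the fraction that survives the cheap screen.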


📝 Abstract
In the rapidly evolving field of artificial intelligence (AI), mapping innovation patterns and understanding effective technology transfer from research to applications are essential for economic growth. However, existing data infrastructures suffer from fragmentation, incomplete coverage, and insufficient evaluative capacity. Here, we present DeepInnovationAI, a comprehensive global dataset consisting of three structured files. DeepPatentAI.csv contains 2,356,204 patent records with 8 field-specific attributes. DeepDiveAI.csv contains 3,511,929 academic publications with 13 metadata fields. These two datasets leverage large language models, multilingual text analysis, and dual-layer BERT classifiers to accurately identify AI-related content, and use hypergraph analysis to construct robust innovation metrics. DeepCosineAI.csv applies semantic vector proximity analysis to provide approximately one hundred million calculated paper-patent similarity pairs, clarifying how theoretical advances translate into commercial technologies. DeepInnovationAI enables researchers, policymakers, and industry leaders to anticipate trends and identify collaboration opportunities. With its extensive temporal and geographical scope, it supports detailed analysis of technological development patterns and international competition dynamics, establishing a foundation for modeling AI innovation and technology transfer processes.
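The paper-patent similarity pairs rest on cosine similarity between semantic vectors. As a minimal sketch, assuming each paper and patent has already been embedded as a numeric vector (how the embeddings are produced is not specified here), the function below computes cosine similarity and keeps the top-k most similar patents for each paper. The names `cosine` and `top_k_pairs`, the top-k selection rule, and the toy 3-dimensional vectors are illustrative assumptions, not the dataset's actual schema or matching criterion.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_pairs(
    papers: Dict[str, Sequence[float]],
    patents: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, str, float]]:
    """For each paper, return its k most similar patents as (paper, patent, score)."""
    pairs = []
    for paper_id, paper_vec in papers.items():
        scored = [
            (patent_id, cosine(paper_vec, patent_vec))
            for patent_id, patent_vec in patents.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
        for patent_id, score in scored[:k]:
            pairs.append((paper_id, patent_id, score))
    return pairs


if __name__ == "__main__":
    # Toy vectors standing in for real text embeddings (hypothetical values).
    papers = {"paper_A": [1.0, 0.0, 0.0], "paper_B": [0.0, 1.0, 1.0]}
    patents = {"patent_X": [1.0, 0.1, 0.0], "patent_Y": [0.0, 1.0, 0.9]}
    for paper_id, patent_id, score in top_k_pairs(papers, patents, k=1):
        print(paper_id, patent_id, round(score, 3))
```

This brute-force loop compares every paper with every patent, which is fine for illustration but would not scale to millions of records; at that size one would typically normalize the vectors once and use batched matrix products or an approximate nearest-neighbour index instead.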
Problem

Mapping AI innovation patterns from academia to industry
Bridging the gap between academic papers and industrial patents
Enhancing understanding of technology transfer with comprehensive data
Innovation

Uses large language models for AI content identification
Applies hypergraph analysis for robust innovation metrics
Employs semantic vector proximity for paper-patent similarity
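The hypergraph metrics themselves are not defined on this page. As a hedged illustration of why a hypergraph suits this data, the sketch below treats each paper or patent as one hyperedge joining all the entities involved in it (countries here, by assumption), so a multi-party collaboration is recorded as a single edge rather than split into pairwise links. The two measures, `node_degree` and `distinct_partners`, are generic hypergraph statistics chosen for illustration; they are hypothetical and not the authors' innovation metrics.

```python
from collections import defaultdict
from typing import Dict, Set


def node_degree(hyperedges: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of hyperedges (papers/patents) each node (e.g. country) belongs to."""
    degree: Dict[str, int] = defaultdict(int)
    for members in hyperedges.values():
        for node in members:
            degree[node] += 1
    return dict(degree)


def distinct_partners(hyperedges: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct other nodes each node shares at least one hyperedge with."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for members in hyperedges.values():
        for node in members:
            partners[node].update(members - {node})  # a solo hyperedge adds no partners
    return {node: len(others) for node, others in partners.items()}


if __name__ == "__main__":
    # Toy hypergraph: each record is one hyperedge joining the countries involved.
    hyperedges = {
        "paper_1": {"CN", "US"},
        "paper_2": {"CN", "US", "DE"},
        "patent_1": {"CN"},
        "patent_2": {"JP"},
    }
    print(node_degree(hyperedges))
    print(distinct_partners(hyperedges))
```

Degree captures volume of activity, while the partner count captures breadth of collaboration; a node can score high on one and low on the other, which is why such measures are usually read together.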
Haixing Gong
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, P. R. China; Department of Atmospheric and Oceanic Sciences & Institute of Atmospheric Sciences, Fudan University, Shanghai, 200438, P. R. China
Hui Zou
University of Minnesota
Statistics
Xingzhou Liang
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, P. R. China
Shiyuan Meng
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, P. R. China
Pinlong Cai
Shanghai Artificial Intelligence Laboratory
Artificial Intelligence, Decision Intelligence, Knowledge Systems
Xingcheng Xu
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, P. R. China
Jingjing Qu
Shanghai Artificial Intelligence Laboratory, Shanghai 200232, P. R. China