CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study empirically investigates how scientists recombine concepts across domains, and uses the resulting data to train supervised models for cross-domain hypothesis generation. Method: We introduce the extraction of recombination from scientific paper abstracts as a novel information extraction task and construct CHIMERA, a large-scale, publicly available knowledge base of over 28,000 real-world recombination examples. We collect a corpus of hundreds of manually annotated abstracts, use it to train an LLM-based extraction model, and apply that model to a large corpus of AI-domain papers. We then analyze CHIMERA to characterize recombination patterns across subareas of AI and train a supervised model that generates scientific hypotheses. Contribution/Results: The hypothesis generation model predicts new recombination directions that real-world researchers find inspiring. The code and data, including the CHIMERA knowledge base, are publicly available.

📝 Abstract
A hallmark of human innovation is the process of recombination -- creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA: a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to empirically explore at scale how scientists recombine concepts and take inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. The model is applied to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we train a scientific hypothesis generation model using the KB, which predicts new recombination directions that real-world researchers find inspiring. Our data and code are available at https://github.cs.huji.ac.il/tomhope-lab/CHIMERA
Problem

Research questions and friction points this paper is trying to address.

Mining scientific literature for idea recombination examples
Building a knowledge base to explore cross-domain inspiration
Training models to predict creative scientific directions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based extraction model for recombination examples
Large-scale knowledge base with 28K examples
Scientific hypothesis generation model training
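
To make the idea of a recombination knowledge base more concrete, the following is a minimal sketch of how one entry might be represented and how cross-domain pairs could be counted. The `Recombination` class, its field names, the `cross_domain_counts` helper, and the toy records are all hypothetical illustrations; none of them reflects the actual CHIMERA schema, data, or code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Recombination:
    """Hypothetical record; field names are illustrative, not CHIMERA's schema."""
    source: str         # concept being borrowed
    source_domain: str  # area the borrowed concept comes from
    target: str         # concept or problem it is combined with
    target_domain: str  # area of the paper that uses it


def cross_domain_counts(records):
    """Count (source_domain, target_domain) pairs that cross a domain boundary."""
    return Counter(
        (r.source_domain, r.target_domain)
        for r in records
        if r.source_domain != r.target_domain
    )


# Made-up toy records, for illustration only.
records = [
    Recombination("ant foraging", "biology", "route planning", "robotics"),
    Recombination("attention", "nlp", "image classification", "vision"),
    Recombination("flocking", "biology", "swarm control", "robotics"),
    Recombination("dropout", "machine learning", "ensembling", "machine learning"),
]

counts = cross_domain_counts(records)
for (src, tgt), n in counts.most_common():
    print(f"{src} -> {tgt}: {n}")
```

Counting domain pairs in this way is one simple form the large-scale analysis of cross-domain inspiration could take; the same records could also serve as input/target pairs for a supervised model.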
Noy Sternlicht
School of Computer Science and Engineering, The Hebrew University of Jerusalem
Tom Hope
Independent Researcher
Sociology of community, Human-Computer Interaction, User Experience, LGBTQ