🤖 AI Summary
Current machine learning models suffer from limited interpretability and insufficient capacity to generate scientific insights. To address this, we propose Discovery Engine, a fully automated, end-to-end scientific discovery system that integrates multi-source heterogeneous data modeling with state-of-the-art interpretability techniques, including causal inference, concept activation mapping, and symbolic induction. We evaluate the framework across four domains (medicine, materials science, social science, and environmental science) by benchmarking it against five recent peer-reviewed studies. In each case, Discovery Engine matches or exceeds the predictive performance reported in the original study. Crucially, it also generates higher-level scientific outputs: mechanistic explanations, empirically testable hypotheses, and actionable intervention strategies. The results suggest that the framework not only enhances model trustworthiness but also makes an “interpretability-driven scientific discovery” paradigm practical, pointing to its potential as a new standard for automated scientific discovery.
📝 Abstract
The Discovery Engine is a general-purpose automated system for scientific discovery that combines machine learning with state-of-the-art ML interpretability to enable rapid and robust scientific insight across diverse datasets. In this paper, we benchmark the Discovery Engine against five recent peer-reviewed scientific publications that apply machine learning across medicine, materials science, social science, and environmental science. In each case, the Discovery Engine matches or exceeds prior predictive performance while also generating deeper, more actionable insights through rich interpretability artefacts. These results demonstrate its potential as a new standard for automated, interpretable scientific modelling that enables complex knowledge discovery from data.