🤖 AI Summary
Large language models (LLMs) are increasingly used to evaluate research ideas, but their judgments are hard to verify because the true impact of an idea can take years to emerge. This work uses impact forecasting of human-authored manuscripts as a verifiable proxy task and finds that frontier LLMs, judging from static text alone, fail to reliably distinguish high-impact papers from ordinary publications. To address this limitation, it proposes FAME, a spatiotemporal framework based on continuous-time manifold evolution. FAME combines paper text with a verified knowledge-flow graph to project papers into a dynamic latent space that models the trajectories of scientific topics. Experiments on 3,200 arXiv papers across three fast-evolving subfields show that FAME consistently outperforms state-of-the-art LLM evaluators, and that integrating its geometric signals into LLMs significantly improves their forecasting performance.
📝 Abstract
Large Language Models (LLMs) are increasingly used to brainstorm and evaluate research ideas, yet assessing such judgments is fundamentally difficult because the true impact of a new idea may take years to emerge. We address this challenge by using the impact forecasting of human-authored manuscripts as a verifiable proxy task. In a prospective forecasting study, we find that frontier LLMs fail to reliably distinguish high-impact papers from ordinary publications, suggesting that static text-based judging is insufficient for scientific evaluation. To address this limitation, we propose $\textbf{FAME}$ ($\underline{\text{F}}$orecasting $\underline{\text{A}}$cademic Impact via Continuous-Time $\underline{\text{M}}$anifold $\underline{\text{E}}$volution), a spatiotemporal framework for modeling the dynamic trajectories of scientific topics. FAME projects papers into a dynamic latent space informed by textual features and a verified knowledge-flow graph, learning geometric constraints that align impactful manuscripts with the forward momentum of their fields. Experiments on 3,200 arXiv papers across three fast-evolving subfields show that FAME consistently and substantially outperforms state-of-the-art LLM evaluators in prospective multidimensional impact forecasting. Furthermore, integrating FAME's dynamic geometric signals into LLMs significantly improves their forecasting performance. These results support manuscript impact forecasting as a useful, measurable proxy benchmark and position FAME as a strong, trajectory-aware foundation for automated scientific evaluation.