Scalable Scientific Interest Profiling Using Large Language Models

📅 2025-08-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the problem of outdated research interest profiles for scholars, this paper develops and evaluates two large language model (LLM) methods for automated profile generation: one based on Medical Subject Headings (MeSH) terms and one that summarizes PubMed abstracts. Using GPT-4o-mini, the authors generate scientific interest profiles from the PubMed records of 595 faculty and compare them with researchers' self-written profiles. In blinded human review, 77.78% of MeSH-based profiles were rated good or excellent, readability was favored in 93.44% of cases, and panelists preferred MeSH-based over abstract-based profiles in 67.86% of comparisons. Semantic similarity to self-written profiles was moderate and nearly identical for the two methods (BERTScore F1: 0.542 for MeSH-based, 0.555 for abstract-based), so the advantage of the MeSH-based profiles lies in readability and reviewer preference rather than in automatic similarity scores. The work shows that LLMs can generate researcher profiles at scale, while machine-generated and self-written profiles still differ conceptually.

📝 Abstract

Research profiles help surface scientists' expertise but are often outdated. We develop and evaluate two large language model-based methods to generate scientific interest profiles: one summarizing PubMed abstracts and one using Medical Subject Headings (MeSH) terms, and compare them with researchers' self-written profiles. We assembled titles, MeSH terms, and abstracts for 595 faculty at Columbia University Irving Medical Center; self-authored profiles were available for 167. Using GPT-4o-mini, we generated profiles and assessed them with automatic metrics and blinded human review. Lexical overlap with self-written profiles was low (ROUGE-L, BLEU, METEOR), while BERTScore indicated moderate semantic similarity (F1: 0.542 for MeSH-based; 0.555 for abstract-based). Paraphrased references yielded 0.851, highlighting metric sensitivity. TF-IDF Kullback-Leibler divergence (8.56 for MeSH-based; 8.58 for abstract-based) suggested distinct keyword choices. In manual review, 77.78 percent of MeSH-based profiles were rated good or excellent, readability was favored in 93.44 percent of cases, and panelists preferred MeSH-based over abstract-based profiles in 67.86 percent of comparisons. Overall, large language models can generate researcher profiles at scale; MeSH-derived profiles tend to be more readable than abstract-derived ones. Machine-generated and self-written profiles differ conceptually, with human summaries introducing more novel ideas.
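
The TF-IDF Kullback-Leibler divergence mentioned in the abstract can be illustrated with a small sketch. The abstract does not state how the authors tokenized text, smoothed zero weights, or chose the corpus for document frequencies, so every such choice below (the regex tokenizer, the smoothed IDF formula, the `eps` value, and the function names) is an assumption made for illustration. The absolute value depends heavily on those choices, so this sketch is not expected to reproduce the reported 8.56 and 8.58.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_distributions(doc_a, doc_b, background=()):
    """Return TF-IDF weight vectors for doc_a and doc_b over a shared vocabulary.

    Document frequencies are counted over doc_a, doc_b, and any
    background documents (an assumed setup, not the paper's).
    """
    docs = [tokenize(d) for d in [doc_a, doc_b, *background]]
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(df)
    # Smoothed IDF so that no term receives a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    def weights(tokens):
        tf = Counter(tokens)
        return [tf[t] * idf[t] for t in vocab]

    return weights(docs[0]), weights(docs[1])


def kl_divergence(p_weights, q_weights, eps=1e-9):
    """KL(P || Q) after additive smoothing and normalization to sum to 1."""
    p = [w + eps for w in p_weights]
    q = [w + eps for w in q_weights]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )


if __name__ == "__main__":
    # Hypothetical profile snippets, not taken from the paper's data.
    self_written = "My lab studies tumor genomics and cancer sequencing."
    generated = "Research focuses on health policy and insurance coverage."
    p, q = tfidf_distributions(self_written, generated)
    print(kl_divergence(p, q))
```

Two profiles with identical wording give a divergence of zero, and the value grows as the two texts put their weight on different keywords, which is the sense in which the abstract reads the scores as "distinct keyword choices".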

Problem

Research questions and friction points this paper is trying to address.

Generating scientific interest profiles using large language models

Comparing MeSH-based and abstract-based profile generation methods

Evaluating machine-generated profiles against self-written researcher profiles

Innovation

Methods, ideas, or system contributions that make the work stand out.

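Using GPT-4o-mini to generate scientific interest profiles at scale from PubMed records

Generating profiles from MeSH terms as an alternative to summarizing abstracts

Evaluating profiles with automatic metrics (ROUGE-L, BLEU, METEOR, BERTScore, TF-IDF Kullback-Leibler divergence) and blinded human review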
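
To make the lexical-overlap side of the automatic evaluation concrete, here is a minimal sketch of a ROUGE-L style F1 score based on the longest common subsequence of word tokens. It is a simplified illustration, not the authors' evaluation code: whitespace tokenization, the absence of stemming, and the function names are all assumptions, and the example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between a candidate and a reference text."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentences: similar meaning, no shared words.
    reference = "investigates the genetic basis of cancer"
    generated = "studies tumor genomics"
    print(rouge_l_f1(generated, reference))
```

A paraphrase that shares no words with the reference scores zero here even when the meaning is close, which is consistent with the abstract's observation that lexical overlap was low while BERTScore indicated moderate semantic similarity.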

🔎 Similar Papers

Interesting Scientific Idea Generation using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders