🤖 AI Summary
This study addresses three weaknesses of large language models (LLMs) when they synthesize scientific literature: hallucinated citations, uneven coverage, and thematic organization with no principled basis. The authors compare six pipelines that generate cluster descriptions under different levels of evidence and structure. The central question is whether bibliometric algorithms, which produce auditable clustering structures, can guide LLMs toward semantically coherent cluster descriptions. The evaluation uses Scopus corpora reconstructed from 100 published bibliometric analyses. Outputs are scored on human alignment, semantic coverage, clustering quality, graph quality, and reference grounding. LLM-written descriptions are semantically close to human-written ones. However, LLMs are unreliable when asked to infer the cluster structure themselves, and performance improves when bibliometric algorithms define the clusters and the LLM only interprets them. The authors conclude that a hybrid workflow is the most promising approach.
📝 Abstract
Large language models (LLMs) can support scientific literature synthesis, but remain prone to hallucinated references, uneven coverage, and weakly grounded thematic organization. We evaluate whether bibliometric structure improves LLM-assisted synthesis by comparing six pipelines for generating cluster descriptions under different levels of evidence and structure. Using 100 published bibliometric analyses, we reconstruct Scopus corpora, extract human-written cluster descriptions, and assess outputs by human alignment, semantic coverage, clustering quality, graph quality, and reference grounding. Results show that LLMs produce descriptions semantically close to human-written ones, but are unreliable when asked to infer bibliometric structure from scratch. Performance improves when bibliometric algorithms define the clusters and the LLM interprets them. Overall, LLM-assisted bibliometric synthesis is most promising as a hybrid workflow in which algorithms provide auditable structure and LLMs generate readable descriptions.
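The hybrid workflow described above separates two roles. An algorithm fixes the clusters, and the LLM only describes them. The sketch below illustrates that split under stated assumptions and is not the authors' pipeline. Bibliographic coupling with connected components stands in for whichever bibliometric algorithms the study actually uses. The toy corpus is invented, and the LLM call itself is omitted. `ungrounded_citations` is a hypothetical check written in the spirit of the reference-grounding criterion.

```python
import re
from collections import defaultdict
from itertools import combinations


def coupling_graph(refs, min_shared=1):
    """Bibliographic coupling (illustrative stand-in): link two documents
    when they cite at least `min_shared` references in common."""
    graph = defaultdict(set)
    for a, b in combinations(sorted(refs), 2):
        if len(refs[a] & refs[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(docs, graph):
    """Connected components via depth-first search; a real pipeline would
    use a proper community-detection algorithm instead."""
    seen, result = set(), []
    for start in sorted(docs):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(component))
    return result


def build_prompt(cluster, titles):
    """Hypothetical prompt: the LLM sees only the algorithm's cluster and
    is told to cite documents by their bracketed ids."""
    listing = "\n".join(f"[{doc}] {titles[doc]}" for doc in cluster)
    return ("Describe the common theme of these documents. "
            "Cite only the bracketed ids below.\n" + listing)


def ungrounded_citations(description, cluster):
    """Return ids cited as [id] in an LLM description that are not members
    of the cluster it was asked to describe."""
    cited = set(re.findall(r"\[([^\]]+)\]", description))
    return sorted(cited - set(cluster))


if __name__ == "__main__":
    # Invented toy corpus: document id -> set of cited reference ids.
    refs = {"d1": {"r1", "r2"}, "d2": {"r2", "r3"},
            "d3": {"r9"}, "d4": {"r8", "r9"}}
    titles = {d: f"Title of {d}" for d in refs}
    for cluster in clusters(refs, coupling_graph(refs)):
        print(cluster)
        print(build_prompt(cluster, titles))
    print(ungrounded_citations("Both [d1] and [d7] study X.", ["d1", "d2"]))
```

Because the clusters come from a deterministic rule over citation data, they can be audited independently of the LLM. Any citation the model invents outside its assigned cluster is flagged mechanically rather than by inspection.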