LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Systematic trend analysis of research addressing large language model (LLM) limitations—such as reasoning failures, hallucination, and weak multilingual capabilities—has been lacking. Method: Leveraging 250,000 papers from ACL and arXiv (2022–2024), this work proposes the first conceptual framework and reproducible quantitative paradigm for "LLM Limitations Research" (LLLMs). It integrates keyword filtering, LLM-assisted classification, dual clustering via HDBSCAN and BERTopic, LlooM-based topic modeling, and expert validation to construct the first publicly annotated LLLMs corpus (14,648 papers). Contribution/Results: LLLMs research constitutes over 30% of all LLM-related publications. Temporal analysis reveals divergent disciplinary emphases: arXiv exhibits a marked shift toward safety, controllability, and multimodality, whereas ACL consistently prioritizes foundational capability deficits. This framework enables scalable, interpretable, and domain-grounded analysis of LLM limitations research.

📝 Abstract
Large language model (LLM) research has grown rapidly, along with increasing concern about limitations such as failures in reasoning, hallucinations, and limited multilingual capability. In this survey, we conduct a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to 2024 using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we identify 14,648 relevant papers using keyword filtering, LLM-based classification validated against expert labels, and topic clustering (via two approaches, HDBSCAN+BERTopic and LlooM). We find that LLM-related research increases over fivefold in ACL and fourfold in arXiv. Since 2022, LLLMs research has grown even faster, reaching over 30% of LLM papers by late 2024. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward safety and controllability (with topics like security risks, alignment, hallucinations, and knowledge editing) and multimodality between 2022 and 2024. We release a dataset of annotated abstracts and a validated methodology, and offer a quantitative view of trends in LLM limitations research.
Problem

Research questions and friction points this paper is trying to address.

Analyzing limitations of Large Language Models (LLMs) like reasoning failures and hallucinations
Tracking trends in LLM limitations research from 2022 to 2024
Identifying key focus areas such as reasoning, generalization, and bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-driven semi-automated review methodology
Keyword filtering and LLM-based classification
HDBSCAN+BERTopic and LlooM topic clustering
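The first stage of the pipeline above—keyword filtering—can be sketched as a simple first-pass relevance check before LLM-based classification and clustering are applied. The keyword list and function names below are illustrative assumptions, not the paper's actual filter terms:

```python
# Sketch of a first-pass keyword filter over paper abstracts.
# LIMITATION_KEYWORDS is a hypothetical list; the survey's real term set
# is larger and was validated against expert labels.
LIMITATION_KEYWORDS = [
    "hallucination", "reasoning failure", "bias",
    "jailbreak", "multilingual", "robustness", "alignment",
]

def matches_limitation(abstract: str) -> bool:
    """Keep an abstract if it mentions any limitation keyword (case-insensitive)."""
    text = abstract.lower()
    return any(kw in text for kw in LIMITATION_KEYWORDS)

papers = [
    "We study hallucination in large language models.",
    "A new GPU kernel for sparse attention.",
    "Probing multilingual robustness of LLMs.",
]
kept = [p for p in papers if matches_limitation(p)]
# kept contains the first and third abstracts
```

In the paper's pipeline, a filter like this only narrows the 250,000-paper corpus; the surviving candidates are then classified by an LLM and grouped with HDBSCAN+BERTopic and LlooM.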