From keywords to semantics: Perceptions of large language models in data discovery

📅 2025-10-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing data discovery systems rely on keyword matching, which lowers recall when queries and metadata use different wording. Method: We investigate large language model (LLM)-driven semantic retrieval as an alternative, focusing on whether researchers would actually accept it, an open question in AI adoption research. Taking a human-centered AI (HCAI) focus, we ran focus groups (N = 27) and built a conceptual acceptance model from the qualitative data. Contribution/Results: The semantic capabilities of LLMs alone are not enough to drive adoption; system transparency, which covers explainable reasoning, traceable results, and negotiable user control, emerges as the central mechanism shaping trust and intention to use. The resulting model is tailored to scholarly data discovery and gives developers design guidance for features that could increase acceptance of LLMs in research contexts.

📝 Abstract

Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing relevant data. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions with natural language. However, we do not currently know if researchers would accept LLMs for data discovery. Using a human-centered artificial intelligence (HCAI) focus, we ran focus groups (N = 27) to understand researchers' perspectives towards LLMs for data discovery. Our conceptual model shows that the potential benefits are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but features around transparency could overcome them. Using our model will allow developers to incorporate features that result in an increased acceptance of LLMs for data discovery.
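
The wording problem described in the abstract can be illustrated with a minimal sketch. It assumes a toy matcher that returns a record only when every query term appears verbatim in its metadata; the function name `keyword_match` and the two metadata records are hypothetical and are not taken from the paper.

```python
def keyword_match(query: str, records: dict[str, str]) -> list[str]:
    """Return ids of records whose metadata contains every query term verbatim."""
    terms = query.lower().split()
    return [
        record_id
        for record_id, metadata in records.items()
        if all(term in metadata.lower().split() for term in terms)
    ]


# Hypothetical metadata records, invented for illustration only.
records = {
    "ds1": "survey of adolescent sleep duration",
    "ds2": "teen sleep habits questionnaire",
}

# Both records concern the same population, but each query finds only the
# record that happens to share its exact wording.
print(keyword_match("teen sleep", records))
print(keyword_match("adolescent sleep", records))
```

A researcher who searches for "teen sleep" never sees the record described with "adolescent", which is the kind of missed data the abstract attributes to keyword matching.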

Problem

Research questions and friction points this paper is trying to address.

Current data discovery requires exact keyword matching between queries and metadata

Researchers hesitate to adopt LLMs due to transparency and trust barriers

It is not yet known whether researchers would accept LLMs for data discovery, or which features would increase that acceptance

Innovation

Methods, ideas, or system contributions that make the work stand out.

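Examining LLMs as a natural-language alternative to keyword matching for data discovery

Running focus groups (N = 27) with a human-centered AI (HCAI) focus

Building a conceptual model in which transparency features help overcome barriers to LLM acceptance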
