🤖 AI Summary
Problem: Existing data discovery systems rely on keyword matching, which yields low recall when queries and metadata use different wording or terminology. Method: We investigate large language model (LLM)-driven semantic retrieval as an alternative, focusing on whether researchers would actually accept it, a critical gap in current AI adoption research. Employing a human-centered AI design paradigm, we conducted iterative focus groups (N=27) and qualitative modeling to develop a behavioral acceptance model. Contribution/Results: We find that LLMs’ semantic capabilities alone do not suffice to drive adoption; rather, system transparency (explainable reasoning, traceable results, and negotiable user control) emerges as the central mechanism shaping trust and usage intention. This work introduces the first theoretically grounded, empirically validated LLM acceptance framework tailored specifically to scholarly data discovery, offering both conceptual insights and actionable design principles for trustworthy AI systems in research contexts.
📝 Abstract
Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, which makes searching difficult and can cause relevant data to be missed. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions in natural language. However, we do not currently know whether researchers would accept LLMs for data discovery. Taking a human-centered artificial intelligence (HCAI) perspective, we ran focus groups (N = 27) to understand researchers' perspectives on LLMs for data discovery. Our conceptual model shows that the potential benefits alone are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but transparency features could overcome them. Our model can help developers incorporate features that increase acceptance of LLMs for data discovery.