🤖 AI Summary
This study addresses the lack of validated instruments for measuring human trust in large language models (LLMs) in conversational contexts. To bridge this gap, we propose TILLMI (the Trust-In-LLMs Index), a two-dimensional cognitive-affective trust measurement framework that adapts McAllister’s trust theory to LLM interactions. We prototype the scale with a novel protocol, LLM-simulated validity, and then validate it in a large-scale survey (N = 1,000 US respondents), arriving at a parsimonious six-item scale with two interpretable factors: “closeness” and “reliance.” Psychometric evaluation, comprising exploratory and confirmatory factor analysis (EFA/CFA) together with measures of personality (BFI) and cognitive flexibility, supports TILLMI’s reliability and validity (CFI = .995, TLI = .991, RMSEA = .046). Trust levels also differ by age, gender, and LLM usage experience, and trust correlates positively with openness, extraversion, and cognitive flexibility, and negatively with neuroticism.
📝 Abstract
Large Language Models (LLMs) can engage in human-like conversational exchanges. Although conversations can elicit trust between users and LLMs, little empirical research has examined trust formation in human-LLM contexts, beyond LLMs' trustworthiness or human trust in AI in general. Here, we introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs, extending McAllister's cognitive and affective trust dimensions to human-LLM interactions. We developed TILLMI as a psychometric scale, prototyped with a novel protocol we call LLM-simulated validity. The LLM-based scale was then validated in a sample of 1,000 US respondents. Exploratory Factor Analysis identified a two-factor structure. Two items were then removed due to redundancy, yielding a final 6-item scale with a 2-factor structure. Confirmatory Factor Analysis on a separate subsample showed strong model fit ($CFI = .995$, $TLI = .991$, $RMSEA = .046$, $p_{\chi^2} > .05$). Convergent validity analysis revealed that trust in LLMs correlated positively with openness to experience, extraversion, and cognitive flexibility, but negatively with neuroticism. Based on these findings, we interpreted TILLMI's factors as "closeness with LLMs" (affective dimension) and "reliance on LLMs" (cognitive dimension). Younger men exhibited higher closeness with and reliance on LLMs than older women. Individuals with no direct experience with LLMs exhibited lower levels of trust than LLM users. These findings offer a novel empirical foundation for measuring trust in AI-driven verbal communication, informing responsible design, and fostering balanced human-AI collaboration.
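For readers unfamiliar with the fit indices reported for the Confirmatory Factor Analysis, the following is a minimal sketch of how CFI, TLI, and RMSEA are conventionally derived from chi-square statistics of the fitted model and of a baseline (independence) model. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' software may use slightly different conventions (for example, N instead of N - 1 in RMSEA).

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: fitted model (m) versus baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)  # noncentrality of the fitted model
    d_b = max(chi2_b - df_b, 0.0)  # noncentrality of the baseline model
    denom = max(d_m, d_b)
    return 1.0 if denom == 0.0 else 1.0 - d_m / denom


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index, based on chi-square / df ratios."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical inputs for illustration only (NOT the study's values).
chi2_model, df_model = 12.0, 8
chi2_base, df_base = 900.0, 15
n_respondents = 500

print(f"CFI   = {cfi(chi2_model, df_model, chi2_base, df_base):.3f}")
print(f"TLI   = {tli(chi2_model, df_model, chi2_base, df_base):.3f}")
print(f"RMSEA = {rmsea(chi2_model, df_model, n_respondents):.3f}")
```

By common rules of thumb, CFI and TLI values close to 1 and RMSEA values below roughly .05 to .06 are read as good fit, which is the sense in which the reported values indicate a strong model fit.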