Teachers' Perceived Benefits and Risks of AI Across Fifty-Five Countries: An Audit of LLM Alignment and Steerability

📅 2026-05-08
🤖 AI Summary
This study addresses two gaps: the lack of representative cross-national evidence on teachers’ perceptions of AI, and the limited validation of large language models (LLMs) in education. Using OECD TALIS teacher survey data from 55 countries and territories and eight state-of-the-art LLMs from four providers, including Gemini 3 Fast, it evaluates whether LLM-generated responses reflect teachers’ actual views on the benefits and risks of AI. The evaluation uses both general and country-specific prompts and compares higher- and lower-reasoning models. The LLMs do not reliably reproduce cross-national variation in teacher perceptions: they compress country differences, overestimate both benefits and risks, and gain little from identity prompting or enhanced reasoning. Some models do partially capture cross-national ranking patterns, which suggests a complementary role in hypothesis generation and exploratory comparative analysis and marks the limits of LLMs’ usefulness in education policy research.
📝 Abstract
Teachers' trust in artificial intelligence (AI) in education depends on how they balance its perceived benefits and risks. Yet global discussions about scaling AI in education rely on fragmented evidence, as most studies of teachers' perceptions focus on single countries or small samples. This lack of representative cross-national evidence limits both theory building and policy development. At the same time, large language models (LLMs) are increasingly used in research, policy, and teachers' professional workflows, despite limited validation in education. To address these gaps, we conduct a large-scale audit of LLM alignment with teachers' perceptions of AI by combining representative international survey data with systematic model evaluation. Using OECD TALIS data from 55 countries and territories, we measure cross-national variation in teachers' perceived benefits and risks of AI. We then benchmark responses from eight state-of-the-art LLMs across four providers under both general and country-specific prompting, comparing higher- and lower-reasoning models. Results reveal substantial cross-national variation in teacher perceptions that is not reliably reflected in LLM outputs. Models compress country differences, overestimate both benefits and risks, and show limited gains from identity prompting or enhanced reasoning. This misalignment matters because LLM-generated guidance and professional discourse increasingly shape how teachers learn about and discuss AI, potentially influencing trust and future adoption decisions. Our findings caution against treating LLM outputs as substitutes for direct engagement with teachers when informing global AI-in-education initiatives. At the same time, some models (e.g., Gemini 3 Fast) partially capture cross-national ranking patterns, suggesting a complementary role in hypothesis generation and exploratory comparative analysis.
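The three patterns the abstract names (overestimation, compressed country differences, and partially preserved rankings) can be separated with three simple country-level statistics. The following is a minimal sketch, not the authors' method: the function names, the choice of statistics (mean bias, ratio of standard deviations, Spearman rank correlation), and the country scores are all assumptions made for illustration, and the numbers are invented toy values rather than TALIS results.

```python
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def alignment_summary(survey, model):
    """Compare country-level scores from a survey and from a model.

    mean_bias    > 0 means the model overestimates on average.
    spread_ratio < 1 means the model compresses country differences.
    rank_corr    is Spearman's rho (Pearson correlation of the ranks).
    """
    countries = sorted(survey.keys() & model.keys())
    s = [survey[c] for c in countries]
    m = [model[c] for c in countries]
    return {
        "mean_bias": mean(m) - mean(s),
        "spread_ratio": pstdev(m) / pstdev(s),
        "rank_corr": pearson(ranks(s), ranks(m)),
    }


# Invented toy shares of teachers agreeing with a benefit statement.
survey = {"A": 0.30, "B": 0.50, "C": 0.70, "D": 0.40}
model = {"A": 0.62, "B": 0.66, "C": 0.70, "D": 0.64}

print(alignment_summary(survey, model))
```

In this toy case the model scores are higher on average and far less spread out than the survey scores, yet the country ordering is identical. That combination shows how a model could support exploratory ranking comparisons while still misstating levels and differences.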
Problem

Research questions and friction points this paper is trying to address.

teacher perception
AI in education
large language models
cross-national variation
LLM alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM alignment
cross-national perception
AI in education
teacher trust
model audit