🤖 AI Summary
This study addresses the lack of domain-specific evaluation benchmarks for in-vehicle voice assistants that reflect real-world deployment requirements, with particular attention to Korean localization and fine-grained sociolinguistic control such as honorific management. It proposes a localized evaluation framework that assesses Korean honorific (speech-level) realization alongside strategic dialogue behaviors, including clarification and proactivity, and that adopts a conservative, reliability-first assessment stance. Through behavioral analysis of large language models and purpose-built metrics, the study finds that current models control honorifics unstably and score lower on the strategic metrics. The authors conclude that in-vehicle assistants must move beyond generic capabilities toward precise, context-aware language adaptation and reliable, safety-oriented interaction management.
📝 Abstract
While Large Language Models (LLMs) are increasingly integrated into in-vehicle conversational systems, identifying the optimal model remains challenging because there are no domain-specific evaluation standards tailored to real-world deployment requirements. In this paper, we propose a novel evaluation framework for in-vehicle assistants, with a particular focus on Korean-language localization. Our empirical analysis reveals two notable patterns in model behavior. First, fine-grained Korean honorific control remains unstable in current LLMs, indicating that precise speech-level realization must be explicitly evaluated in localization settings. Second, models perform worse on strategic conversational metrics such as clarification and proactivity. Our analysis suggests that this stems from the inherent subjectivity of these tasks, for which our framework adopts a conservative evaluation stance that prioritizes reliability. Together, our findings underscore that automotive AI must move beyond general competence toward precise linguistic tailoring and reliable, safety-oriented interaction management.