Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback

📅 2026-01-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of semantic drift and non-executable outputs in enterprise-grade large language model (LLM) assistants for code generation and business analytics, which typically rely on manual validation due to the absence of built-in verification mechanisms. To overcome this limitation, the authors propose two novel automated validation frameworks—Q* and Feedback+—that, for the first time, integrate reverse-translation semantic matching and code execution feedback loops into a conversational analytics system. By adopting a generator–discriminator architecture, the approach enables a paradigm shift from user-dependent validation to system-level self-verification. Experimental results on the Spider, Bird, and GSM8K benchmarks demonstrate significant reductions in error rates and task completion time, confirming the effectiveness and practical utility of the proposed methods.
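The reverse-translation semantic check attributed to Q* above can be sketched in miniature. This is an illustration only, not the paper's implementation: `back_translate` stands in for an LLM that describes generated code in natural language, and Jaccard token overlap is an assumed stand-in for the paper's semantic matcher.

```python
def reverse_translation_match(code, user_intent, back_translate, threshold=0.5):
    """Q*-style check (sketch): translate code back to natural language,
    then test semantic agreement with the user's stated intent."""
    description = back_translate(code)          # code -> natural language
    desc_tokens = set(description.lower().split())
    intent_tokens = set(user_intent.lower().split())
    # Jaccard token overlap as a toy similarity measure.
    overlap = len(desc_tokens & intent_tokens) / len(desc_tokens | intent_tokens)
    return overlap >= threshold

def toy_back_translate(code):
    # Stand-in for the LLM translator assumed in this sketch.
    return "compute the sum of sales by region"

ok = reverse_translation_match(
    "SELECT region, SUM(sales) FROM t GROUP BY region",
    "sum of sales by region",
    toy_back_translate,
)
```

A real system would replace both the translator and the similarity measure with LLM calls; the control flow (generate, back-translate, compare, accept or reject) is the part the sketch illustrates.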

📝 Abstract
As large language model (LLM) assistants become increasingly integrated into enterprise workflows, their ability to generate accurate, semantically aligned, and executable outputs is critical. However, current conversational business analytics (CBA) systems often lack built-in verification mechanisms, leaving users to manually validate potentially flawed results. This paper introduces two complementary verification techniques: Q*, which performs reverse translation and semantic matching between code and user intent, and Feedback+, which incorporates execution feedback to guide code refinement. Embedded within a generator-discriminator framework, these mechanisms shift validation responsibilities from users to the system. Evaluations on three benchmark datasets, Spider, Bird, and GSM8K, demonstrate that both Q* and Feedback+ reduce error rates and task completion time. The study also identifies reverse translation as a key bottleneck, highlighting opportunities for future improvement. Overall, this work contributes a design-oriented framework for building more reliable, enterprise-grade GenAI systems capable of trustworthy decision support.
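The execution-feedback loop that the abstract attributes to Feedback+ can be sketched as a generator-discriminator cycle. This is a minimal sketch under assumptions, not the paper's code: `generate` stands in for the LLM generator, and running the candidate code serves as the discriminator, with runtime errors fed back as refinement hints.

```python
def refine_with_execution_feedback(generate, task, max_rounds=3):
    """Feedback+-style loop (sketch): generate code for `task`, execute it,
    and feed any runtime error back to the generator until a run succeeds."""
    feedback = None
    for _ in range(max_rounds):
        code = generate(task, feedback)       # generator step
        try:
            scope = {}
            exec(code, scope)                 # discriminator step: run it
            return code, scope.get("result")  # success: return code + output
        except Exception as exc:
            # Execution failed: pass the error message back as feedback.
            feedback = f"{type(exc).__name__}: {exc}"
    return None, None                         # gave up after max_rounds

def toy_generator(task, feedback):
    # Stand-in for an LLM: the first draft is buggy; the error message
    # prompts a corrected second draft.
    if feedback is None:
        return "result = 10 / 0"              # buggy first attempt
    return "result = 10 / 2"                  # corrected after feedback

code, result = refine_with_execution_feedback(toy_generator, "divide 10 by 2")
```

The point of the sketch is the shift the abstract describes: the system, not the user, detects the failure and drives the retry.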
Problem

Research questions and friction points this paper is trying to address.

verification
large language models
conversational business analytics
semantic alignment
executable outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic verification
execution feedback
reverse translation
generator-discriminator framework
enterprise-grade GenAI
Y. Sun
Department of Information Systems and Analytics, National University of Singapore
Ming Cai
Department of Information Systems and Analytics, National University of Singapore
Stanley Kok
National University of Singapore
Artificial Intelligence · Machine Learning · Information Systems