🤖 AI Summary
Policy Compliance Detection (PCD) in conversational question answering—determining whether a user’s request adheres to a given policy and deciding whether to seek clarification—faces significant challenges under zero- or few-shot settings due to scarce labeled data. Method: We propose LDPC, a Logic Decomposition–driven PCD framework that explicitly compiles natural-language policies into interpretable symbolic logic graphs. LDPC synergistically integrates large language models (LLMs) for few-shot prompting, logical subproblem extraction, and context-aware truth assignment, enabling neuro-symbolic reasoning and precise error tracing. Contribution/Results: Evaluated on the ShARC benchmark, LDPC achieves competitive performance without fine-tuning, markedly enhancing decision transparency and interpretability. Moreover, it uncovers inherent policy ambiguities and reasoning bottlenecks embedded in the dataset, offering new insights into policy-grounded dialogue systems.
📝 Abstract
The task of policy compliance detection (PCD) is to determine whether a scenario complies with a set of written policies. In a conversational setting, the results of PCD can indicate whether clarifying questions must be asked to determine compliance status. Existing approaches usually claim reasoning capabilities that are latent or that require a large amount of annotated data. In this work, we propose logical decomposition for policy compliance (LDPC): a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting. By selecting only a few exemplars alongside recently developed prompting techniques, we demonstrate that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies. The formulation of explicit logic graphs can in turn help answer PCD-related questions with increased transparency and explainability. We apply this approach to the popular PCD and conversational machine reading benchmark, ShARC, and show competitive performance with no task-specific fine-tuning. We also leverage the inherently interpretable architecture of LDPC to understand where errors occur, revealing ambiguities in the ShARC dataset and highlighting the challenges involved with reasoning for conversational question answering.
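To make the decision logic concrete, here is a minimal, hypothetical sketch of the evaluation step the abstract describes: a policy is compiled into a logic graph over sub-questions, each sub-question is assigned True/False/unknown from the dialogue context, and the graph is evaluated with three-valued (Kleene) logic to decide between "compliant", "not compliant", and asking a clarifying question. All names and the graph structure are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Optional

class Node:
    """A node in the symbolic logic graph compiled from a policy."""
    def eval(self, answers: Dict[str, bool]) -> Optional[bool]:
        raise NotImplementedError

class Q(Node):
    """Leaf: a sub-question extracted from the policy text."""
    def __init__(self, text: str):
        self.text = text
    def eval(self, answers: Dict[str, bool]) -> Optional[bool]:
        # None means the dialogue has not yet answered this sub-question.
        return answers.get(self.text)

class And(Node):
    def __init__(self, *kids: Node):
        self.kids = kids
    def eval(self, answers: Dict[str, bool]) -> Optional[bool]:
        vals: List[Optional[bool]] = [k.eval(answers) for k in self.kids]
        if False in vals:
            return False        # any False conjunct decides the outcome
        if None in vals:
            return None         # otherwise an unknown conjunct blocks a verdict
        return True

class Or(Node):
    def __init__(self, *kids: Node):
        self.kids = kids
    def eval(self, answers: Dict[str, bool]) -> Optional[bool]:
        vals: List[Optional[bool]] = [k.eval(answers) for k in self.kids]
        if True in vals:
            return True         # any True disjunct decides the outcome
        if None in vals:
            return None
        return False

def decide(graph: Node, answers: Dict[str, bool]) -> str:
    """Map the three-valued evaluation to a conversational decision."""
    v = graph.eval(answers)
    if v is True:
        return "compliant"
    if v is False:
        return "not compliant"
    return "ask clarifying question"

# Toy policy: "You qualify if you are a resident AND (employed OR a student)."
policy = And(Q("resident?"), Or(Q("employed?"), Q("student?")))
print(decide(policy, {"resident?": True}))                    # unresolved disjunct
print(decide(policy, {"resident?": True, "student?": True}))  # fully decided
```

Because the graph is explicit, an incorrect verdict can be traced to the specific sub-question whose truth assignment was wrong, which is the kind of error analysis the abstract attributes to LDPC's interpretable architecture.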