Judgment-of-Thought Prompting: A Courtroom-Inspired Framework for Binary Logical Reasoning with Large Language Models

📅 2024-09-25
🤖 AI Summary
Large language models (LLMs) exhibit limited performance on binary logical reasoning tasks, and existing prompting methods struggle to ensure both reasoning accuracy and interpretability. To address this, we propose Judgment-of-Thought (JoT), a courtroom-inspired multi-agent prompting framework that orchestrates three specialized roles: a Lawyer (generating arguments), a Prosecutor (raising counterarguments), and a Judge (rendering the final verdict). Together, these roles carry out a structured, iterative debate. JoT combines role-based prompting, cross-model feedback fusion, and argument refinement within a single prompting procedure, a design intended to make the reasoning more robust and more transparent. Evaluated on BigBench-Hard and Winogrande, JoT achieves state-of-the-art performance, attaining 98% accuracy on Boolean expression reasoning. Ablation studies confirm that role specialization and the iterative feedback mechanisms each contribute critically to overall efficacy.

๐Ÿ“ Abstract
This paper proposes a novel prompting approach, Judgment of Thought (JoT), specifically tailored for binary logical reasoning tasks. Despite advances in prompt engineering, existing approaches still face limitations in handling complex logical reasoning tasks. To address these issues, JoT introduces a multi-agent approach with three specialized roles$unicode{x2010}$$unicode{x2010}$$unicode{x2010}$lawyer, prosecutor, and judge$unicode{x2010}$$unicode{x2010}$$unicode{x2010}$where a high-level model acts as the judge, and lower-level models serve as lawyer and prosecutor to systematically debate and evaluate arguments. Experimental evaluations on benchmarks such as BigBenchHard and Winogrande demonstrate JoT's superior performance compared to existing prompting approaches, achieving notable improvements, including 98% accuracy in Boolean expressions. Also, our ablation studies validate the critical contribution of each role, iterative refinement loops, and feedback mechanisms. Consequently, JoT significantly enhances accuracy, reliability, and consistency in binary reasoning tasks and shows potential for practical applications.
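
The debate loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `VERDICT:` / `CONTINUE:` reply convention, the `max_rounds` cap, and every name in the code (`judgment_of_thought`, `parse_verdict`, `Verdict`, `Model`) are assumptions made for the example. Each model is represented as a plain callable from prompt string to reply string, so canned stubs stand in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM call: prompt text in, reply text out.
Model = Callable[[str], str]


@dataclass
class Verdict:
    answer: bool                      # final binary decision
    rounds: int                       # debate rounds actually used
    transcript: List[str] = field(default_factory=list)


def parse_verdict(ruling: str) -> Optional[bool]:
    """Return True/False for a 'VERDICT: ...' reply, or None otherwise."""
    text = ruling.strip().lower()
    if not text.startswith("verdict:"):
        return None
    value = text.split(":", 1)[1].strip()
    if value.startswith("true"):
        return True
    if value.startswith("false"):
        return False
    return None


def judgment_of_thought(question: str, lawyer: Model, prosecutor: Model,
                        judge: Model, max_rounds: int = 3) -> Verdict:
    """Run lawyer -> prosecutor -> judge rounds until the judge rules."""
    transcript: List[str] = []
    feedback = "none yet"
    for round_no in range(1, max_rounds + 1):
        # Lawyer argues for an answer, taking earlier judge feedback into account.
        argument = lawyer(
            f"Question: {question}\nJudge feedback: {feedback}\n"
            "Argue for the answer you believe is correct."
        )
        # Prosecutor challenges the lawyer's argument.
        rebuttal = prosecutor(
            f"Question: {question}\nArgument: {argument}\n"
            f"Judge feedback: {feedback}\nRaise counterarguments."
        )
        # Judge either rules or, except in the last round, asks for refinement.
        options = "'VERDICT: True' or 'VERDICT: False'"
        if round_no < max_rounds:
            options += ", or 'CONTINUE: <feedback>' to request another round"
        ruling = judge(
            f"Question: {question}\nLawyer: {argument}\n"
            f"Prosecutor: {rebuttal}\nReply with {options}."
        )
        transcript.extend([f"lawyer: {argument}",
                           f"prosecutor: {rebuttal}",
                           f"judge: {ruling}"])
        answer = parse_verdict(ruling)
        if answer is not None:
            return Verdict(answer, round_no, transcript)
        feedback = ruling  # fed back to both sides in the next round
    raise ValueError("judge did not return a verdict within max_rounds")


# Usage with canned stub models in place of real LLM calls.
judge_replies = iter(["CONTINUE: address operator precedence",
                      "VERDICT: False"])
result = judgment_of_thought(
    "Is `not ( True ) and ( True )` true?",
    lawyer=lambda prompt: "not (True) is False, and False and True is False.",
    prosecutor=lambda prompt: "The negation could apply to the whole conjunction.",
    judge=lambda prompt: next(judge_replies),
)
print(result.answer, result.rounds)  # prints: False 2
```

Passing the models in as callables keeps the sketch independent of any particular LLM client, and it mirrors the abstract's setup in which a stronger model can be supplied as `judge` while weaker models fill the other two roles.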
Problem

Research questions and friction points this paper is trying to address.

Enhances binary logical reasoning in LLMs
Addresses limitations in complex logical tasks
Improves accuracy and reliability in reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent roles for logical reasoning
Iterative refinement loops enhance accuracy
Feedback mechanisms improve reliability