CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict

Date: 2026-05-27
Citations: 0
Influential: 0
AI Summary
This study addresses the challenge of adjudicating e-commerce disputes, which requires extracting critical evidence from redundant, multi-turn, multimodal user submissions and applying platform-specific rules, a task that existing approaches handle poorly. To tackle this, the authors introduce the E-commerce Dispute Verdicts (EDV) task alongside VerdictBench, a new multimodal benchmark dataset, and propose CyberJurors, a multi-agent framework that simulates real-world crowdsourced jury decision-making. CyberJurors incorporates structured individual reasoning chains, precedent-based constraints, and a jury consensus voting mechanism. By integrating multimodal large language models, chain-of-thought reasoning, and a precedent-guided group consensus algorithm, the framework outperforms current LLMs, multimodal LLMs, and courtroom simulators on VerdictBench, yielding verdicts that align more closely with human jurors' voting patterns.
Abstract
E-commerce platforms have begun recruiting crowdsourced jurors to adjudicate massive volumes of transaction disputes. Unlike formal legal judgment, E-commerce dispute verdicts require grounding pivotal clues from redundant, multi-round, multimodal evidence and making decisions under flexible platform-specific conventions. These characteristics render existing methods insufficient for this scenario. To bridge this gap, we introduce a pioneering task, E-commerce Dispute Verdicts (EDV), and present VerdictBench, a multimodal benchmark comprising 6,000 real-world cases designed to reflect crowdsourced jury decisions. Building upon this, we propose CyberJurors, a multi-agent framework to clarify the dispute logic and regulate the verdict process. At the individual level, Individual Verdict Chain-of-Thought decomposes the EDV task into four structured reasoning stages, enabling fine-grained clue perception and clarifying causal logic between pivotal clues and the dispute focus. At the collective level, Jury Consensus Verdict simulates multi-round discussion and voting among jurors, while incorporating verdict precedents to mitigate cognitive biases toward either disputant. Experiments on VerdictBench show that CyberJurors outperforms state-of-the-art LLMs, MLLMs, and court simulators, while achieving stronger alignment with real-world jury voting patterns. Code and dataset are available at https://github.com/YanhuiS/CyberJurors and https://huggingface.co/datasets/piggi/VerdictBench.
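The collective-level idea in the abstract (several jurors vote, discuss over multiple rounds, and consult a precedent) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `jury_consensus`, the `threshold` and `max_rounds` parameters, the verdict labels, and the stub jurors are all hypothetical, and in a real system each juror would presumably be an LLM or MLLM agent rather than a fixed rule.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical juror signature: (case, previous_ballots, precedent) -> verdict label.
Juror = Callable[[str, List[str], Optional[str]], str]


def jury_consensus(
    jurors: List[Juror],
    case: str,
    precedent: Optional[str] = None,
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, int, Dict[str, int]]:
    """Repeat voting rounds until one verdict reaches `threshold` share of ballots.

    Returns (verdict, round_number, tally). If no round reaches the threshold,
    the plurality verdict of the last round is returned.
    """
    if not jurors:
        raise ValueError("at least one juror is required")
    previous: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Each juror sees the case, the previous round's ballots, and the precedent.
        ballots = [juror(case, previous, precedent) for juror in jurors]
        tally = Counter(ballots)
        verdict, votes = tally.most_common(1)[0]
        if votes / len(ballots) >= threshold:
            break
        previous = ballots
    return verdict, round_no, dict(tally)


# Stub jurors for illustration only (assumed behaviour, not from the paper).
def always(label: str) -> Juror:
    return lambda case, previous, precedent: label


def precedent_follower(case: str, previous: List[str], precedent: Optional[str]) -> str:
    # Votes "seller" at first, then defers to the precedent once discussion starts.
    return precedent if previous and precedent else "seller"


jury = [always("buyer"), always("buyer"), always("seller"),
        precedent_follower, precedent_follower]
result = jury_consensus(jury, case="item not as described", precedent="buyer")
```

With these stub jurors, the first round is split and no verdict reaches the threshold, so a second round is held in which the precedent-following jurors change their ballots. The stopping rule and the way precedents influence jurors are design choices of this sketch; the paper's released code is the reference for the actual mechanism.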
Problem

Research questions and friction points this paper is trying to address.

E-commerce Dispute Verdicts
crowdsourced jurors
multimodal evidence
platform-specific conventions
dispute adjudication
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent simulation
e-commerce dispute verdict
chain-of-thought reasoning
jury consensus
multimodal benchmark
Yanhui Sun
School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Wu Liu
School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Haifeng Ming
School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Xinru Wang
Purdue University
Human-AI interaction, explainable AI
Hantao Yao
School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Yongdong Zhang
School of Information Science and Technology, University of Science and Technology of China, Hefei, China