CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of adjudicating e-commerce disputes, which requires extracting critical evidence from redundant, multi-round, multimodal user submissions and applying flexible, platform-specific conventions. Existing approaches handle this poorly. The authors introduce the E-commerce Dispute Verdicts (EDV) task together with VerdictBench, a multimodal benchmark of 6,000 real-world cases, and propose CyberJurors, a multi-agent framework that simulates crowdsourced jury decision-making. CyberJurors combines a four-stage individual chain-of-thought reasoning process with a jury consensus mechanism that simulates multi-round discussion and voting and uses verdict precedents to mitigate bias toward either disputant. On VerdictBench, the framework outperforms state-of-the-art LLMs, multimodal LLMs, and court simulators, and its verdicts align more closely with real-world jury voting patterns.

📝 Abstract

E-commerce platforms have begun recruiting crowdsourced jurors to adjudicate massive volumes of transaction disputes. Unlike formal legal judgments, e-commerce dispute verdicts require grounding pivotal clues in redundant, multi-round, multimodal evidence and making decisions under flexible platform-specific conventions. These characteristics render existing methods insufficient for this scenario. To bridge this gap, we introduce a pioneering task, E-commerce Dispute Verdicts (EDV), and present VerdictBench, a multimodal benchmark comprising 6,000 real-world cases designed to reflect crowdsourced jury decisions. Building upon this, we propose CyberJurors, a multi-agent framework to clarify the dispute logic and regulate the verdict process. At the individual level, Individual Verdict Chain-of-Thought decomposes the EDV task into four structured reasoning stages, enabling fine-grained clue perception and clarifying the causal logic between pivotal clues and the dispute focus. At the collective level, Jury Consensus Verdict simulates multi-round discussion and voting among jurors, while incorporating verdict precedents to mitigate cognitive biases toward either disputant. Experiments on VerdictBench show that CyberJurors outperforms state-of-the-art LLMs, MLLMs, and court simulators, while achieving stronger alignment with real-world jury voting patterns. Code and dataset are available at https://github.com/YanhuiS/CyberJurors and https://huggingface.co/datasets/piggi/VerdictBench.
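
The collective-level idea in the abstract (several jurors vote over multiple rounds, informed by a precedent) can be illustrated with a minimal sketch. This is not the authors' implementation: the juror interface, the `threshold` and `max_rounds` parameters, the stub jurors, and the way the precedent is passed along are all assumptions made for illustration. In a real system each juror would presumably be an MLLM-backed agent rather than a stub.

```python
from collections import Counter
from typing import Callable, NamedTuple, Optional, Sequence

# Hypothetical juror interface: (case, precedent, previous_tally) -> vote label.
Juror = Callable[[str, Optional[str], Optional[Counter]], str]


class Verdict(NamedTuple):
    label: str      # option with the most votes in the final round
    rounds: int     # number of voting rounds that were run
    reached: bool   # True if the agreement threshold was met
    tally: Counter  # vote counts from the final round


def jury_consensus(
    case: str,
    jurors: Sequence[Juror],
    precedent: Optional[str] = None,
    max_rounds: int = 3,
    threshold: float = 0.75,
) -> Verdict:
    """Run voting rounds until one option reaches `threshold` agreement.

    Each juror sees the case, an optional precedent, and the previous
    round's tally (None in the first round), which stands in for
    "discussion" in this simplified sketch.
    """
    if not jurors:
        raise ValueError("at least one juror is required")
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    tally: Optional[Counter] = None
    for round_no in range(1, max_rounds + 1):
        votes = [juror(case, precedent, tally) for juror in jurors]
        tally = Counter(votes)
        label, count = tally.most_common(1)[0]
        if count / len(votes) >= threshold:
            return Verdict(label, round_no, True, tally)
    # No consensus: report the plurality option from the last round.
    return Verdict(label, max_rounds, False, tally)


def always(label: str) -> Juror:
    """Stub juror that never changes its vote (illustrative only)."""
    return lambda case, precedent, tally: label


def follower(initial: str) -> Juror:
    """Stub juror that adopts the previous round's leading option."""
    def vote(case: str, precedent: Optional[str], tally: Optional[Counter]) -> str:
        return tally.most_common(1)[0][0] if tally else initial
    return vote


if __name__ == "__main__":
    jury = [always("buyer"), always("buyer"), always("buyer"),
            always("seller"), follower("seller")]
    result = jury_consensus("item arrived damaged", jury, precedent="refund granted")
    print(result.label, result.rounds, result.reached)
```

With these stub jurors, the first round splits 3 to 2 and falls short of the 0.75 threshold. In the second round the follower switches to the leading option, so the vote is 4 to 1 and consensus is reached.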

Problem

Research questions and friction points this paper is trying to address.

E-commerce Dispute Verdicts

crowdsourced jurors

multimodal evidence

platform-specific conventions

dispute adjudication

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent simulation

e-commerce dispute verdict

chain-of-thought reasoning