🤖 AI Summary
Problem: Existing seller agents in online secondhand marketplaces struggle to track buyers' cumulative intent over extended negotiations, and no established framework exists for evaluating multi-turn bargaining in e-commerce settings. Method: (1) We release a large-scale e-commerce bargaining benchmark comprising 622 categories, 9,892 items, and 3,014 negotiation tasks; (2) we propose a turn-level evaluation framework inspired by Theory of Mind (ToM); and (3) we design an end-to-end pipeline that automatically extracts high-confidence buyer intents from dialogue data. Contribution/Results: The work moves beyond conventional coarse-grained evaluation, which relies solely on final transaction outcomes, and enables fine-grained, quantitative assessment of intent recognition accuracy, negotiation dynamics, and process interpretability. This is intended to support stronger long-horizon intent modeling and more effective bargaining in seller agents.
📝 Abstract
In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly affects bargaining effectiveness. We introduce a multi-turn evaluation framework for measuring the bargaining ability of seller agents in e-commerce dialogues. The framework tests whether an agent can extract and track buyer intents. Our contributions are: (1) a large-scale e-commerce bargaining benchmark spanning 622 categories, 9,892 products, and 3,014 tasks; (2) a turn-level evaluation framework grounded in Theory of Mind (ToM) with annotated buyer intents, moving beyond outcome-only metrics; and (3) an automated pipeline that extracts reliable buyer intents from large-scale dialogue data.
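To illustrate why turn-level evaluation can reveal more than an outcome-only metric, here is a minimal hypothetical sketch. It assumes each turn carries a set of annotated (gold) cumulative buyer intents and the seller agent's predicted intents, and scores each turn with set-based F1 averaged over the dialogue. The `Turn` schema, the intent labels, and the choice of F1 are illustrative assumptions, not the paper's actual annotation format or metric.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical schema: cumulative buyer intents up to and including this turn.
    gold_intents: frozenset
    predicted_intents: frozenset


def turn_f1(gold: frozenset, pred: frozenset) -> float:
    """Set-based F1 between gold and predicted intents for a single turn."""
    if not gold and not pred:
        return 1.0  # nothing to detect and nothing predicted
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def turn_level_score(turns: list) -> float:
    """Mean per-turn F1: penalizes intents the agent misses at any point."""
    if not turns:
        raise ValueError("dialogue must contain at least one turn")
    return sum(turn_f1(t.gold_intents, t.predicted_intents) for t in turns) / len(turns)


def outcome_only_score(deal_closed: bool) -> float:
    """Coarse baseline: 1.0 if the negotiation ended in a transaction, else 0.0."""
    return 1.0 if deal_closed else 0.0


# Hypothetical three-turn dialogue: the agent misses the discount request at turn 2.
dialogue = [
    Turn(frozenset({"ask_price"}), frozenset({"ask_price"})),
    Turn(frozenset({"ask_price", "request_discount"}), frozenset({"ask_price"})),
    Turn(
        frozenset({"ask_price", "request_discount", "ask_shipping"}),
        frozenset({"ask_price", "request_discount", "ask_shipping"}),
    ),
]

if __name__ == "__main__":
    print(f"turn-level score:   {turn_level_score(dialogue):.3f}")
    print(f"outcome-only score: {outcome_only_score(True):.3f}")
```

In this toy dialogue the deal closes, so the outcome-only score is a perfect 1.0, while the turn-level score (about 0.889) exposes the intent the agent failed to track at the second turn.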