Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale

📅 2025-02-03
🤖 AI Summary
Terms and conditions on shopping websites often contain unfavorable financial terms that can have significant financial consequences for customers. Yet there is no comprehensive understanding of these terms, and no publicly available detection system or dataset exists for identifying them at scale. Method: We introduce TermMiner, an automated data collection and topic modeling pipeline; construct ShopTC-100K, a dataset of 1.8 million terms from 8,251 shopping websites in the Tranco top 100K; and, from this data, derive a taxonomy of 22 types of unfavorable financial terms in 4 categories (purchase, post-purchase, account termination, and legal). We then build TermLens, an LLM-based detector fine-tuned on an annotated dataset, which achieves an F1 score of 94.6% and a false positive rate of 2.3% using GPT-4o. Contribution/Results: Applied to shopping websites in the Tranco top 100K, the analysis finds that 42.06% contain at least one unfavorable financial term, and that such terms are more prevalent on less popular websites. The work provides data, methodology, and empirical evidence to support consumer protection and regulatory oversight.

📝 Abstract
Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate these terms. In this paper, we take the first steps toward solving this problem with three key contributions. First, we introduce TermMiner, an automated data collection and topic modeling pipeline to understand the landscape of unfavorable financial terms. Second, we create ShopTC-100K, a dataset of terms and conditions from shopping websites in the Tranco top 100K list, comprising 1.8 million terms from 8,251 websites. Using this dataset, we develop a taxonomy of 22 types of unfavorable financial terms in 4 categories, spanning purchase, post-purchase, account termination, and legal aspects. Third, we build TermLens, an automated detector that uses Large Language Models (LLMs) to identify unfavorable financial terms. Fine-tuned on an annotated dataset, TermLens achieves an F1 score of 94.6% and a false positive rate of 2.3% using GPT-4o. When applied to shopping websites from the Tranco top 100K, we find that 42.06% of these sites contain at least one unfavorable financial term, with such terms being more prevalent on less popular websites. Case studies further highlight the financial risks and customer dissatisfaction associated with unfavorable financial terms, as well as the limitations of existing ecosystem defenses.
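The two reported detector metrics follow the standard definitions: F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the false positive rate is FP / (FP + TN), the share of benign terms that are wrongly flagged. The minimal sketch below computes both from per-term binary labels, assuming an unfavorable term is the positive class. The `Confusion` type, the helper names, and the counts in the example are hypothetical and are not taken from the paper's evaluation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Hypothetical confusion-matrix counts; positive class = unfavorable term."""
    tp: int  # unfavorable terms correctly flagged
    fp: int  # benign terms wrongly flagged
    fn: int  # unfavorable terms missed
    tn: int  # benign terms correctly left unflagged


def confusion_from_labels(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Confusion:
    """Count outcomes from parallel lists of gold labels and detector predictions."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t and p),
        fp=sum(1 for t, p in pairs if not t and p),
        fn=sum(1 for t, p in pairs if t and not p),
        tn=sum(1 for t, p in pairs if not t and not p),
    )


def f1_score(c: Confusion) -> float:
    """F1 = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN)."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(c: Confusion) -> float:
    """FPR = FP / (FP + TN): share of benign terms the detector flags."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


example = Confusion(tp=90, fp=5, fn=5, tn=200)  # made-up counts
print(f"F1 = {f1_score(example):.3f}, FPR = {false_positive_rate(example):.3f}")
# prints: F1 = 0.947, FPR = 0.024
```

With these illustrative counts, precision and recall are both 90/95, so F1 is about 94.7%, and the FPR is 5/205, about 2.4%. Note that FPR depends on the number of benign terms, so a detector can have a low FPR on a corpus dominated by benign clauses even when its precision is modest.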
Problem

Research questions and friction points this paper is trying to address.

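Identifies and categorizes unfavorable financial terms in the terms and conditions of online shopping websites.
Develops automated tools to detect unfavorable financial terms at scale.
Analyzes the prevalence of these terms across shopping websites in the Tranco top 100K.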
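Once each term has a detector verdict, the prevalence analysis reduces to simple aggregation. The sketch below, assuming hypothetical per-term predictions keyed by site and hypothetical Tranco ranks, marks a site as affected if any of its terms is flagged (the "at least one unfavorable financial term" criterion from the abstract) and then compares prevalence across popularity buckets. The function names and the bucket size of 10,000 ranks are illustrative choices, not the paper's.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Mapping


def site_flags(term_preds: Iterable[tuple[str, bool]]) -> dict[str, bool]:
    """A site is flagged if at least one of its terms is predicted unfavorable."""
    flags: dict[str, bool] = {}
    for site, flagged in term_preds:
        flags[site] = flags.get(site, False) or flagged
    return flags


def prevalence(flags: Mapping[str, bool]) -> float:
    """Fraction of sites with at least one flagged term."""
    return sum(flags.values()) / len(flags) if flags else 0.0


def prevalence_by_rank_bucket(
    flags: Mapping[str, bool], ranks: Mapping[str, int], bucket_size: int = 10_000
) -> dict[int, float]:
    """Prevalence per popularity bucket; bucket 0 holds ranks 1..bucket_size."""
    buckets: defaultdict[int, list[bool]] = defaultdict(list)
    for site, flagged in flags.items():
        buckets[(ranks[site] - 1) // bucket_size].append(flagged)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}


# Hypothetical per-term predictions and Tranco ranks, for illustration only.
preds = [("a.com", False), ("a.com", True), ("b.com", False), ("c.com", True), ("d.com", False)]
ranks = {"a.com": 5, "b.com": 20_000, "c.com": 50_000, "d.com": 8_000}
flags = site_flags(preds)
print(prevalence(flags))                        # 0.5
print(prevalence_by_rank_bucket(flags, ranks))  # {0: 0.5, 1: 0.0, 4: 1.0}
```

A trend like "more prevalent on less popular websites" would show up as prevalence rising with the bucket index; with real data, buckets should be large enough that each holds many sites.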
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data collection and topic modeling pipeline (TermMiner)
Large dataset of shopping website terms and conditions (ShopTC-100K: 1.8 million terms from 8,251 websites)
Large Language Model-based automated detector (TermLens)
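At a high level, an LLM-based detector such as TermLens sends each extracted term to a model and maps the reply to a label. The summary does not describe TermLens's prompt, output format, or model interface, so the sketch below is hypothetical throughout: `build_prompt`, `parse_reply`, `detect`, the `yes: <category>` reply format, and the `fake_llm` stand-in are illustrative names, and the model call is a plain callable rather than any real API. Only the four category names come from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# The four high-level categories named in the abstract; the 22 fine-grained
# types are not enumerated in this summary, so they are not modeled here.
CATEGORIES = ("purchase", "post-purchase", "account termination", "legal")


@dataclass(frozen=True)
class Verdict:
    unfavorable: bool
    category: str | None  # one of CATEGORIES, or None if absent/unrecognized


def build_prompt(term: str) -> str:
    """Hypothetical prompt wording; not the prompt used by TermLens."""
    return (
        "Is the following clause from a shopping website's terms and conditions "
        "an unfavorable financial term for the customer? "
        f"Answer 'yes: <category>' using one of ({', '.join(CATEGORIES)}), or 'no'.\n\n"
        f"Clause: {term}"
    )


def parse_reply(reply: str) -> Verdict:
    """Map a free-text model reply onto a Verdict; anything but 'yes' counts as 'no'."""
    head, _, rest = reply.strip().lower().partition(":")
    if head.strip() != "yes":
        return Verdict(False, None)
    category = rest.strip()
    return Verdict(True, category if category in CATEGORIES else None)


def detect(terms: list[str], llm: Callable[[str], str]) -> list[tuple[str, Verdict]]:
    """Classify each term with a caller-supplied model call (e.g., a fine-tuned LLM)."""
    return [(term, parse_reply(llm(build_prompt(term)))) for term in terms]


# Stand-in for a real model endpoint, for illustration only.
def fake_llm(prompt: str) -> str:
    return "Yes: post-purchase" if "restocking fee" in prompt else "No"


for term, verdict in detect(
    ["A 25% restocking fee applies to all returns.", "We ship worldwide."], fake_llm
):
    print(verdict.unfavorable, verdict.category, "|", term)
# prints:
# True post-purchase | A 25% restocking fee applies to all returns.
# False None | We ship worldwide.
```

Keeping the model behind a callable lets a fine-tuned model and a deterministic stub share the same parsing and aggregation code; the restocking-fee clause and its category are invented for the example.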