🤖 AI Summary
This work addresses the problem of pricing language data governed as assets when their true privacy or access costs are unknown at pricing time, an uncertainty that exposes platforms to revenue risk. The authors propose NH-CROP, a clipped robust pricing framework with a no-harm information-acquisition gate. The platform pays for a refined cost signal only when the estimated decision value of verifying exceeds that of the best no-verification alternative (direct or risk-aware pricing), and its objective is safe net revenue. The approach combines online pricing, risk-aware decision-making, and value-of-information reasoning. The authors then use causal ablations to identify where its gains come from. Experiments on synthetic, real-proxy, and downstream-utility-grounded benchmarks show that clipped NH-CROP variants improve on or remain competitive with price-only and risk-aware baselines. In the real-proxy and utility-grounded settings, the gains come mainly from better pricing rather than from paid verification, and the strongest learned policies often choose not to verify. Oracle and high-decision-value diagnostics nonetheless show that refined cost information can have substantial local value. The findings support calibrating prices first and verifying only when information is cheap and decision-actionable.
📝 Abstract
Language data are increasingly acquired and governed as assets, yet platforms often price candidate resources before knowing their true privacy or access costs. We study online pricing for governed language data assets under cost uncertainty. At each round, a platform observes an NLP task, a candidate asset, and a coarse cost estimate, may pay for a refined cost signal, posts a price, and receives safe net revenue.
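The per-round protocol can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not the paper's model. The names `Round`, `net_revenue`, and `play_round` are hypothetical. The sketch assumes that the refined signal reveals the true cost exactly, that the buyer accepts any price at or below a hidden valuation, and that safe net revenue can be approximated as price minus true cost on a sale, minus any verification fee.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One hypothetical pricing round (field names are illustrative)."""
    task: str           # NLP task observed by the platform
    asset: str          # candidate language-data asset
    coarse_cost: float  # coarse cost estimate visible before pricing
    true_cost: float    # hidden privacy/access cost
    valuation: float    # hidden buyer willingness to pay (assumption)


def net_revenue(price: float, true_cost: float, valuation: float, fee: float) -> float:
    # Assumed stand-in for "safe net revenue": a sale happens when the price
    # is at most the buyer's valuation and earns price minus the true cost;
    # the verification fee is paid whether or not the sale happens.
    earned = price - true_cost if price <= valuation else 0.0
    return earned - fee


def play_round(r: Round, verify: bool, fee: float, margin: float) -> float:
    # Optionally pay for a refined cost signal (assumed here to be exact).
    cost_signal = r.true_cost if verify else r.coarse_cost
    paid = fee if verify else 0.0
    # Post a simple cost-plus price built from whichever signal is available.
    price = cost_signal + margin
    return net_revenue(price, r.true_cost, r.valuation, paid)


# "summarization" and "corpus-A" are hypothetical labels for illustration.
r = Round("summarization", "corpus-A", coarse_cost=2.0, true_cost=3.0, valuation=5.0)
print(play_round(r, verify=False, fee=0.5, margin=1.0))  # 0.0
print(play_round(r, verify=True, fee=0.5, margin=1.0))   # 0.5
```

In this toy round, the coarse estimate understates the cost. Pricing on it therefore earns nothing, while paying the fee recovers a positive margin. With a larger fee or a more accurate coarse estimate, skipping verification would win instead.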
We introduce NH-CROP, a clipped robust pricing framework with a no-harm information-acquisition gate. The method compares direct pricing, risk-aware pricing, and verify-then-price, and acquires information only when its estimated decision value exceeds that of the best no-verification alternative. Across synthetic, real-proxy, and downstream-utility-grounded benchmarks, clipped NH-CROP variants improve on or remain competitive with price-only and risk-aware baselines. Causal ablations show that paid verification is not the main source of gains in the real-proxy and utility-grounded settings: the strongest learned policies often choose not to verify. Oracle and high-decision-value diagnostics show, however, that refined cost information can still have substantial local value. Overall, governed language-data platforms should first calibrate pricing under uncertain access costs and verify only when information is cheap and decision-actionable.
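The no-harm gate can be illustrated with a Monte Carlo comparison of the three options. This is a hypothetical sketch, not NH-CROP itself. The following elements are all assumptions:

- `choose_action` and its cost-plus pricing rules
- the risk term `risk * sigma`
- the price bounds `p_lo`/`p_hi`, which reflect our reading of "clipped"
- the use of sampled `(true_cost, valuation)` draws as the platform's belief

The gate verifies only when the estimated value of verify-then-price, net of the fee, strictly exceeds the better of direct and risk-aware pricing.

```python
from statistics import mean


def clip(x, lo, hi):
    # Keep a posted price inside [lo, hi] (assumed meaning of "clipped").
    return max(lo, min(hi, x))


def revenue(price, cost, valuation):
    # Assumed net revenue: price minus true cost if the buyer accepts.
    return price - cost if price <= valuation else 0.0


def choose_action(coarse_cost, sigma, samples, fee,
                  margin=1.0, risk=1.0, p_lo=0.0, p_hi=10.0):
    """Hypothetical no-harm gate over three candidate policies.

    samples: (true_cost, valuation) draws from the platform's belief given
    the coarse estimate. Returns (action, price); price is None for
    "verify" because it is set after the refined cost is observed.
    """
    direct_p = clip(coarse_cost + margin, p_lo, p_hi)
    risk_p = clip(coarse_cost + margin + risk * sigma, p_lo, p_hi)
    v_direct = mean(revenue(direct_p, c, v) for c, v in samples)
    v_risk = mean(revenue(risk_p, c, v) for c, v in samples)
    # Verify-then-price: in each draw the refined signal reveals c exactly
    # (assumption), the price is set from it, and the fee is always paid.
    v_verify = mean(revenue(clip(c + margin, p_lo, p_hi), c, v)
                    for c, v in samples) - fee
    # No-harm rule: pay for information only if it strictly beats the best
    # no-verification alternative; ties go to not verifying.
    if v_verify > max(v_direct, v_risk):
        return "verify", None
    if v_risk > v_direct:
        return "risk_aware", risk_p
    return "direct", direct_p


samples = [(1.0, 2.5), (5.0, 7.0)]  # toy belief: cheap-and-low or costly-and-high
print(choose_action(3.0, 2.0, samples, fee=0.2))  # ('verify', None)
print(choose_action(3.0, 2.0, samples, fee=0.6))  # ('risk_aware', 6.0)
```

With a cheap fee, the gate pays for the refined signal. Raising the fee to 0.6 flips the choice to risk-aware pricing, as does a fee of 0.5, because ties favor not verifying. This mirrors the abstract's point that verification pays off only when information is cheap and decision-actionable.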