🤖 AI Summary
This study addresses a problem created by AI systems that consume content directly: publishers lack a scalable dynamic pricing mechanism for heterogeneous content. The authors propose the LM Tree, an agent that combines large language models (LLMs) with an adaptively grown segmentation tree to identify the content features that separate high-value from low-value items, using only binary purchase feedback and requiring neither predefined categories nor explicit price labels. This enables fine-grained pay-per-crawl pricing and content valuation at scale. Experiments on real content from a major German technology publisher show that the LM Tree increases revenue by 65%, 47%, and 40% over uniform pricing, two-tier pricing, and the publisher's own eight-category editorial taxonomy, respectively.
📝 Abstract
As AI systems shift from directing users to content toward consuming it directly, publishers need a new revenue model: charging AI crawlers for content access. This model, called pay-per-crawl, must solve a problem of mechanism selection at scale: content is too heterogeneous for a fixed pricing framework. Different sub-types warrant not only different price levels but different pricing rules based on different unstructured features, and there are too many sub-types to enumerate or design for by hand. We propose the LM Tree, an adaptive pricing agent that grows a segmentation tree over the content library, using LLMs to discover what distinguishes high-value from low-value items and apply those attributes at scale, from binary purchase feedback alone. We evaluate the LM Tree on real content from a major German technology publisher, using 8,939 articles and 80,451 buyer queries with willingness-to-pay calibrated from actual AI crawler traffic. The LM Tree achieves a 65% revenue gain over a single static price and a 47% gain over two-category pricing, and it outperforms even the publisher's own 8-segment editorial taxonomy by 40%, recovering content distinctions the publisher's own categories miss.
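To make the idea of growing a segmentation tree from binary purchase feedback concrete, here is a minimal Python sketch of a single split decision. It is an illustration under stated assumptions, not the authors' algorithm: the price grid, the attribute names `is_benchmark` and `is_long`, the deterministic toy buyers, and the split criterion (gain in estimated revenue per offer) are all hypothetical. In the abstract, candidate attributes are discovered by LLMs; here they are hand-coded booleans standing in for that step.

```python
from dataclasses import dataclass, field

PRICES = [1.0, 2.0, 4.0]  # hypothetical price grid


@dataclass
class Leaf:
    """One content segment: tracks offers and purchases at each price."""
    offers: dict = field(default_factory=lambda: {p: 0 for p in PRICES})
    sales: dict = field(default_factory=lambda: {p: 0 for p in PRICES})

    def record(self, price, bought):
        self.offers[price] += 1
        self.sales[price] += int(bought)

    def revenue_per_offer(self, price):
        n = self.offers[price]
        return price * self.sales[price] / n if n else 0.0

    def best_revenue(self):
        # Best empirical revenue per offer over the price grid.
        return max(self.revenue_per_offer(p) for p in PRICES)


def split_gain(log, attribute):
    """Estimated revenue gain per offer from splitting on a boolean attribute."""
    parent, yes, no = Leaf(), Leaf(), Leaf()
    for attrs, price, bought in log:
        parent.record(price, bought)
        (yes if attrs[attribute] else no).record(price, bought)
    n_yes = sum(yes.offers.values())
    n_no = len(log) - n_yes
    children = (n_yes * yes.best_revenue() + n_no * no.best_revenue()) / len(log)
    return children - parent.best_revenue()


# Toy data (assumption): willingness-to-pay depends only on `is_benchmark`.
log = []
for i in range(100):
    attrs = {"is_benchmark": i % 2 == 0, "is_long": (i // 2) % 2 == 0}
    wtp = 4.0 if attrs["is_benchmark"] else 1.0
    for price in PRICES:
        # The only feedback the seller observes: bought or not at this price.
        log.append((attrs, price, price <= wtp))

gains = {a: split_gain(log, a) for a in ("is_benchmark", "is_long")}
best_attr = max(gains, key=gains.get)
print(gains, best_attr)
```

In this toy setting, splitting on the informative attribute raises estimated revenue per offer, while the uninformative one yields no gain, so a tree grown this way would split only on the former. A full agent would apply such a test recursively at each leaf and would also need an exploration policy for choosing which prices to offer, both of which are omitted here.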