Snippet-Driven Supply Chain Discovery with LLMs: Scaling Visibility in China

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study addresses the scarcity of structured data on supply chain relationships in China, particularly for unlisted and long-tail firms. The authors propose a lightweight, evidence-based framework that pairs search engine snippets with large language models to extract firm-level supplier–customer relationships, producing an auditable, traceable supply chain knowledge graph. In the listed-firm subset, the graph covers 7.2 times more firms and 9.3 times more relationships than the CSMAR disclosure-based benchmark. Compared with exhaustive full-text chunking, snippets use roughly 1/251 of the input tokens with lower redundancy, although full-text processing finds 19.8 times more unique relationships.

📝 Abstract

Financial and economic research often relies on structured supply-chain disclosures and commercial databases. In China, supplier–customer disclosure is typically limited to major partners of listed firms, leaving unlisted firms and long-tail inter-firm links poorly captured in structured data. Public web evidence can partly complement this gap through corporate, government, and trade-media disclosures; however, full-text web mining at scale is costly because pages are often inaccessible or expensive to process with large language models (LLMs). We propose a snippet-driven method for constructing a supply chain knowledge graph (SCKG), with firms as nodes and inter-firm relationships as edges. Web search snippets are query-biased summaries returned with search results. We use them as a scalable first-pass evidence layer for LLM-based relationship extraction. We evaluate the pipeline in terms of extraction efficiency and coverage. For extraction efficiency, exhaustive full-text chunking discovers 19.8× more unique relationships than snippets, but requires 251.2× more input tokens and yields higher redundancy. For coverage, we use 130,685 Chinese firms as search seeds, covering Shanghai/Shenzhen-listed firms and large unlisted firms as of 2024. In the listed-firm subset, the resulting SCKG covers 7.2× more firms and 9.3× more relationships than the CSMAR disclosure-based benchmark, while revealing heavy-tailed degree patterns. Retained provenance metadata make the SCKG an auditable complement to disclosure-based databases.
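To make the graph structure in the abstract concrete, here is a minimal Python sketch of an SCKG that stores firms as nodes, supplier-to-customer links as edges, and provenance on every edge. It is an illustration under stated assumptions, not the authors' implementation: the `Evidence` fields, the `SCKG` class, and the helper `relationships_per_token_gain` are hypothetical names, and the LLM extraction step is assumed to have already produced (supplier, customer, evidence) triples.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """Provenance for one extracted relationship (field names are assumptions)."""
    url: str
    snippet: str
    query: str


@dataclass
class SCKG:
    """Directed supplier -> customer graph that keeps the evidence for each edge."""
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, supplier: str, customer: str, evidence: Evidence) -> None:
        # Repeated mentions of the same pair collapse into one edge;
        # each distinct piece of evidence is kept once for auditing.
        key = (supplier.strip(), customer.strip())
        if evidence not in self.edges[key]:
            self.edges[key].append(evidence)

    def firms(self) -> set:
        # Every firm that appears as either endpoint of an edge.
        return {firm for edge in self.edges for firm in edge}

    def degree(self) -> dict:
        # Number of distinct relationships each firm takes part in.
        counts = defaultdict(int)
        for supplier, customer in self.edges:
            counts[supplier] += 1
            counts[customer] += 1
        return dict(counts)


def relationships_per_token_gain(token_ratio: float, relationship_ratio: float) -> float:
    """How many times more unique relationships per input token the cheaper method yields."""
    return token_ratio / relationship_ratio
```

Plugging in the abstract's figures, `relationships_per_token_gain(251.2, 19.8)` is about 12.7: full-text chunking finds more relationships in absolute terms, but snippets return roughly 12.7 times more unique relationships per input token. The per-firm counts from `degree()` are the quantity whose distribution the abstract describes as heavy-tailed.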

Problem

Research questions and friction points this paper is trying to address.

supply chain discovery

China

structured data gap

web mining scalability

inter-firm relationships

Innovation

Methods, ideas, or system contributions that make the work stand out.

snippet-driven

supply chain knowledge graph

large language models