Toward an AI-Native Internet: Rethinking the Web Architecture for Semantic Retrieval

📅 2025-11-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Today's internet architecture is document-centric and optimized for human browsing, which makes it a poor fit for fine-grained, AI-driven semantic retrieval: it wastes bandwidth, degrades information quality, and adds complexity for developers. This paper proposes an “AI-Native Internet” in which servers natively expose semantically structured information chunks instead of monolithic HTML documents. The design centers on a Web-native semantic resolution protocol and a lightweight, LLM-based resolver that locates the relevant sources and target chunks before anything is retrieved. Motivational experiments quantify the inefficiency of conventional HTML-based retrieval for semantic retrieval tasks. The paper also identifies key technical directions (semantic chunking, protocol extensibility, and keeping the resolver lightweight) and outlines open challenges in evolving the document-centric web into a semantics-driven, efficient substrate for AI access.

📝 Abstract

The rise of Generative AI Search is fundamentally transforming how users and intelligent systems interact with the Internet. LLMs increasingly act as intermediaries between humans and web information. Yet the web remains optimized for human browsing rather than AI-driven semantic retrieval, resulting in wasted network bandwidth, lower information quality, and unnecessary complexity for developers. We introduce the concept of an AI-Native Internet, a web architecture in which servers expose semantically relevant information chunks rather than full documents, supported by a Web-native semantic resolver that allows AI applications to discover relevant information sources before retrieving fine-grained chunks. Through motivational experiments, we quantify the inefficiencies of current HTML-based retrieval, and outline architectural directions and open challenges for evolving today's document-centric web into an AI-oriented substrate that better supports semantic access to web content.

Problem

Research questions and friction points this paper is trying to address.

The web architecture remains optimized for human browsing rather than AI-driven semantic retrieval.

Current HTML-based retrieval causes wasted bandwidth and lower information quality.

The paper rethinks web architecture to better support semantic access to content.
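
One way to make the bandwidth point concrete is to compare the bytes transferred for a full HTML page with the bytes of the passage an AI application actually needs. The sketch below is illustrative only: `visible_text` and `overhead_ratio` are hypothetical helper names, the sample page is made up, and this is not the measurement procedure used in the paper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the human-visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def overhead_ratio(html: str, relevant_text: str) -> float:
    """Fraction of transferred bytes that are not the relevant passage."""
    total = len(html.encode("utf-8"))
    if total == 0:
        return 0.0
    useful = len(relevant_text.encode("utf-8"))
    return max(0.0, 1.0 - useful / total)


# Made-up page: markup, styling, scripts, and navigation surround one answer.
page = (
    "<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
    "<body><nav>Home</nav><p>Answer: 42</p></body></html>"
)
print(visible_text(page))
print(round(overhead_ratio(page, "Answer: 42"), 3))
```

Even after stripping markup, the visible text still includes navigation that is irrelevant to the query, which is the kind of residue that chunk-level access is meant to avoid.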

Innovation

Methods, ideas, or system contributions that make the work stand out.

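Servers expose semantically relevant information chunks rather than full documents.

A Web-native semantic resolver lets AI applications discover relevant information sources before retrieving fine-grained chunks.

The architecture shifts from document-centric to AI-oriented.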
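
The resolve-then-fetch flow described above can be sketched in a few lines. Everything here is an assumption made for illustration: the `Chunk`, `ChunkServer`, and `Resolver` names, the in-memory "servers", and especially the token-overlap scoring, which is a simple stand-in for whatever semantic matching a real resolver would use. It is not the protocol defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str


class ChunkServer:
    """Hypothetical origin server that serves individual chunks, not pages."""

    def __init__(self, name, chunks):
        self.name = name
        self._chunks = {c.chunk_id: c for c in chunks}

    def describe(self):
        """Expose the chunks this server can serve, for indexing."""
        return list(self._chunks.values())

    def fetch(self, chunk_id):
        """Return exactly one chunk instead of a whole document."""
        return self._chunks[chunk_id]


def _tokens(text):
    """Lowercased word set with basic punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


class Resolver:
    """Maps a query to (server, chunk_id) pairs before any content is fetched."""

    def __init__(self, servers):
        self._index = [
            (server, chunk.chunk_id, _tokens(chunk.text))
            for server in servers
            for chunk in server.describe()
        ]

    def resolve(self, query, top_k=1):
        q = _tokens(query)
        # Token overlap is a placeholder for real semantic similarity.
        scored = [(len(q & toks), server, cid) for server, cid, toks in self._index]
        scored = [entry for entry in scored if entry[0] > 0]
        scored.sort(key=lambda entry: entry[0], reverse=True)
        return [(server, cid) for _, server, cid in scored[:top_k]]


def semantic_get(resolver, query, top_k=1):
    """Resolve first, then fetch only the chunks the resolver pointed to."""
    return [server.fetch(cid) for server, cid in resolver.resolve(query, top_k)]


docs = ChunkServer("docs.example", [
    Chunk("install", "Install the package with pip."),
    Chunk("license", "Released under the MIT license."),
])
resolver = Resolver([docs])
result = semantic_get(resolver, "Which license is it released under?")
print([c.chunk_id for c in result])
```

The point of the split is that the client only transfers the chunks the resolver selected; a query with no matching chunk returns an empty list and triggers no fetch at all.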
