Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of web performance optimization, which often requires complex DOM restructuring that developers struggle to perform efficiently under resource constraints. It presents the first systematic evaluation of nine state-of-the-art large language models (LLMs) in automatically generating DOM-level fixes for web performance issues, guided by real-world DOM structures and Lighthouse performance audit reports. Results show that LLMs excel at resolving SEO and accessibility problems, with GPT-4.1 reducing audit issues by 46.52%–48.68% on average across initial loading, interactivity, and network optimization categories. However, some models exhibit limited effectiveness, and certain repairs risk introducing visual stability regressions. The work reveals distinct strategic differences and inherent limitations of LLMs across various performance dimensions.

📝 Abstract
Users demand fast, seamless webpage experiences, yet developers often struggle to meet these expectations within tight constraints. Performance optimization, while critical, is a time-consuming and often manual process. Among the most complex tasks in this domain is modifying the Document Object Model (DOM), which is the focus of this study. Recent advances in Large Language Models (LLMs) offer a promising avenue for automating this complex task, potentially transforming how developers address web performance issues. This study evaluates the effectiveness of nine state-of-the-art LLMs for automated web performance issue resolution. For this purpose, we first extracted the DOM trees of 15 popular webpages (e.g., Facebook), and then used Lighthouse to retrieve their performance audit reports. We then passed the extracted DOM trees and corresponding audits to each model for resolution. Our study considers 7 unique audit categories, revealing that LLMs universally excel at SEO and Accessibility issues. However, their efficacy in performance-critical DOM manipulations is mixed. While high-performing models like GPT-4.1 delivered significant reductions in areas like Initial Load, Interactivity, and Network Optimization (e.g., 46.52% to 48.68% reductions in audit incidence), others, such as GPT-4o-mini, consistently underperformed. A further analysis of these modifications showed a predominantly additive strategy and frequent positional changes, alongside regressions particularly impacting Visual Stability.
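The first step of the pipeline described above, extracting a page's DOM tree so it can be passed to a model alongside its Lighthouse audit, can be sketched with Python's standard library. This is an illustrative outline only, not the authors' implementation; the `DOMNodeCounter` class and `count_dom_nodes` helper are hypothetical names, and a real pipeline would serialize the full tree rather than just count element nodes.

```python
from html.parser import HTMLParser


class DOMNodeCounter(HTMLParser):
    """Walks an HTML document and records every element node,
    a rough stand-in for serializing a page's DOM tree."""

    def __init__(self):
        super().__init__()
        self.tags = []  # flat record of every element tag encountered

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, including void elements like <img>
        self.tags.append(tag)


def count_dom_nodes(html: str) -> int:
    """Return the number of element nodes in the given HTML string."""
    parser = DOMNodeCounter()
    parser.feed(html)
    return len(parser.tags)


sample = "<html><head><title>t</title></head><body><p>hi</p><img src='x'></body></html>"
print(count_dom_nodes(sample))  # html, head, title, body, p, img -> 6
```

In practice the extracted tree and the JSON audit report from Lighthouse (e.g., via `lighthouse <url> --output json`) would be combined into a single prompt per audit category.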
Problem

Research questions and friction points this paper is trying to address.

LLMs
DOM manipulation
web performance
automated resolution
performance optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
DOM manipulation
Web performance optimization
Lighthouse audits
Automated code generation
Gideon Peters
Concordia University, Montreal, Canada
SayedHassan Khatoonabadi
Concordia University, Montreal, Canada
Emad Shihab
Professor at Concordia University
Software Engineering · SE4AI · Mining Software Repositories · Software Analytics · Software Supply Chain