AI Summary
This study addresses the lack of a systematic understanding of how the visual attributes of web pages influence the decision-making of Web agents in non-adversarial settings. To bridge this gap, the authors propose VAF, a controlled evaluation pipeline for webpage Visual Attribute Factors, which generates semantically consistent webpage variants with controlled visual modifications. By pairing simulated human-like browsing with agent interaction experiments, VAF enables the first systematic quantification of the impact of multiple visual factors. The pipeline combines variant generation, browsing interaction, and two evaluation metrics: Target Click Rate and Target Mention Rate. Experiments across eight visual attribute categories, five real-world websites, and four state-of-the-art Web agents show that background contrast, element size, spatial position, and card clarity strongly affect agent behavior, whereas font style, text color, and image clarity have comparatively weak influence.
Abstract
Web agents have demonstrated strong performance on a wide range of web-based tasks. However, existing research on the effect of environmental variation has mostly focused on robustness to adversarial attacks, with less attention to agents' preferences in benign scenarios. Although early studies have examined how textual attributes influence agent behavior, a systematic understanding of how visual attributes shape agent decision-making remains limited. To address this, we introduce VAF, a controlled evaluation pipeline for quantifying how webpage Visual Attribute Factors influence web-agent decision-making. Specifically, VAF consists of three stages: (i) variant generation, which ensures that each variant has the same semantics as the original item and differs only in visual attributes; (ii) browsing interaction, where agents navigate the page by scrolling and clicking items of interest, mirroring how human users browse online; and (iii) validation through both the agents' click actions and their reasoning, using the Target Click Rate and Target Mention Rate to jointly evaluate the effect of visual attributes. By quantitatively measuring the difference in decision-making between the original and each variant, we identify which visual attributes influence agents' behavior most. Extensive experiments across 8 variant families (48 variants in total), 5 real-world websites (covering shopping, travel, and news browsing), and 4 representative web agents show that background color contrast, item size, position, and card clarity strongly influence agents' actions, whereas font styling, text color, and item image clarity have only minor effects.
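The abstract names the two metrics but does not give their formulas. The sketch below is one plausible reading, offered as an assumption rather than the authors' definition: Target Click Rate (TCR) is the share of episodes in which the agent clicks the target item, Target Mention Rate (TMR) is the share whose reasoning mentions the target, and the effect of a visual attribute is the variant's rate minus the original's. The `Trial` record, the function names, and the case-insensitive substring match for "mention" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Trial:
    """One agent episode on a page (hypothetical record format)."""
    clicked_item: Optional[str]  # id of the item the agent clicked, or None
    reasoning: str               # the agent's free-text rationale


def target_click_rate(trials: Sequence[Trial], target_id: str) -> float:
    """Assumed TCR: fraction of trials in which the agent clicked the target."""
    if not trials:
        return 0.0
    return sum(t.clicked_item == target_id for t in trials) / len(trials)


def target_mention_rate(trials: Sequence[Trial], target_name: str) -> float:
    """Assumed TMR: fraction of trials whose reasoning names the target.

    Uses a simple case-insensitive substring match as a stand-in for
    whatever mention detection the authors actually use.
    """
    if not trials:
        return 0.0
    name = target_name.lower()
    return sum(name in t.reasoning.lower() for t in trials) / len(trials)


def attribute_effect(
    original: Sequence[Trial],
    variant: Sequence[Trial],
    target_id: str,
    target_name: str,
) -> Tuple[float, float]:
    """Variant-minus-original difference in (TCR, TMR) for one target item."""
    d_tcr = target_click_rate(variant, target_id) - target_click_rate(original, target_id)
    d_tmr = target_mention_rate(variant, target_name) - target_mention_rate(original, target_name)
    return d_tcr, d_tmr
```

For example, if an agent clicks a (hypothetical) "Red Kettle" card in 1 of 4 episodes on the original page and in 3 of 4 episodes after the card's visual attribute is changed, the TCR shift is +0.5. Reporting both metrics separates what the agent does (the click) from what it says it considered (the mention).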