🤖 AI Summary
Traditional flat-screen experiments fail to capture the cognitive dynamics of visual enumeration in ecologically valid, large-field-of-view environments; in particular, they overlook interactions between attention and working memory in complex scenes.
Method: Using immersive virtual reality with integrated eye-tracking, we employed a two-phase enumeration paradigm that orthogonally manipulated task intention (counting vs. identity recognition) and spatial layout across two stimulus types: real-world objects and simple geometric shapes.
Contribution/Results: Semantic processing imposes a substantial cognitive load that suppresses memory recall, making it a primary bottleneck in real-world enumeration. Selective counting itself also incurs a substantial cognitive cost, which grows with stimulus complexity. The study goes beyond the conventional focus on visual search alone, providing empirical evidence of how semantic–memory interactions constrain enumeration efficiency. These findings offer constraints for developing embodied models of numerical cognition.
📝 Abstract
Humans navigate and understand complex visual environments by subconsciously quantifying what they see, a process known as visual enumeration. However, traditional studies using flat screens fail to capture the cognitive dynamics of this process over the large visual fields of real-world scenes. To address this gap, we developed an immersive virtual reality system with integrated eye-tracking to investigate the interplay between attention and memory during complex enumeration. We conducted a two-phase experiment in which participants enumerated scenes of either simple abstract shapes or complex real-world objects, while we systematically varied the task intent (e.g., selective vs. exhaustive counting) and the spatial layout of items. Our results reveal that task intent is the dominant factor driving performance: selective counting imposed a significant cognitive cost that was dramatically amplified by stimulus complexity. The semantic processing required for real-world objects reduced accuracy and suppressed memory recall, whereas the influence of spatial layout was secondary and statistically non-significant when a higher-order task intent was driving behaviour. We conclude that real-world enumeration is fundamentally constrained by the cognitive load of semantic processing, not just by the mechanics of visual search. Our findings demonstrate that, under high cognitive demand, the effort to understand what we are seeing directly limits our capacity to remember it.