AI Summary
This study addresses the difficulty existing data exploration tools have in interpreting users' analytical intent when it is expressed in unstructured forms over spatiotemporal datasets. To bridge this gap, the authors propose a multimodal query system integrating freehand sketching, natural language, and visual annotations. Central to their approach is the concept of "proximity semantics" (PS), which captures how users disambiguate references through the relative placement of multimodal elements within a unified interaction space. The system employs a hybrid architecture combining geometric sketch matching with vision-language models (VLMs), supporting queries that interleave pattern matching with semantic constraints. A preliminary user study with 20 participants found recurring interaction patterns consistent with proximity semantics, offering empirical grounding and design implications for multimodal data exploration interfaces.
Abstract
Modern data exploration tools often struggle to capture the subtleties of analytical intent, especially when users seek patterns that are difficult to specify using traditional query methods or natural language alone. We introduce a multimodal research probe for querying time-series and geospatial data that integrates free-form sketching, natural language, and visual annotations within a unified interaction space. Users articulate queries by sketching trends or spatial paths and augmenting them with annotations and analytical directives grounded in shared spatial and temporal context. The system employs a hybrid architecture combining geometric sketch matching and visual language models (VLMs) to support queries that interleave pattern matching and semantic constraints. Through a preliminary study with 20 participants, we observed recurring interaction patterns in which participants used spatial, temporal, and visual proximity to relate sketches, annotations, and language. Rather than treating these as isolated inputs, participants relied on their relative placement to disambiguate meaning. We analyze these behaviors as evidence for proximity semantics (PS), a form of deictic disambiguation in which meaning is shaped by the closeness of multimodal elements within a shared interaction space. We present PS as a conceptual lens grounded in observed user behavior, and discuss its implications for the design of future multimodal data exploration systems.