🤖 AI Summary
Traditional survey-based assessment of urban safety perception is costly and scales poorly. Method: This study proposes a zero-shot paradigm that uses a large multimodal model (LMM), specifically LLaVA-1.6-7B, augmented with a *Persona prompting* mechanism to simulate the visual judgments of diverse demographic groups (varying by age, gender, and nationality) without fine-tuning, enabling binary safe/unsafe classification of street-view imagery. Contribution/Results: The base model's default outputs align most closely with a middle-aged male perspective; the zero-shot average F1-score is 59.21%; isolation, physical deterioration, and infrastructure deficiencies emerge as key drivers of perceived unsafety; and unsafe classification rates vary substantially across nationalities (19.71%–40.15%). To our knowledge, this is the first work to explicitly integrate sociodemographic perspectives into LMM inference, establishing a new methodological foundation for fairness-aware, AI-driven urban perception research.
📝 Abstract
Understanding how urban environments are perceived in terms of safety is crucial for urban planning and policymaking. Traditional methods such as surveys are costly, time-consuming, and hard to scale. To overcome these challenges, this study introduces Large Multimodal Models (LMMs), specifically LLaVA 1.6 7B, as a novel approach to assessing safety perceptions of urban spaces from street-view images. In addition, the research investigates how this task is affected by different socio-demographic perspectives, which the model simulates through Persona-based prompts. Without additional fine-tuning, the model achieved an average F1-score of 59.21% in classifying urban scenarios as safe or unsafe, and the analysis identified three key drivers of perceived unsafety: isolation, physical decay, and urban infrastructural challenges. Moreover, incorporating Persona-based prompts revealed significant variations in safety perceptions across age, gender, and nationality. Older and female Personas consistently perceived higher levels of unsafety than younger or male Personas. Similarly, nationality-specific differences were evident in the proportion of unsafe classifications, which ranged from 19.71% in Singapore to 40.15% in Botswana. Notably, the model's default configuration aligned most closely with a middle-aged, male Persona. These findings highlight the potential of LMMs as a scalable and cost-effective alternative to traditional methods for assessing urban safety perceptions. While the sensitivity of these models to socio-demographic factors underscores the need for thoughtful deployment, their ability to provide nuanced perspectives makes them a promising tool for AI-driven urban planning.
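To make the Persona-prompting and evaluation steps concrete, the following is a minimal Python sketch, not the authors' implementation. The prompt wording, the persona attribute names, and all function names are hypothetical, and the abstract does not say how the "average" F1-score is computed; the sketch assumes a macro average over the two classes. No model call is included: the functions only build the text prompt that would accompany a street-view image and score the parsed answers.

```python
# Hypothetical prompt text; the paper's actual wording is not given in the abstract.
QUESTION = (
    "Does the place in this street-view image look safe or unsafe? "
    "Answer with one word: 'safe' or 'unsafe'."
)
LABELS = ("safe", "unsafe")


def build_prompt(persona=None):
    """Return the default prompt, or a Persona-conditioned prompt if attributes are given."""
    if not persona:
        return QUESTION
    description = "; ".join(f"{key}: {value}" for key, value in persona.items())
    return f"Answer from the perspective of this person ({description}). {QUESTION}"


def parse_label(response):
    """Map a free-text model response to 'safe', 'unsafe', or None if unparseable."""
    text = response.strip().lower()
    if text.startswith("unsafe"):
        return "unsafe"
    if text.startswith("safe"):
        return "safe"
    return None


def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Assumed averaging scheme: unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in LABELS) / len(LABELS)


def unsafe_rate(y_pred):
    """Share of images classified as unsafe, as compared across Personas."""
    return sum(p == "unsafe" for p in y_pred) / len(y_pred)
```

In use, one would call `build_prompt` once per Persona (for example `{"age group": "elderly", "gender": "female"}`), send each prompt with the image to the model, run `parse_label` on every response, and then compare `unsafe_rate` across Personas and `macro_f1` against the human-labelled ground truth.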