🤖 AI Summary
This study addresses the challenge of identifying specific local visual factors that influence human subjective judgments—such as perceived safety—in street-view imagery, which existing urban perception models struggle to reveal. The work proposes a novel counterfactual framework that formalizes urban perception attribution as a controlled local intervention task, leveraging “visual levers” defined by semantic concepts, spatial regions, intervention directions, and constraint templates to enable scene-level interpretability through prompt-driven image generation. The approach incorporates a multidimensional validation mechanism assessing co-locality, locality, photorealism, and plausibility, complemented by human pairwise evaluations. Applied across 50 street-view sites spanning multiple cities, the method identifies mobility infrastructure and physical maintenance edits as most significantly impacting perceived safety and establishes a taxonomy of failure modes in prompt-based editing.
📝 Abstract
Street-view perception models predict subjective attributes such as safety at scale, but remain correlational: they do not identify which localized visual changes would plausibly shift human judgement for a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, spatial support, intervention direction, and constrained edit template. Candidate edits are generated through prompt-conditioned image editing and retained only if they satisfy validity checks for same-place preservation, locality, realism, and plausibility. In a pilot across 50 scenes from five cities, the framework reveals preliminary proxy-based directional patterns and a practical failure taxonomy under prompt-only editing, with Mobility Infrastructure and Physical Maintenance showing the largest auxiliary safety shifts. Human pairwise judgements remain the ground-truth endpoint for future validation.