🤖 AI Summary
Creating 2.5D content is difficult because designers must reason about depth by hand, which makes realistic occlusion and perspective foreshortening slow to produce. This paper proposes DepthScape, a human-in-the-loop 2.5D design framework. First, monocular depth estimation and semantic segmentation reconstruct scene geometry, while a vision-language model interprets the image and extracts key visual components as editable content anchors. Second, users place design elements into the 3D reconstruction and edit them interactively through a 2D viewport, and the system synthesizes realistic occlusion and perspective foreshortening automatically. By combining multimodal perception with geometric reasoning, the method lowers the barrier for non-expert users. A user study with nine participants of varying design backgrounds confirms the effectiveness of the creation pipeline, a test on 100 professional stock images assesses robustness, and an expert evaluation confirms output quality. The framework offers a lightweight route to high-fidelity 2.5D content generation.
📝 Abstract
2.5D effects, such as occlusion and perspective foreshortening, enhance visual dynamics and realism by incorporating 3D depth cues into 2D designs. However, creating such effects remains challenging and labor-intensive due to the complexity of depth perception. We introduce DepthScape, a human-AI collaborative system that facilitates 2.5D effect creation by directly placing design elements into 3D reconstructions. Using monocular depth reconstruction, DepthScape transforms images into 3D reconstructions in which visual content is placed to automatically achieve realistic occlusion and perspective foreshortening. To further simplify 3D placement through a 2D viewport, DepthScape uses a vision-language model to analyze source images and extract key visual components as content anchors for direct-manipulation editing. We evaluate DepthScape with nine participants of varying design backgrounds, confirming the effectiveness of our creation pipeline. We also test on 100 professional stock images to assess robustness, and conduct an expert evaluation that confirms the quality of DepthScape's results.
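The two effects named in the abstract can be illustrated with a minimal sketch. It is not DepthScape's implementation: it assumes a per-pixel depth map is already available (for example from a monocular depth estimator), uses a simple pinhole-camera model for foreshortening, and represents images as nested lists for readability. The names `projected_scale` and `composite_with_occlusion` are hypothetical.

```python
from typing import List, Optional

Pixel = Optional[str]  # None marks a transparent pixel of the placed element


def projected_scale(focal_length: float, depth: float) -> float:
    """Pinhole-camera foreshortening (assumed model): apparent size scales as 1 / depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_length / depth


def composite_with_occlusion(
    scene: List[List[str]],
    scene_depth: List[List[float]],
    element: List[List[Pixel]],
    element_depth: float,
    top: int,
    left: int,
) -> List[List[str]]:
    """Draw `element` over a copy of `scene`, keeping scene pixels that are nearer the camera."""
    out = [row[:] for row in scene]
    for r, row in enumerate(element):
        for c, value in enumerate(row):
            y, x = top + r, left + c
            if value is None:
                continue  # transparent element pixel
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue  # element pixel falls outside the image
            # Per-pixel depth test: the element is visible only where it is
            # closer to the camera than the reconstructed scene surface.
            if element_depth < scene_depth[y][x]:
                out[y][x] = value
    return out


if __name__ == "__main__":
    # A 2x3 scene: distant background on the left, a near foreground object on the right.
    scene = [["bg", "bg", "fg"], ["bg", "bg", "fg"]]
    depth = [[10.0, 10.0, 2.0], [10.0, 10.0, 2.0]]
    # A two-pixel text element placed at depth 5, between background and foreground.
    print(composite_with_occlusion(scene, depth, [["T", "T"]], 5.0, 0, 1))
    # The same element appears half as large when pushed twice as far away.
    print(projected_scale(500.0, 5.0), projected_scale(500.0, 10.0))
```

In this sketch the element covers the background but is hidden by the nearer foreground object, which is the occlusion behavior the abstract describes. A real system would operate on image arrays and a full 3D transform instead of a single element depth.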